This PR primarily implements alternative sampling methods, such as Top-P-X and Top-A from RWKV. I'll evaluate whether they outperform the baseline; if they do, I'll merge this PR and possibly add more options to the inference code. Once we're confident that our sampling performs well, we can take a stab at alternative data distributions (#5, #9) and automated evaluation (#21).
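For readers unfamiliar with Top-A, here is a minimal sketch of the idea, not the code in this PR. It assumes the commonly described rule that a token is kept only when its probability is at least `a * p_max**2`, where `p_max` is the highest token probability. The names `top_a_filter` and `sample_top_a` and the default `a=0.2` are hypothetical and chosen for illustration only.

```python
import math
import random


def top_a_filter(logits, a=0.2):
    """Return renormalized probabilities after Top-A filtering.

    Assumption: a token is kept when its probability is >= a * p_max**2.
    """
    # Numerically stable softmax over the raw logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cutoff scales with the square of the top probability, so a
    # confident distribution is pruned harder than a flat one.
    threshold = a * max(probs) ** 2
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize; the top token always survives when 0 <= a <= 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_top_a(logits, a=0.2, rng=random):
    """Draw one token index from the Top-A filtered distribution."""
    probs = top_a_filter(logits, a)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `a=0` nothing is filtered and this reduces to plain softmax sampling, which makes it easy to compare against the baseline under the same code path.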