dacorvo closed this issue 4 months ago.
Thank you for reporting. We are looking into it and will get back to you soon.
We have identified the issue and will have a fix in the upcoming release.
@dacorvo I believe this is fixed in the latest 2.16 release. Please try it and let us know if you are still facing any issues.
Actually, I don't use it: I just noticed, while going through the code, a difference from what is done in transformers.
Closing the issue, as it is supposed to be fixed in the 2.16 release. Please re-open it if you find another issue.
Looking at the code of the top_k_top_p_filtering method in sampling.py, I am wondering whether the algorithm for applying the top-p filtering is correct. Unlike the transformers implementation, it performs a cumulative sum on the logit probabilities sorted in descending order, which seems to lead to a different selection. Example:
transformers algorithm:
transformers-neuronx algorithm:
I checked the result by crafting a sample:
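To illustrate why the boundary handling matters, here is a minimal pure-Python sketch comparing three top-p selection rules on a hand-picked distribution. `top_p_ascending` is modeled on the rule used by the `transformers` top-p warper (sort ascending, remove tokens whose cumulative probability is at most `1 - top_p`, always keep at least one token). `top_p_descending_shifted` and `top_p_descending_naive` are hypothetical helpers written only for this illustration; in particular, the naive variant is an assumption about how a descending cumulative sum can go wrong, not a copy of the transformers-neuronx code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_ascending(logits, top_p, min_tokens_to_keep=1):
    """transformers-style rule: sort ascending, drop tokens whose cumulative
    probability is <= 1 - top_p, always keeping the min_tokens_to_keep most
    probable tokens. Returns the kept token indices, sorted."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    kept, cum = [], 0.0
    for rank, i in enumerate(order):
        cum += probs[i]
        if cum > 1 - top_p or rank >= len(order) - min_tokens_to_keep:
            kept.append(i)
    return sorted(kept)


def top_p_descending_shifted(logits, top_p):
    """Hypothetical descending rule with the mask shifted by one: a token is
    kept if the probability mass of the tokens ranked above it is still
    < top_p, so the token that crosses the threshold is kept."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum_before = [], 0.0
    for i in order:
        if cum_before < top_p:
            kept.append(i)
        cum_before += probs[i]
    return sorted(kept)


def top_p_descending_naive(logits, top_p):
    """HYPOTHETICAL descending rule without the shift: a token is kept only if
    the cumulative probability including itself is <= top_p, so the token
    that crosses the threshold is dropped (and the result is empty when the
    most probable token alone exceeds top_p)."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        cum += probs[i]
        if cum <= top_p:
            kept.append(i)
    return sorted(kept)


if __name__ == "__main__":
    # Logits chosen so that the softmax gives roughly [0.5, 0.3, 0.15, 0.05].
    logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
    print(top_p_ascending(logits, 0.7))           # [0, 1]
    print(top_p_descending_shifted(logits, 0.7))  # [0, 1]
    print(top_p_descending_naive(logits, 0.7))    # [0]
```

Sorting in descending order is not wrong by itself: with the mask shifted by one position, the descending rule keeps the same tokens as the ascending one. In this sketch, the difference comes from dropping the token whose probability makes the cumulative sum cross `top_p`.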