Aleph-Alpha / magma

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com
MIT License

top_p argument is used like 1-top_p #29

Closed nostalgebraist closed 1 year ago

nostalgebraist commented 2 years ago

For example, top_p=0.999 gives you nearly deterministic sampling, not nearly on-distribution sampling.


I was confused why I was getting much less diverse samples with top_p=0.95 than I got with top_p turned off.

I found the cause in these lines:

https://github.com/Aleph-Alpha/magma/blob/bfd5c8def6a290f98b7eae34da120756f708cd38/magma/sampling.py#L11-L14

threshold is set to top_p here:

https://github.com/Aleph-Alpha/magma/blob/bfd5c8def6a290f98b7eae34da120756f708cd38/magma/sampling.py#L101-L102

Suppose, e.g., threshold is 0.95. Then 1 - threshold is 0.05.

So we remove all tokens where the cumulative probs are > 0.05, which is most of the tokens -- we are really doing top-p sampling with top_p=0.05 (in the usual convention), not the intended top_p=0.95.
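
To make the difference concrete, here is a minimal sketch in plain Python. It is not the repo's actual PyTorch code: `top_p_keep` and its `buggy` flag are hypothetical names used only for illustration, and the probabilities are made up. It assumes the usual convention of keeping the highest-probability tokens until their cumulative mass reaches the threshold.

```python
def top_p_keep(probs, top_p, buggy=False):
    """Return the indices of the tokens kept by nucleus (top-p) filtering.

    probs: list of token probabilities that sum to 1.
    buggy: if True, use 1 - top_p as the cutoff (the behavior described
           in this issue) instead of top_p (the usual convention).
    """
    threshold = 1.0 - top_p if buggy else top_p
    # Visit tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)  # the token that crosses the threshold is still kept
        cumulative += probs[i]
        if cumulative >= threshold:
            break
    return kept


probs = [0.6, 0.2, 0.1, 0.06, 0.04]  # illustrative distribution

print(top_p_keep(probs, 0.95))              # [0, 1, 2, 3]
print(top_p_keep(probs, 0.95, buggy=True))  # [0]
```

With the usual convention, top_p=0.95 drops only the low-probability tail. With the 1-top_p cutoff, the same setting keeps only the single most likely token, which matches the loss of diversity I saw.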

CoEich commented 1 year ago

Thanks for catching that, it is fixed now.