facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

native log likelihood changes at each execution #226

Closed avilella closed 2 years ago

avilella commented 2 years ago

Bug description I am running the very useful esm/examples/inverse_folding/score_log_likelihoods.py script to calculate the likelihood of a chain against a PDB structure that contains two chains, where the query chain differs from one of the chains in the PDB by a single amino acid. The reported "native likelihood" is different every time I run it against the same PDB, which makes me wonder whether I am misinterpreting what "native likelihood" means in this context, or whether this is a bug.

Reproduction steps Input a FASTA sequence of one of the two chains present in the PDB (obtained via AlphaFold2), where the input chain is identical to one of the two chains except for one amino acid. Take note of the "native likelihood". Try again with a different query chain, this time with a different amino acid change from the previous one. The "native likelihood" differs from the one reported the first time.

Expected behavior If I am interpreting "native likelihood" correctly, I was expecting this value to be the same every time I run the script with the same pdb structure.

Logs

I did this as a sweep over a chain, changing the original amino acid to a different one, and then compared the "native_llh" values to the log_likelihood reported for each variant run, coloured here by the llh_delta between the two. I just see a cloud of values, so the two seem largely uncorrelated. This is only an observation; I am not sure what I was expecting.

![image](https://user-images.githubusercontent.com/158007/176405532-640b86db-a87a-49f7-9c21-26a0cd100dd8.png)


tomsercu commented 2 years ago

Did you check that you're on the latest state, i.e. repo HEAD? The model needs to be in eval mode, i.e. this line has to be there.
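
For readers unfamiliar with why eval mode matters here, the following is a minimal sketch using a toy PyTorch module, not the ESM inverse folding model and not the line referenced above. It assumes the run-to-run variation comes from dropout, which is active in training mode and disabled by `model.eval()`; `toy_model` and `score_twice` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn

# Toy stand-in for a network that contains dropout (NOT the ESM model).
toy_model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.ones(4, 8)


def score_twice(model, inputs):
    """Run the same input through the model twice, without gradients."""
    with torch.no_grad():
        return model(inputs), model(inputs)


toy_model.train()  # dropout active: each forward pass samples a new random mask
train_a, train_b = score_twice(toy_model, x)

toy_model.eval()   # dropout disabled: repeated forward passes are deterministic
eval_a, eval_b = score_twice(toy_model, x)

print("train mode outputs identical:", torch.equal(train_a, train_b))
print("eval mode outputs identical:", torch.equal(eval_a, eval_b))
```

In eval mode the two outputs match exactly; in training mode they will almost always differ, which would be consistent with a "native" score that changes between runs on the same structure.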

naailkhan28 commented 2 years ago

Why would you expect the log likelihood to be the same each time? You're scoring different sequences against the same backbone each time, right?
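
The distinction between the two reported numbers can be sketched with a toy calculation. This is an illustration under stated assumptions, not the script's implementation: it assumes the score is an average per-residue log-probability, it uses made-up per-position distributions in place of the model's output, and it ignores the fact that a real autoregressive model would also change the conditional probabilities at later positions. The names `toy_position_log_probs` and `average_log_likelihood` and the sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_position_log_probs(native_seq, p_native=0.6):
    """Hypothetical per-position distributions: the native residue gets
    probability p_native, the remainder is spread evenly over the other 19."""
    p_other = (1.0 - p_native) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(p_native if aa == native else p_other) for aa in AMINO_ACIDS}
        for native in native_seq
    ]


def average_log_likelihood(seq, position_log_probs):
    """Mean log-probability of each residue of seq at its position."""
    return sum(lp[aa] for aa, lp in zip(seq, position_log_probs)) / len(seq)


native = "MKTAYIAK"   # made-up sequence standing in for the chain in the PDB
variant = "MKTAYIAR"  # same sequence with a single substitution
log_probs = toy_position_log_probs(native)

native_llh = average_log_likelihood(native, log_probs)
variant_llh = average_log_likelihood(variant, log_probs)

print(f"native:  {native_llh:.4f}")
print(f"variant: {variant_llh:.4f}")
```

In this toy setting the variant score changes with the query sequence, while the native score depends only on the structure's own sequence and so should be the same across runs on the same PDB, whatever query is supplied.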

avilella commented 2 years ago

OK, I checked out the HEAD version as of a few minutes ago, and it's now working as I expected: if you run the score_log_likelihoods.py script with a PDB and a chain with the exact same sequence as in the PDB, the native llh value matches, to two decimal places, the log_likelihood value returned by the script.

Thanks!
