OATML-Markslab / Tranception

Official repository for the paper "Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval"
MIT License

how to calculate the fitness of a protein equivalent to ESM log_likelihood score script? #3

Closed avilella closed 2 years ago

avilella commented 2 years ago

Hi all,

I see that this model scores well compared with Facebook's ESM log_likelihood score on protein stability/fitness prediction.

Facebook's ESM repo has a Python script that calculates the log_likelihood from 1D sequence input files.

Is there an equivalent script in this repo? If so, how do I call it? I've read the README.md but didn't find anything immediately equivalent.

Thanks in advance,

pascalnotin commented 2 years ago

Hi Albert,

The scoring script computes the fitness of mutated sequences in the ProteinGym assays. See lines 19-44 for a description of the different argparse arguments, and this example bash script. If that does not quite cover the use case you had in mind, please let us know and we will modify the scoring script or add a new script as needed.

Kind regards, Pascal

pascalnotin commented 2 years ago

Hi Albert,

I added the details from my prior message to the README. In particular, the scoring script computes the log-likelihood ratios of mutated sequences vs. the wild-type sequence for the corresponding protein family. I will close the issue, but feel free to re-open it if there are other functionalities that would be helpful to your work.

Kind regards, Pascal
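
For readers who want to see what a log-likelihood ratio of a mutated sequence vs. the wild type looks like in practice, here is a minimal sketch. It does not use the Tranception code or its scoring script: the toy model, the helper names (`apply_mutations`, `sequence_log_likelihood`, `log_likelihood_ratio`) and the colon-separated mutation notation (e.g. `A2C:D4E`, 1-indexed) are all assumptions made for illustration. Any autoregressive model that returns next-residue log-probabilities given a prefix could be plugged in where the toy model is.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface: maps a sequence prefix to log-probabilities
# over the next amino acid. A real model would replace the toy one below.
NextTokenLogProbs = Callable[[str], Dict[str, float]]


def apply_mutations(wild_type: str, mutant: str) -> str:
    """Apply a colon-separated mutation string such as 'A2C:D4E'.

    Positions are 1-indexed (an assumption of this sketch).
    """
    seq = list(wild_type)
    for mutation in mutant.split(":"):
        ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
        if seq[pos - 1] != ref:
            raise ValueError(f"{mutation}: expected {ref} at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def sequence_log_likelihood(seq: str, model: NextTokenLogProbs) -> float:
    """Sum log P(residue_i | residues before i) over the whole sequence."""
    return sum(model(seq[:i])[aa] for i, aa in enumerate(seq))


def log_likelihood_ratio(wild_type: str, mutant: str, model: NextTokenLogProbs) -> float:
    """Score = log P(mutated sequence) - log P(wild-type sequence)."""
    mutated = apply_mutations(wild_type, mutant)
    return sequence_log_likelihood(mutated, model) - sequence_log_likelihood(wild_type, model)


def toy_next_token_log_probs(prefix: str) -> Dict[str, float]:
    """Toy stand-in for a model: ignores the prefix and favors alanine."""
    return {aa: math.log(0.24 if aa == "A" else 0.04) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Mutating the favored alanine away lowers the likelihood under the toy model.
    print(log_likelihood_ratio("MAKT", "A2C", toy_next_token_log_probs))
```

Under this convention a negative score means the model considers the mutated sequence less likely than the wild type, and a score of zero means it cannot tell them apart.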