Open avilella opened 3 months ago
Hi @avilella, do you have property annotations for these 500k sequences, or just the amino acid sequences with no annotation? ProteinNPT is first and foremost a model that learns a joint distribution over sequences and their corresponding labels, so it is not the best fit for your setting if you have no such labels or annotations. In that case, you may be interested in the various zero-shot baselines we have integrated into the ProteinGym benchmark. Best, Pascal
Hi, I have a corpus of about 500,000 protein sequences and would like to use them with existing models, such as ESM2 or this one, to predict the fitness effect of substituting one amino acid for another. How could I add my sequences to the models referenced in this repo and then use the modified model for this task? Thanks.
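For readers unfamiliar with zero-shot scoring of a single substitution, here is a minimal sketch of one common heuristic: compare the log-probability a protein language model assigns to the mutant residue against the wild-type residue at the same position. This is an illustration only, not the ProteinGym or ProteinNPT implementation. The names `parse_mutation` and `substitution_score` and the `log_probs` layout are hypothetical, and the numbers in the toy example are made up. In practice the per-position log-probabilities would come from a model such as ESM2.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mutation(mutation):
    """Split a string like 'K2R' into (wild_type, 0-based position, mutant)."""
    wild_type = mutation[0]
    mutant = mutation[-1]
    position = int(mutation[1:-1]) - 1  # mutation strings are 1-based
    return wild_type, position, mutant


def substitution_score(log_probs, sequence, mutation):
    """Log-odds of mutant vs. wild-type residue at the mutated position.

    `log_probs` is a hypothetical input: one dict per sequence position,
    mapping each amino acid to its log-probability under some model.
    A positive score means the model prefers the mutant residue.
    """
    wild_type, position, mutant = parse_mutation(mutation)
    if sequence[position] != wild_type:
        raise ValueError(
            f"{mutation}: expected {wild_type} at position {position + 1}, "
            f"found {sequence[position]}"
        )
    return log_probs[position][mutant] - log_probs[position][wild_type]


# Toy example with made-up probabilities (not model output).
sequence = "MKT"
uniform = {aa: math.log(1 / 20) for aa in AMINO_ACIDS}
log_probs = [dict(uniform) for _ in sequence]

# At position 2, pretend the model gives K 0.5, R 0.25, and splits the
# remaining 0.25 evenly over the other 18 amino acids.
position_two = {aa: math.log(0.25 / 18) for aa in AMINO_ACIDS}
position_two["K"] = math.log(0.5)
position_two["R"] = math.log(0.25)
log_probs[1] = position_two

score = substitution_score(log_probs, sequence, "K2R")
print(f"K2R score: {score:.3f}")
```

A score like this needs no labels, which is why it suits the unannotated setting described above. Whether it tracks real fitness for a given protein family is an empirical question.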