JSchlensok / VespaG

Expert-Guided Protein Language Models enable Accurate and Blazingly Fast Fitness Prediction
GNU General Public License v3.0

fine-tuning with more protein sequences #7

Open avilella opened 3 months ago

avilella commented 3 months ago

Hi, I have a corpus of about 500,000 protein sequences and would like to use them with existing models such as ESM2 or this one to predict the fitness effect of substituting one amino acid for another. How could I add my sequences to the models referred to in this repo, and then use the modified model for that task? Thanks.

JSchlensok commented 2 months ago

Hi, thanks for opening this issue and sorry for responding so late.

Can you elaborate on what you mean by applying your corpus of sequences to an existing model? Do you want to train or refinement-learn a pLM-based predictor like VespaG on the sequences? Or do you want to fine-tune a pLM like ESM2?

Cheers