Closed ligeng-k closed 2 months ago
Do you have any idea how to train this model on my own dataset? I didn't find any training script.
I'd also be keen on doing the same: adding my own protein sequences to the model, then running the new model for predicting new sequences.
reconstruct_multi_models() runs the wildtype sequence through an ESM model, selects the amino acid with the maximum likelihood at each position, and then reports the positions where that most likely amino acid differs from the wildtype sequence. Those differences are the suggested mutations.
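To make that description concrete, here is a minimal sketch of the idea, not the actual code in the repository. It assumes each model's output has already been turned into a per-position probability table; the names suggest_mutations, count_across_models, and toy_probs are hypothetical, and toy_probs is only a stand-in for a real ESM forward pass.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def suggest_mutations(wt_seq, probs, alphabet=ALPHABET):
    """Return (wt, position, mt) wherever the most likely residue differs from wildtype.

    probs[i][j] is assumed to be a model's probability of alphabet[j] at
    position i (hypothetical input format, one row per residue).
    """
    if len(probs) != len(wt_seq):
        raise ValueError("need one probability row per residue")
    mutations = []
    for pos, (wt, row) in enumerate(zip(wt_seq, probs)):
        best = max(range(len(alphabet)), key=row.__getitem__)  # argmax over residues
        mt = alphabet[best]
        if mt != wt:
            mutations.append((wt, pos + 1, mt))  # 1-based position
    return mutations


def count_across_models(wt_seq, probs_per_model):
    """Count how many models suggest each mutation (assumed aggregation step)."""
    counts = Counter()
    for probs in probs_per_model:
        counts.update(suggest_mutations(wt_seq, probs))
    return counts


def toy_probs(preferred_seq, alphabet=ALPHABET):
    """Toy stand-in for a model: puts most of the probability on preferred_seq."""
    rows = []
    for aa in preferred_seq:
        row = [0.01] * len(alphabet)
        row[alphabet.index(aa)] = 0.81
        rows.append(row)
    return rows


if __name__ == "__main__":
    wt = "MKV"
    models = [toy_probs("MKV"), toy_probs("MRV"), toy_probs("MRA")]
    for (wt_aa, pos, mt_aa), n in count_across_models(wt, models).items():
        print(f"{wt_aa}{pos}{mt_aa} suggested by {n} model(s)")
```

Nothing here modifies the model: the sketch only reads its predictions for the wildtype sequence and compares them against it. Counting agreement across several models is one plausible reading of the "multi_models" part of the name.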
We use pretrained ESM models to suggest mutations, and an important takeaway from our paper is that general protein language models may in many cases work better than specialized protein language models, a notable example being for antibody evolution.
There are a number of resources describing different ways to finetune ESM:
https://github.com/facebookresearch/esm/discussions/33
https://huggingface.co/blog/AmelieSchreiber/esm-interact
https://aws.amazon.com/blogs/machine-learning/efficiently-fine-tune-the-esm-2-protein-language-model-with-amazon-sagemaker/
Hi, Developer!
How should I understand the function reconstruct_multi_models(), and what does it do to the model?
Best, Jamie
Note:
amis.py