Hi @wuzhen247, thanks for your interest! In the examples folder, we provide a tutorial for training a downstream model to predict the fitness of protein variants.
At this time, we do not provide model pre-training in ESM. Internally, we trained these models using the fairseq toolkit. We highly recommend using fairseq for pre-training new models.
Hope that helps! Feel free to reopen the issue if you have any more questions.
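For a rough idea of what that downstream step looks like, here is a minimal sketch (not the notebook's exact code) that loads the per-sequence embeddings produced by the extract.py script described in the README and fits a simple regressor to predict variant fitness. The Ridge model, the paths, and the use of scikit-learn here are illustrative assumptions on my part.

```python
import glob

import torch
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

xs, ys = [], []
for path in glob.glob("examples/P62593_reprs/*.pt"):  # embeddings written by extract.py
    entry = torch.load(path)
    # Headers have the form {index}|{mutation_id}|{effect}, so the fitness
    # label can be parsed from the stored header string.
    ys.append(float(entry["label"].split("|")[-1]))
    # Mean-pooled per-sequence embedding from layer 34 (saved with --include mean).
    xs.append(entry["mean_representations"][34].numpy())

# Fit a simple regressor on the embeddings and report held-out performance.
x_train, x_test, y_train, y_test = train_test_split(xs, ys, random_state=0)
model = Ridge(alpha=1.0).fit(x_train, y_train)
print("held-out R^2:", model.score(x_test, y_test))
```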
Thanks for your suggestions and fast response. I will try them.
Hi, thanks so much for your great work; it is very useful. I have a question regarding the above issue. In the example you mentioned: "Our embeddings are stored with the file name from fasta header: {index}|{mutation_id}|{effect}.pt". How did you do that? That is, how do you convert the sequences in the fasta file into the embedding files (the .pt files)? Thanks again,
Nasser
Hi @nasserhashemi, this is described under "prerequisites" in the example notebook. Pasting below for your convenience.
You have obtained sequence embeddings for β-lactamase as described in the README, either by running
python extract.py esm1_t34_670M_UR50S examples/P62593.fasta examples/P62593_reprs/ --repr_layers 34 --include mean
or, for your convenience, by downloading the embeddings we precomputed (see below in the notebook for the download).
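If it helps to see what extract.py produces, here is a minimal sketch of loading one of the resulting .pt files. The file name below is hypothetical (it just follows the {index}|{mutation_id}|{effect}.pt pattern), and the dictionary keys ("label", "mean_representations") reflect my understanding of what extract.py saves when called with --include mean.

```python
import torch

# Hypothetical file name following the {index}|{mutation_id}|{effect}.pt pattern.
entry = torch.load("examples/P62593_reprs/0|M182T|1.0.pt")

print(entry["label"])                    # original FASTA header, e.g. "0|M182T|1.0"
emb = entry["mean_representations"][34]  # mean-pooled per-sequence embedding from layer 34
print(emb.shape)                         # a single (embedding_dim,) vector for this sequence
```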
Oh, I see. Great, thanks so much for your prompt reply!
Great work! I can't find any model pre-training or downstream-task fine-tuning scripts in the repository. Could you provide them?
Thanks.