Closed · TheBil99 closed this issue 1 year ago
We have only fine-tuned for specific tasks so far, which gave us mixed results. One has to be careful about hyperparameters, especially if you have little data. It is usually crucial to have a proper train/validation/test split to spot the point where the model starts to overfit. But no: we have not fine-tuned it on a specific family so far. This is definitely a very interesting direction to explore, though.
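For illustration, a minimal sketch of such a split, assuming a hypothetical list of (sequence, label) pairs (any splitting utility works; scikit-learn is just one option):

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: (sequence, label) pairs for the fine-tuning task.
pairs = [("MKTAYIAKQR", 0), ("GSHMDEILQK", 1)] * 50

# Hold out the test set first, then carve a validation set out of the remainder.
# A rising validation loss while the training loss keeps falling marks the
# overfitting point mentioned above.
train_val, test = train_test_split(pairs, test_size=0.10, random_state=42)
train, val = train_test_split(train_val, test_size=0.11, random_state=42)  # ~10% of the full set
```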
Many thanks for the answer and for the suggestions! I also wanted to ask: for a similar task, would you advise following the procedure described in https://github.com/google-research/text-to-text-transfer-transformer#using-a-tsv-file-directly, under the fine-tuning section? Or do you already have a dedicated script for something similar?
Isn't the fine-tuning I aim to perform similar to what you did with ProtT5-XL-U50, which was first trained on BFD and then fine-tuned on UniRef50?
No, I have no dedicated script for this yet. I am currently working on fine-tuning as well (though on completely new tasks rather than families), and I found this script useful (it makes your life much easier, as you can build on top of existing Hugging Face implementations): https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py
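To give an idea of what building on that objective looks like, here is a heavily simplified sketch of a single T5-style denoising step in PyTorch. It assumes the ProtT5 tokenizer exposes the stock T5 `<extra_id_*>` sentinel tokens and masks one hand-picked span; the Flax script linked above handles proper span sampling, batching, and optimization at scale:

```python
import re
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5ForConditionalGeneration.from_pretrained("Rostlab/prot_t5_xl_uniref50")
model.train()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# ProtT5 expects space-separated residues; rare amino acids (U, Z, O, B) map to X.
seq = " ".join(re.sub(r"[UZOB]", "X", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))

# Corrupt one span with a sentinel and train the model to reconstruct it.
masked = seq.replace("A Y I", "<extra_id_0>", 1)
inputs = tok(masked, return_tensors="pt")
labels = tok("<extra_id_0> A Y I <extra_id_1>", return_tensors="pt").input_ids

loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optim.step()
optim.zero_grad()
```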
Hi, I am interested in extracting representations for a given protein family for further analyses. I was wondering: would it be possible to fine-tune the ProtT5-XL-U50 model on this family, to increase the model's capability to represent proteins belonging to it?
Is this something you have already explored? Thank you in advance!
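For context, extracting such representations from ProtT5-XL-U50 (before or after any fine-tuning) follows the pattern documented for the model; a minimal sketch, with hypothetical family sequences:

```python
import re
import torch
from transformers import T5Tokenizer, T5EncoderModel

tok = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50").eval()

# Hypothetical family members; residues are space-separated and rare
# amino acids (U, Z, O, B) mapped to X, as ProtT5 expects.
seqs = [" ".join(re.sub(r"[UZOB]", "X", s)) for s in ["MKTAYIAKQR", "GSHMDEILQK"]]
batch = tok(seqs, add_special_tokens=True, padding="longest", return_tensors="pt")

with torch.no_grad():
    out = model(input_ids=batch.input_ids, attention_mask=batch.attention_mask)

# Per-residue embeddings; mean-pool over real (non-padding) positions
# to obtain one fixed-size vector per protein.
per_residue = out.last_hidden_state                      # (batch, seq_len, 1024)
mask = batch.attention_mask.unsqueeze(-1)
per_protein = (per_residue * mask).sum(1) / mask.sum(1)  # (batch, 1024)
```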