agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using various Transformer models.
Academic Free License v3.0

Details on Rostlab/prot_t5_base_mt_uniref50 #67

Closed ArcaneEmergence closed 2 years ago

ArcaneEmergence commented 2 years ago

Hello,

Thank you for your excellent work, the pretrained feature extractors will help to accelerate ML on protein use cases a lot.

I have stumbled upon the "Rostlab/prot_t5_base_mt_uniref50" model. Could you provide details on it, e.g. on what dataset and task it was trained? Based on its name, it seems to be pretrained on UniRef50 and then fine-tuned in a multi-task (mt) setting?

Do you think this model is suitable as a general feature extractor / for fine-tuning?

Thank you in advance for your answer.

mheinzinger commented 2 years ago

Thanks for your kind words and your interest in our work! :) Regarding "Rostlab/prot_t5_base_mt_uniref50": good catch. Yes, this is a fine-tuned version of our ProtT5-XL-U50 model, but it was highly experimental and did not work as expected. Therefore, I would generally recommend sticking to ProtT5-XL-U50 for any sort of general feature extraction: https://huggingface.co/Rostlab/prot_t5_xl_uniref50