agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
Academic Free License v3.0

Checkpoints of pre-trained models #64

Closed pzhang84 closed 2 years ago

pzhang84 commented 3 years ago

The performance of ProtBert is impressive! I'd love to fine-tune the pre-trained BERT model on my own protein sequences to see what happens. Could you point me to where I can find the checkpoints for the pre-trained models? The pre-trained model on Hugging Face doesn't seem to include checkpoint files -- https://huggingface.co/Rostlab/prot_bert/tree/main

Thanks in advance for your help!
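For fine-tuning directly from the Hugging Face Hub (which works even without the TF checkpoint files), a minimal sketch of the setup might look like the following. It assumes the `transformers` package is installed; the `preprocess` helper reflects ProtBert's expected input format (uppercase residues separated by spaces, with rare amino acids U, Z, O, B mapped to X), and the model/tokenizer names are taken from the URL above.

```python
import re

def preprocess(seq):
    """Format a raw amino-acid sequence for ProtBert:
    uppercase, rare residues (U, Z, O, B) mapped to X,
    and single spaces between residues."""
    seq = re.sub(r"[UZOB]", "X", seq.upper())
    return " ".join(seq)

def load_prot_bert():
    """Fetch the pretrained weights from the Hugging Face Hub.
    Assumes `transformers` is installed; the download is large (~1.6 GB)."""
    from transformers import BertTokenizer, BertForMaskedLM
    tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert",
                                              do_lower_case=False)
    model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert")
    return tokenizer, model

print(preprocess("MKTVRQu"))  # -> M K T V R Q X
```

From there, the returned model can be fine-tuned like any other Hugging Face BERT model (e.g. with the `Trainer` API or a plain PyTorch loop).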

mheinzinger commented 2 years ago

You are right, we focused on the PyTorch checkpoints that are available via Hugging Face. I found the TF checkpoints for ProtBERT-UniRef100 (did not find the ProtBERT-BFD checkpoints; sorry) and put them up for download here: https://rostlab.org/~deepppi/protbert_u100.tar.gz

mheinzinger commented 2 years ago

Short addendum: I've recovered the ProtBERT-BFD TF checkpoint and put it here for download: https://rostlab.org/~deepppi/protbert_bfd_tf.tar.gz Given that ProtBERT-BFD performed better than ProtBERT-UniRef100, I would recommend using ProtBERT-BFD for fine-tuning.
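If the goal is to fine-tune in PyTorch, the recovered TF checkpoint can be converted with `transformers`, which supports loading BERT TensorFlow checkpoints via `from_tf=True`. A sketch, assuming `transformers` and `tensorflow` are installed and that the tarball unpacks to a directory containing `model.ckpt.*` files and a BERT config JSON (the exact file names inside the archive are assumptions):

```python
def convert_tf_checkpoint(ckpt_index="protbert_bfd_tf/model.ckpt.index",
                          config_path="protbert_bfd_tf/bert_config.json",
                          out_dir="protbert_bfd_pt"):
    """Load a BERT TensorFlow checkpoint and save it in the
    Hugging Face PyTorch format. Paths above are assumed, not
    verified against the tarball's actual layout."""
    # Imports kept local: both libraries are heavyweight and only
    # needed for the one-off conversion.
    from transformers import BertConfig, BertForPreTraining

    config = BertConfig.from_json_file(config_path)
    # from_tf=True tells transformers to read the TF index file;
    # a config object must be supplied explicitly in this mode.
    model = BertForPreTraining.from_pretrained(ckpt_index,
                                               from_tf=True,
                                               config=config)
    model.save_pretrained(out_dir)  # now loadable without TF
    return model
```

After `save_pretrained`, the resulting directory can be fine-tuned like any Hugging Face PyTorch checkpoint, without needing TensorFlow again.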