Closed. Xinxinatg closed this issue 2 years ago.
Hey, thanks a lot for your interest in our work. Within PredictProtein we've included secondary structure prediction from ProtT5. For backward compatibility, we've also kept RePROF, the previous secondary structure predictor built in our lab. In our benchmarks, secondary structure prediction from ProtT5 outperforms RePROF. If you need to predict secondary structure for many sequences, I would recommend using our bio_embeddings repo, which allows you to make predictions based on ProtTrans models. Here are some notebook examples as well as pipeline examples.
@mheinzinger Thanks for your reply! I just checked the pipeline examples and found that only seq2vec and bert models are provided on that page. Does "bert" mean ProtBert here, and has the best-performing model, ProtT5 trained on the secondary structure/disorder datasets, been released? I also checked the Hugging Face page for Rostlab, and it seems that only the ProtBert model for secondary structure prediction has been released.
Good spot. Indeed, this is the right direction. The only thing you need to adjust now is the model that is loaded by the yaml file, in your case probably "prottrans_t5_uniref50". Here is an overview of the available models: bio_embeddings parameters. The models in the Hugging Face Rostlab repo you are referring to are fine-tuned on supervised tasks which did yield improvement in our hands. So I still recommend using secondary structure from ProtT5.
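To illustrate the "adjust the model loaded by the yaml file" step, here is a minimal sketch in Python. It is not taken from the bio_embeddings documentation: the config layout (a `global` section plus a stage with `type: embed` and a `protocol` key), the stage name `embeddings`, the file name and prefix, and the helpers `with_embedder` and `to_yaml` are all assumptions made for illustration. Only the names "bert" and "prottrans_t5_uniref50" come from this thread, so please check the real keys against the pipeline examples and the parameter overview.

```python
import copy

# Hypothetical stand-in for a parsed pipeline yaml file. The stage name
# "embeddings" and all keys below are assumptions, not the confirmed schema.
example_config = {
    "global": {"sequences_file": "sequences.fasta", "prefix": "my_run"},
    "embeddings": {"type": "embed", "protocol": "bert"},
}


def with_embedder(config, protocol):
    """Return a copy of config in which every 'embed' stage uses `protocol`."""
    updated = copy.deepcopy(config)
    for stage in updated.values():
        if isinstance(stage, dict) and stage.get("type") == "embed":
            stage["protocol"] = protocol
    return updated


def to_yaml(config):
    """Render a two-level dict of plain strings as simple yaml text."""
    lines = []
    for section, entries in config.items():
        lines.append(f"{section}:")
        for key, value in entries.items():
            lines.append(f"  {key}: {value}")
    return "\n".join(lines) + "\n"


# Swap the embedder from "bert" to the model name suggested in the thread.
t5_config = with_embedder(example_config, "prottrans_t5_uniref50")
print(to_yaml(t5_config))
```

The original dict is left untouched, so the same base config can be reused to compare several embedders. Any downstream stage that turns embeddings into secondary structure annotations is deliberately left out here, because its exact protocol name for ProtT5 is not given in this thread.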
Thanks for this pioneering work, which has benefited a lot of researchers, including myself! While trying to use a pre-trained model to predict secondary structure, I saw that the GitHub page directs users to predictprotein.org to input a sequence. I then found that the model built into predictprotein.org is RePROF. Has RePROF been shown to outperform the model based on the features extracted from ProtTrans? If not, how can I access the model built with features generated by ProtTrans? Looking forward to your reply.