ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
All valid points, thanks for raising them! I've merged your suggestions :)
If you find anything else unclear or have suggestions for improvement, please let me know.
When I ran the quick start in the README, I found that I needed to modify the code to make it runnable:

- `T5Tokenizer` does not have the `to` method (`sequence_examples`, `embedding_repr`)
- `PretrainedTokenizer.batch_encode_plus` is obsolete, at least from Jul. 2020: Document, PR

Considering the above, I think the following changes to the quick start are preferable: https://github.com/delta2323/ProtTrans/commit/afa87dcea4fe59873f69945d0ec1a72f401ac8cd
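For reference, here is a minimal sketch of what a corrected quick start could look like. This is my own hypothetical rewrite, not the repository's official code: the model name `Rostlab/prot_t5_xl_half_uniref50-enc` and the helper names `preprocess`/`embed` are assumptions, and the key points are that only the model (not the tokenizer) is moved to the device, and that calling `tokenizer(...)` replaces the deprecated `batch_encode_plus`.

```python
import re


def preprocess(sequences):
    # ProtT5-style input: space-separated residues, with rare/ambiguous
    # amino acids (U, Z, O, B) mapped to X.
    return [" ".join(re.sub(r"[UZOB]", "X", seq)) for seq in sequences]


def embed(sequences, device="cpu"):
    # Hypothetical corrected quick start; requires `torch` and
    # `transformers`, and downloads the ProtT5 encoder weights.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(
        "Rostlab/prot_t5_xl_half_uniref50-enc", do_lower_case=False
    )
    # Only the model has a `to` method; T5Tokenizer does not.
    model = (
        T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc")
        .to(device)
        .eval()
    )

    # `tokenizer(...)` is the current API; `batch_encode_plus` is deprecated.
    ids = tokenizer(preprocess(sequences), padding="longest", return_tensors="pt")
    with torch.no_grad():
        out = model(
            input_ids=ids["input_ids"].to(device),
            attention_mask=ids["attention_mask"].to(device),
        )
    # Per-residue embeddings, shape (batch, seq_len, hidden_dim).
    return out.last_hidden_state
```

The preprocessing step can be checked independently of the heavy model download, e.g. `preprocess(["SEQUENCE"])` maps the rare residue U to X and space-separates the rest.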
I appreciate the team's work, and I hope this comment helps improve the package.