ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
All valid points, thanks for raising them! I've merged your suggestions :)
If you find anything else unclear or have suggestions for improvement, please let me know.
When I ran the quick start in the README, I found that I needed to modify the code to make it runnable:

- `T5Tokenizer` does not have the `to` method (`sequence_examples`, `embedding_repr`)
- `PretrainedTokenizer.batch_encode_plus` is obsolete, at least from Jul. 2020: Document, PR

Considering the above, I think the following changes to the quick start are preferable: https://github.com/delta2323/ProtTrans/commit/afa87dcea4fe59873f69945d0ec1a72f401ac8cd
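For reference, here is a minimal sketch of what a corrected quick start could look like. This is my own hypothetical rewrite, not the repository's official code: the model name `Rostlab/prot_t5_xl_half_uniref50-enc` and the helper names `preprocess`/`embed` are assumptions, and the key points are that only the model (not the tokenizer) is moved to the device, and that calling `tokenizer(...)` replaces the deprecated `batch_encode_plus`.

```python
import re


def preprocess(sequences):
    # ProtT5-style input: space-separated residues, with rare/ambiguous
    # amino acids (U, Z, O, B) mapped to X.
    return [" ".join(re.sub(r"[UZOB]", "X", seq)) for seq in sequences]


def embed(sequences, device="cpu"):
    # Hypothetical corrected quick start; requires `torch` and
    # `transformers`, and downloads the ProtT5 encoder weights.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(
        "Rostlab/prot_t5_xl_half_uniref50-enc", do_lower_case=False
    )
    # Only the model has a `to` method; T5Tokenizer does not.
    model = (
        T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc")
        .to(device)
        .eval()
    )

    # `tokenizer(...)` is the current API; `batch_encode_plus` is deprecated.
    ids = tokenizer(preprocess(sequences), padding="longest", return_tensors="pt")
    with torch.no_grad():
        out = model(
            input_ids=ids["input_ids"].to(device),
            attention_mask=ids["attention_mask"].to(device),
        )
    # Per-residue embeddings, shape (batch, seq_len, hidden_dim).
    return out.last_hidden_state
```

The preprocessing step can be checked independently of the heavy model download, e.g. `preprocess(["SEQUENCE"])` maps the rare residue U to X and space-separates the rest.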
I appreciate the team's work, and I hope this comment helps improve the package.