DeepGraphLearning / ProtST

[ICML-23 ORAL] ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts
Apache License 2.0

How to get pre-training data? #3

Closed: Greay83 closed this issue 1 year ago

Greay83 commented 1 year ago

Hi, this is great work. I noticed that this repository does not contain the corpus used in pre-training, called "ProtDescribe". Could you provide it? Thanks so much.

KatarinaYuan commented 1 year ago

Hi, thank you for your interest in our work! ProtDescribe is implemented as the class UniProtSeqText in the file ./protst/dataset.py. I have also updated the URL for downloading the TrEMBL version of ProtDescribe. Please let us know if you have further questions.
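
For readers who want to confirm where the dataset class lives before reading it, here is a minimal sketch that locates a class definition in a source file using only Python's standard `ast` module. The helper name `find_class` is hypothetical and is not part of ProtST. Only the path `protst/dataset.py` and the class name `UniProtSeqText` come from the reply above. The sketch does not import ProtST and assumes nothing about the class's constructor or arguments.

```python
import ast
from pathlib import Path


def find_class(source_path, class_name):
    """Return (line_number, method_names) for class_name in source_path, or None.

    Hypothetical helper for illustration; it parses the file without importing it,
    so none of the project's dependencies need to be installed.
    """
    tree = ast.parse(Path(source_path).read_text(encoding="utf-8"))
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            # Collect only the methods defined directly in the class body.
            methods = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            return node.lineno, methods
    return None


if __name__ == "__main__":
    # Path taken from the reply, relative to the repository root.
    path = Path("protst/dataset.py")
    if path.exists():
        print(find_class(path, "UniProtSeqText"))
    else:
        print(f"{path} not found; run this from a ProtST checkout")
```

Run from the root of a ProtST checkout, this prints the line where the class starts and the names of its methods, which is a quick way to see which arguments and download logic to look at next.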

Greay83 commented 1 year ago

No further questions for now. Thanks so much again.