agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
Academic Free License v3.0

What is the `./protT5` directory in the Colab example? #150

Closed by ekiefl 5 months ago

ekiefl commented 7 months ago

I booted up the Colab example, but I'm running into errors because my Colab session doesn't have the directory `./protT5`. What is this directory?

mheinzinger commented 7 months ago

Could you provide a link to the specific notebook you are referring to? From the name, I assume this directory is where the protein language model ProtT5 should live. You can download it here: https://huggingface.co/Rostlab/prot_t5_xl_uniref50
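For reference, a minimal sketch of loading the encoder from that Hugging Face repository with the `transformers` library (which also needs `torch` and `sentencepiece` installed). The helper names `prepare_sequence` and `load_prot_t5` are hypothetical. The preprocessing, which maps the rare amino acids U, Z, O, B to X and separates residues with spaces, follows the usage shown in the ProtTrans README. The download is large, so the loader is only defined here, not called.

```python
import re


def prepare_sequence(sequence: str) -> str:
    """Map rare amino acids (U, Z, O, B) to X and put a space between residues."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.upper())
    return " ".join(cleaned)


def load_prot_t5(model_name: str = "Rostlab/prot_t5_xl_uniref50"):
    """Download (or load from the local cache) the tokenizer and encoder-only model."""
    # Imported lazily so prepare_sequence works without transformers installed.
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name)
    return tokenizer, model.eval()


print(prepare_sequence("MKUBA"))  # M K X X A
```

Calling `load_prot_t5()` stores the weights in the Hugging Face cache rather than in `./protT5`, so this may not be a drop-in replacement for whatever the notebook expects in that directory.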

ekiefl commented 7 months ago

Yeah, sorry for the lack of specificity.

I'm referring to the colab in this table:

[Image: screenshot of the table]

Link here: https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing

Thanks for your time.

mheinzinger commented 6 months ago

Ah, yeah, now I see what you are saying. You need to manually create this directory and place the protein sequences you want to process in there.
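A minimal sketch of that manual step in Python, assuming the directory is `./protT5` relative to the notebook's working directory. The function name `write_fasta`, the file name `my_seqs.fasta`, and the example sequence are placeholders. Which file name the notebook actually reads is not stated here, so adjust it to match.

```python
from pathlib import Path


def write_fasta(sequences: dict[str, str], fasta_path: Path) -> Path:
    """Write {identifier: sequence} pairs to a FASTA file, creating parent dirs."""
    fasta_path.parent.mkdir(parents=True, exist_ok=True)
    with fasta_path.open("w") as handle:
        for identifier, sequence in sequences.items():
            handle.write(f">{identifier}\n{sequence}\n")
    return fasta_path


# Placeholder identifier and sequence, for illustration only.
path = write_fasta({"example_protein": "MKTAYIAKQR"}, Path("protT5") / "my_seqs.fasta")
print(path.read_text())
```

Because `mkdir` is called with `exist_ok=True`, rerunning the cell does not fail if the directory is already there.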

ekiefl commented 5 months ago

Thanks for the response. I've revisited the Colab and things are running from start to finish. For whatever reason, I had originally overlooked Cell 2 and never ran it:

#@title Set up working directories and download files/checkpoints. { display-mode: "form" }
# Create directory for storing model weights (2.3GB) and example sequences.
# Here we use the encoder-part of ProtT5-XL-U50 in half-precision (fp16) as 
# it performed best in our benchmarks (also outperforming ProtBERT-BFD).
# Also download secondary structure prediction checkpoint to show annotation extraction from embeddings
!mkdir protT5 # root directory for storing checkpoints, results etc
!mkdir protT5/protT5_checkpoint # directory holding the ProtT5 checkpoint
!mkdir protT5/sec_struct_checkpoint # directory storing the supervised classifier's checkpoint
!mkdir protT5/output # directory for storing your embeddings & predictions
!wget -nc -P protT5/ https://rostlab.org/~deepppi/example_seqs.fasta
# Huge kudos to the bio_embeddings team here! We will integrate the new encoder, half-prec ProtT5 checkpoint soon
!wget -nc -P protT5/sec_struct_checkpoint http://data.bioembeddings.com/public/embeddings/feature_models/t5/secstruct_checkpoint.pt
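A skipped setup cell like this can be caught early by checking for the paths it creates before running later cells. The following is a rough sketch, not part of the notebook: the helper name `missing_paths` is hypothetical, and the path list is read off the `mkdir` and `wget` commands above.

```python
from pathlib import Path

# Paths that the setup cell quoted above creates or downloads.
EXPECTED_PATHS = [
    "protT5/protT5_checkpoint",
    "protT5/sec_struct_checkpoint",
    "protT5/output",
    "protT5/example_seqs.fasta",
    "protT5/sec_struct_checkpoint/secstruct_checkpoint.pt",
]


def missing_paths(root: Path = Path(".")) -> list[str]:
    """Return the expected paths that do not exist under root."""
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


missing = missing_paths()
if missing:
    print("Setup cell has probably not been run; missing:", *missing, sep="\n  ")
else:
    print("All expected paths are present.")
```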

I'm going to close this issue. Thanks for your help.