Closed by ekiefl 5 months ago
Could you provide a link to the specific notebook you are referring to? Judging by the name, I assume this directory is where the protein language model ProtT5 should live. You can download it here: https://huggingface.co/Rostlab/prot_t5_xl_uniref50
Yeah, sorry for the lack of specificity.
I'm referring to the Colab in this table:
Link here: https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing
Thanks for your time.
Ah, yeah, now I see what you are saying. You need to manually create this directory and place the protein sequences you want to process in there.
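For anyone who wants to create the directory by hand from a notebook cell, here is a minimal Python sketch. It assumes the layout used by the notebook's setup cell (a protT5 root containing protT5_checkpoint, sec_struct_checkpoint, and output); make_workdirs is a hypothetical helper name for illustration, not something the notebook defines.

```python
from pathlib import Path


def make_workdirs(root="protT5"):
    """Create the working directories under `root`.

    Assumed layout (taken from the Colab's setup cell):
      root/protT5_checkpoint      - ProtT5 checkpoint
      root/sec_struct_checkpoint  - supervised classifier's checkpoint
      root/output                 - embeddings and predictions
    """
    root = Path(root)
    for name in ("protT5_checkpoint", "sec_struct_checkpoint", "output"):
        # parents=True also creates `root`; exist_ok=True makes reruns harmless
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

Calling make_workdirs() once before the later cells should be enough. Unlike a plain mkdir, it does not complain if the directories already exist, so rerunning the cell is safe.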
Thanks for the response. I've revisited the Colab and things are running from start to finish. For whatever reason, I had originally overlooked Cell 2 and never ran it:
#@title Set up working directories and download files/checkpoints. { display-mode: "form" }
# Create directory for storing model weights (2.3GB) and example sequences.
# Here we use the encoder-part of ProtT5-XL-U50 in half-precision (fp16) as
# it performed best in our benchmarks (also outperforming ProtBERT-BFD).
# Also download secondary structure prediction checkpoint to show annotation extraction from embeddings
!mkdir protT5 # root directory for storing checkpoints, results etc
!mkdir protT5/protT5_checkpoint # directory holding the ProtT5 checkpoint
!mkdir protT5/sec_struct_checkpoint # directory storing the supervised classifier's checkpoint
!mkdir protT5/output # directory for storing your embeddings & predictions
!wget -nc -P protT5/ https://rostlab.org/~deepppi/example_seqs.fasta
# Huge kudos to the bio_embeddings team here! We will integrate the new encoder, half-prec ProtT5 checkpoint soon
!wget -nc -P protT5/sec_struct_checkpoint http://data.bioembeddings.com/public/embeddings/feature_models/t5/secstruct_checkpoint.pt
I'm going to close this issue. Thanks for your help.
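Since the error came from a skipped setup cell, a quick check before running the later cells can make the cause obvious. The sketch below is only an illustration: the expected paths are read off the mkdir and wget lines in Cell 2 above, and missing_paths is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path

# Paths relative to the protT5 root, as implied by Cell 2's mkdir/wget lines.
EXPECTED = [
    "protT5_checkpoint",
    "sec_struct_checkpoint",
    "output",
    "example_seqs.fasta",
    "sec_struct_checkpoint/secstruct_checkpoint.pt",
]


def missing_paths(root="protT5", expected=EXPECTED):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

If missing_paths() returns a non-empty list in a fresh session, the setup cell most likely has not been run yet.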
I booted up the Colab example, but I'm running into errors because my Colab session doesn't have the directory ./protT5. What is this directory?