agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained with Transformer models on thousands of GPUs on Summit and hundreds of Google TPUs.
Academic Free License v3.0

ZeroDivisionError when running prott5_embedder.py #110

Closed: wojciech-galan closed this issue 1 year ago

wojciech-galan commented 1 year ago

I tried to extract embeddings for a simple protein with your prott5_embedder.py, but I keep receiving this error:

python Embedding/prott5_embedder.py --input a.fasta --output a.h5
Using device: cpu
Loading: Rostlab/prot_t5_xl_half_uniref50-enc
########################################
Example sequence: id1
A
########################################
Total number of sequences: 1
Average sequence length: 1.0
Number of sequences >1000: 0
RuntimeError during embedding for id1 (L=1). Try lowering batch size. If single sequence processing does not work, you need more vRAM to process your protein.

############# STATS #############
Total number of embeddings: 0
Traceback (most recent call last):
  File "Embedding/prott5_embedder.py", line 184, in <module>
    main()
  File "Embedding/prott5_embedder.py", line 181, in main
    get_embeddings( seq_path, emb_path, model_dir, per_protein=per_protein )
  File "Embedding/prott5_embedder.py", line 140, in get_embeddings
    end-start, (end-start)/len(emb_dict), avg_length))
ZeroDivisionError: float division by zero
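The traceback shows that the ZeroDivisionError is a follow-on failure: the real problem is the RuntimeError for id1, after which no embedding is stored, so `len(emb_dict)` is 0 when the summary line divides the elapsed time by it. The sketch below shows one way to guard that summary step. It assumes only the names visible in the traceback (`emb_dict`, `start`, `end`, `avg_length`); the helper `summarize_embeddings` is hypothetical and is not part of prott5_embedder.py, and its output format is illustrative rather than a copy of the script's.

```python
def summarize_embeddings(emb_dict, start, end, avg_length):
    """Print run statistics without dividing by zero (hypothetical helper).

    emb_dict maps sequence IDs to embeddings; it stays empty when every
    sequence failed during embedding. Returns the time per protein in
    seconds, or None if nothing was embedded.
    """
    n = len(emb_dict)
    elapsed = end - start
    print("############# STATS #############")
    print("Total number of embeddings: {}".format(n))
    if n == 0:
        # No embeddings means a per-protein time is undefined; point the
        # user at the earlier error instead of crashing here.
        print("No embeddings were computed; see the errors reported above.")
        return None
    per_protein = elapsed / n
    print("Total time: {:.2f}[s]; time/prot: {:.4f}[s]; avg. len= {:.2f}".format(
        elapsed, per_protein, avg_length))
    return per_protein


if __name__ == "__main__":
    # Reproduces the situation from this issue: the only sequence failed.
    summarize_embeddings({}, 0.0, 5.0, 1.0)
    # A successful run with two embeddings over 4 seconds: 2.0 s per protein.
    summarize_embeddings({"id1": [0.1], "id2": [0.2]}, 0.0, 4.0, 10.0)
```

A guard like this only turns the crash into a clear message; the underlying RuntimeError still has to be fixed separately.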

I went from a full protein sequence down to a single amino acid, but to no avail. Final a.fasta content:

>id1
A

Am I doing something wrong?

wojciech-galan commented 1 year ago

The issue disappeared when I installed pytorch with GPU support...

mheinzinger commented 1 year ago

OK, sorry for the inconvenience. I never considered running the LMs on anything other than GPUs, so I never encountered this. Thanks for reporting.

wojciech-galan commented 1 year ago

I never considered running LMs on anything other than GPUs either, but I thought I wouldn't need GPU support just to extract embeddings. Anyway, problem solved.