[Open] pengjk689 opened this issue 1 year ago
Here is my code:

```
tmvec-build-database \
    --input-fasta human_swissProt.fa \
    --tm-vec-model ${path}/tm_vec_cath_model.ckpt \
    --tm-vec-config-path ${path}/tm_vec_cath_model_params.json \
    --output human_swissProt_database \
    --protrans-model ${path}/prot_t5_xl_uniref50 \
    --device 'gpu'
```
Hi, we cannot currently do that from the CLI -- you'll need to batch it into smaller chunks, encode, and create the database from the encodings (otherwise you'll cram too much into GPU memory). We'll try to streamline this in a follow-up release.
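For reference, a minimal sketch of that chunked workflow, assuming `flat_seqs`, `model_deep`, `model`, `tokenizer`, and `device` are already loaded the way `tmvec-build-database` loads them. The `encode` signature is taken from the traceback below; the chunk size, the conversion, and the `np.save` step are illustrative assumptions, not tm-vec API:

```python
# Sketch: encode the sequences in chunks and save the encodings, then
# build the database from the saved arrays. CHUNK_SIZE is an illustrative
# guess; tune it down if the GPU still runs out of memory.
import numpy as np
from tm_vec.tm_vec_utils import encode

CHUNK_SIZE = 1000

parts = []
for start in range(0, len(flat_seqs), CHUNK_SIZE):
    chunk = flat_seqs[start:start + CHUNK_SIZE]
    encoded = encode(chunk, model_deep, model, tokenizer, device)
    parts.append(encoded)  # assumed to be a NumPy array; convert first if it is a tensor

embeddings = np.concatenate(parts, axis=0)
np.save("human_swissProt_encodings.npy", embeddings)  # assemble the database from this
```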
Dear,
When I attempt to build the TM-Vec database from the entire set of human proteins in UniProt (208,022 entries), I encounter the following issue:
```
Traceback (most recent call last):
  File "/home/pengjiak/miniconda3/envs/tmvec/bin/tmvec-build-database", line 110, in <module>
    encoded_database = encode(flat_seqs, model_deep, model, tokenizer, device)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/tm_vec/tm_vec_utils.py", line 61, in encode
    protrans_sequence = featurize_prottrans(sequences[i:i+1], model, tokenizer, device)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/tm_vec/tm_vec_utils.py", line 24, in featurize_prottrans
    embedding = model(input_ids=input_ids, attention_mask=attention_mask)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1964, in forward
    encoder_outputs = self.encoder(
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1123, in forward
    layer_outputs = layer_module(
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 695, in forward
    self_attention_outputs = self.layer[0](
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 602, in forward
    attention_output = self.SelfAttention(
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 552, in forward
    position_bias = position_bias + mask  # (batch_size, n_heads, seq_length, key_length)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.09 GiB (GPU 0; 79.18 GiB total capacity; 55.53 GiB already allocated; 22.96 GiB free; 55.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
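As the error message itself suggests, the CUDA caching allocator can be tuned against fragmentation; note this only helps when reserved memory far exceeds allocated memory, and it cannot rescue a single 25 GiB allocation that is simply too big. A sketch (the value 512 is an arbitrary example, not a recommended setting):

```python
# The env var must be set before torch initializes CUDA.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after the env var so the allocator picks it up
```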
It's worth noting that I encounter this issue even when attempting to build the database from a FASTA file with only 20,000 proteins, which is quite perplexing. Looking forward to your response.
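One possible explanation: the traceback shows `featurize_prottrans` embedding one sequence at a time (`sequences[i:i+1]`), so the failed allocation typically comes from a single very long protein rather than from the number of entries, since T5 self-attention memory grows quadratically with sequence length. A quick way to spot such outliers, as a sketch assuming Biopython is installed (this is not part of tm-vec):

```python
# List the ten longest sequences in the input FASTA; a single outlier
# (e.g. titin at ~34k residues) can exhaust GPU memory on its own.
from Bio import SeqIO

records = sorted(SeqIO.parse("human_swissProt.fa", "fasta"),
                 key=lambda rec: len(rec.seq), reverse=True)
for rec in records[:10]:
    print(len(rec.seq), rec.id)
```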