DeepChainBio / bio-transformers

bio-transformers is a wrapper on top of the ESM/ProtBert models, trained on millions of proteins and used to compute embeddings.
https://bio-transformers.readthedocs.io/en/latest/getting_started/install.html
Apache License 2.0
143 stars · 31 forks

Trouble using Multi-GPU on cluster #16

Closed idmjky closed 3 years ago

idmjky commented 3 years ago

Hi, I am trying to use multiple GPUs for inference. However, when I run it I encounter this error:

```
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    embeddings = bio_trans.compute_embeddings(sequences, pool_mode=('cls','mean'))
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/biotransformers/wrappers/transformers_wrappers.py", line 669, in compute_embeddings
    _, batch_embeddings = self._model_pass(batch_inputs)
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/biotransformers/wrappers/esm_wrappers.py", line 141, in _model_pass
    repr_layers=[self.repr_layers],
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 1 on device 1.

Original Traceback (most recent call last):
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/om2/user/kaiyi/anaconda/envs/bio-transformers/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'tokens'
```

Can you help me look at this issue please? Thanks

delfosseaurelien commented 3 years ago

Hello, can you send the main commands you launched and a sample of the sequence data you used?

Thanks

idmjky commented 3 years ago

Here is the Python script; it is basically the same as the quick-start example on the README page.

```python
from biotransformers import BioTransformers
import numpy as np

sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE",
]

bio_trans = BioTransformers(backend="esm1b_t33_650M_UR50S", multi_gpu=True)
embeddings = bio_trans.compute_embeddings(sequences, pool_mode=('cls', 'mean'))

mean_emb = embeddings['mean']
print(mean_emb)
print(mean_emb.shape)

a = np.transpose(mean_emb)

np.savetxt('output.csv', a, delimiter=",")
```

delfosseaurelien commented 3 years ago

Ok, can you tell me your PyTorch version and CUDA driver?

idmjky commented 3 years ago

- torch (conda): 1.8.1
- NVIDIA-SMI driver version: 460.67
- CUDA version: 11.2
- cudatoolkit (conda): 11.0.221

delfosseaurelien commented 3 years ago

Ok, sorry for this; I will update the example and fix it. The problem comes from the batch_size: the batch is too small to give every GPU replica an input. Set batch_size=2 and it should work. I will add a check for this.
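As a side note on the failure mode (an illustration, not part of the library's code): `torch.nn.DataParallel` splits the input batch along dimension 0 into one chunk per device, so a batch with fewer items than GPUs leaves some replicas with no arguments, and their `forward()` call then fails with the missing `tokens` error seen above. The `scatter_batch` helper below is a hypothetical, plain-Python mimic of that chunking:

```python
# Hedged sketch: why a batch smaller than the GPU count breaks DataParallel.
# scatter_batch is a hypothetical stand-in that mimics splitting a batch
# along dim 0 into one chunk per device, as nn.DataParallel does.

def scatter_batch(batch, n_devices):
    """Split `batch` into up to `n_devices` contiguous chunks."""
    chunk = (len(batch) + n_devices - 1) // n_devices  # ceil division
    # Replicas beyond len(chunks) receive no input at all.
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]

print(scatter_batch(["seq_A"], 2))           # [['seq_A']] -> replica 1 starved
print(scatter_batch(["seq_A", "seq_B"], 2))  # [['seq_A'], ['seq_B']]
```

With `batch_size=2` and two GPUs, each replica receives one sequence, which is why the suggested fix works.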