Expecting all tensors to be on same device, but found two device cuda:0 and cpu, when running the generate_confs.py

finalelement commented 2 years ago

Hello,

I am trying to run generate_confs.py with the provided pretrained models, but I run into the error shown below. Please share any insights, in particular whether inference is expected to run on the GPU or the CPU.

I also tried switching the model between CPU and GPU, but no luck so far.

  0%|          | 0/1000 [02:14<?, ?it/s]
Traceback (most recent call last):
  File "/home/vishwesh/Software/pycharm-community-2021.1.1/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/vishwesh/Software/pycharm-community-2021.1.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/vishwesh/Code/geo_mol/GeoMol/generate_confs.py", line 63, in <module>
    model(data, inference=True, n_model_confs=n_confs*2)
  File "/home/vishwesh/anaconda3/envs/geomol_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vishwesh/Code/geo_mol/GeoMol/model/model.py", line 81, in forward
    self.generate_model_prediction(data.x, data.edge_index, data.edge_attr, data.batch, data.chiral_tag)
  File "/home/vishwesh/Code/geo_mol/GeoMol/model/model.py", line 686, in generate_model_prediction
    x1, x2, h_mol = self.embed(x, edge_index, edge_attr, batch)
  File "/home/vishwesh/Code/geo_mol/GeoMol/model/model.py", line 228, in embed
    x1, _ = self.gnn(x, edge_index, edge_attr)
  File "/home/vishwesh/anaconda3/envs/geomol_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vishwesh/Code/geo_mol/GeoMol/model/GNN.py", line 126, in forward
    x = self.node_init(x)
  File "/home/vishwesh/anaconda3/envs/geomol_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vishwesh/Code/geo_mol/GeoMol/model/GNN.py", line 40, in forward
    x = self.layers[i](x)
  File "/home/vishwesh/anaconda3/envs/geomol_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vishwesh/anaconda3/envs/geomol_v2/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/vishwesh/anaconda3/envs/geomol_v2/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument mat2 in method wrapper_mm)

Process finished with exit code 1

finalelement commented 2 years ago

An update: I was able to run generate_confs.py successfully, but only by forcing everything onto the CPU. I made the following change in inference.py, utils.py, and model.py:

#device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = 'cpu'

However, I was not able to run it with the scripts as provided. Looking forward to insights from y'all. :)

victorl25 commented 2 years ago

I was able to run generate_confs.py on the GPU by making a few code modifications:

- added a device definition line to each source file: device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
- replaced NumPy with CuPy: import cupy as np
- added an explicit device argument to each array initialization, e.g. p_coords = torch.zeros([4, model.n_model_confs, 3], device=device)
- added an explicit copy to the CPU where it was needed, e.g. q_reorder = np.argsort([np.where(a.cpu() == q_idx.cpu())[0][0] for a in torch.tensor(cycle_avg_indices)[q_coords_mask]])
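
As a rough illustration of the last two modifications, here is a minimal sketch that is independent of the GeoMol code. The shapes and the names n_model_confs and indices are made up for the example, and it uses plain NumPy rather than CuPy, so treat it as one possible pattern and not as the exact change made in the repository:

```python
import numpy as np
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical size, for illustration only (not taken from GeoMol).
n_model_confs = 5

# Allocate the tensor directly on the target device, so it never
# ends up on the CPU while the model weights sit on the GPU.
p_coords = torch.zeros([4, n_model_confs, 3], device=device)

# NumPy cannot read CUDA memory, so copy the tensor to the host
# before handing it to a NumPy function.
indices = torch.tensor([3, 1, 2], device=device)
order = np.argsort(indices.cpu().numpy())
```

On a CPU-only machine the .cpu() call does nothing, so the same code should run in both settings.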

victorl25 commented 2 years ago

I also made the following changes in generate_confs.py:

state_dict = torch.load(f'{trained_model_dir}/best_model.pt', map_location=device)
model.load_state_dict(state_dict, strict=True)
model.to(device)

data = Batch.from_data_list([tg_data]).to(device)
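
The same load-and-move pattern can be tried in isolation. The sketch below is only an assumption-laden stand-in: it uses a single nn.Linear layer in place of the GeoMol model, an in-memory buffer in place of best_model.pt, and a random tensor in place of the PyTorch Geometric batch. The point is that the weights and the input both end up on the same device before the forward pass:

```python
import io

import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in for the trained model: a single linear layer, not GeoMol.
model = nn.Linear(8, 4)

# Save the weights to an in-memory buffer (stand-in for best_model.pt).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps the stored tensors onto the chosen device at load time.
state_dict = torch.load(buffer, map_location=device)
model.load_state_dict(state_dict, strict=True)
model.to(device)
model.eval()

# Stand-in for the input batch; it must live on the same device as the model.
x = torch.randn(2, 8).to(device)
with torch.no_grad():
    out = model(x)

# Record where the parameters ended up, for a quick sanity check.
param_device = next(model.parameters()).device
```

If x were left on the CPU while the model sat on cuda:0, the forward pass would be expected to raise the same kind of device-mismatch RuntimeError reported at the top of this thread.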

PattanaikL / GeoMol

Expecting all tensors to be on same device, but found two device cuda:0 and cpu, when running the generate_confs.py #10