Mariewelt / OpenChem

OpenChem: Deep Learning toolkit for Computational Chemistry and Drug Design Research
https://mariewelt.github.io/OpenChem/
MIT License

Issue with index Embedding layer #10

Open lorenzoFabbri opened 5 years ago

lorenzoFabbri commented 5 years ago

I'm trying to use OpenChem for a classification task. My dataset is basically Tox21 with one label. I'm using a machine with a single GPU.

I simply adapted the provided script for Tox21 but I keep getting the following error:

```
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [67,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

File "/.../OpenChem/openchem/modules/encoders/rnn_encoder.py", line 90, in init_hidden requires_grad=True).cuda() RuntimeError: CUDA error: device-side assert triggered

Some SMILES were longer than 1024 characters, so I removed them; the longest one is now under 300 characters. Still, I keep getting the very same error. I read online that these bugs are easier to track down on a CPU, so I tried setting `use_cuda=False` in the configuration file. Nonetheless, it still tries to copy everything to the GPU, since the error points to the same line (line 90). I then tried passing `--use_cuda="False"` from the command line, but I keep getting the following error:

```
ValueError: use_cuda has to be of type <class 'bool'>.
```
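If I understand correctly, values passed on the command line arrive as strings (and `bool("False")` would be `True` anyway), so a type check like the following, which I am only sketching from the error message and which is not OpenChem's actual source, rejects the string:

```python
# Hypothetical check modeled on the error message, not OpenChem's code:
# command-line values arrive as strings, so the isinstance test fails.
use_cuda = "False"  # what argparse hands over
if not isinstance(use_cuda, bool):
    raise ValueError("use_cuda has to be of type <class 'bool'>.")
```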

I thus set `use_cuda` to `False` directly in openchem_encoder.py, but then I get `RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED`, so I guess CUDA is still being used somewhere else. Am I missing something? The Tox21 scripts seem to work correctly (apart from lots of warnings). Thanks.
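As an aside, for anyone else debugging this: CUDA kernels launch asynchronously, so the line reported with a device-side assert (rnn_encoder.py, line 90 here) is not necessarily where the bad index occurs. Forcing synchronous launches makes the trace point at the op that actually failed:

```python
# Make CUDA kernel launches synchronous so the device-side assert is
# reported at the failing op; must be set before CUDA is initialized.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import afterwards, before any CUDA work
```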

lorenzoFabbri commented 5 years ago

It was suggested that I increase `num_embeddings` from `train_dataset.num_tokens` to `train_dataset.num_tokens + 2`, since the maximum value in the input tensor was larger than the embedding size. I still do not know why that happened.
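In config terms the workaround looks roughly like this (a sketch assuming the `train_dataset` object from the Tox21 example script; `embedding_dim` and `padding_idx` are illustrative values, not from my actual config):

```python
# Sketch of the workaround, assuming `train_dataset` from the Tox21
# example script; embedding_dim and padding_idx are illustrative.
embedding_params = {
    'num_embeddings': train_dataset.num_tokens + 2,  # was: train_dataset.num_tokens
    'embedding_dim': 128,
    'padding_idx': 0,
}
```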

I then faced some more issues, and I was wondering whether you have tested the library on a simple classification task. I first had to reshape my labels with `reshape(-1, 1)`, and then I had to modify `cast_inputs` in the `Smiles2Label` module by replacing `batch_labels = batch_labels.long()` with `batch_labels = torch.flatten(batch_labels.long())`. Sketched out, the two changes look like this (plain NumPy/PyTorch, abbreviated from my actual script):
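```python
import numpy as np
import torch

# Change 1: give single-task labels an explicit second dimension before
# handing them to the dataset.
labels = np.array([0, 1, 1, 0])
labels = labels.reshape(-1, 1)  # shape (N,) -> (N, 1)

# Change 2 (inside Smiles2Label.cast_inputs): flatten back to 1-D so the
# classification loss sees targets of shape (N,), not (N, 1).
batch_labels = torch.tensor(labels)
batch_labels = torch.flatten(batch_labels.long())  # was: batch_labels.long()
```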