HannesStark / EquiBind

EquiBind: geometric deep learning for fast predictions of the 3D structure in which a small molecule binds to a protein
MIT License

joblib Parallel issue with specific complex '3m1s' in the PDBBind data #6

Closed luwei0917 closed 2 years ago

luwei0917 commented 2 years ago

It could be a problem with RDKit. My version is "rdkit 2021.09.4".

import os
import pickle

# read_molecule is EquiBind's own helper; the module path below is an assumption
from commons.process_mols import read_molecule

pdbbind_dir = "PDBBind_processed/"
name = '3m1s'
lig = read_molecule(os.path.join(pdbbind_dir, name, f'{name}_ligand.sdf'), sanitize=True,
                    remove_hs=True)
if lig is None:  # read mol2 file if sdf file cannot be sanitized
    lig = read_molecule(os.path.join(pdbbind_dir, name, f'{name}_ligand.mol2'), sanitize=True,
                        remove_hs=True)
# lig = Chem.MolFromSmiles('O=C[Ru+9]12345(C6=C1C2C3=C64)n1c2ccc(O)cc2c2c3c(c4ccc[n+]5c4c21)C(=O)NC3=O')
with open("test.pkl", "wb") as f:
    pickle.dump(lig, f)
with open("test.pkl", "rb") as f:
    pickle.load(f)  # this load is where the error is raised

RuntimeError: invalid value in pickle

HannesStark commented 2 years ago

Hi, thanks for the issue! Since you are directly writing and then reading the pickle again, I am rather sure that this problem is not related to the repository. It seems like this is rather a general issue with pickle and your concrete setup.

Let me know if I am misunderstanding something!

luwei0917 commented 2 years ago

Hello, the problem arises while running "python train.py --config=configs_clean/RDKitCoords_flexible_self_docking.yml":

Get receptors: 4980it [11:42, 1.40it/s][Parallel(n_jobs=20)]: Done 4960 tasks | elapsed: 11.7min
Get receptors: 5280it [12:01, 14.33it/s]exception calling callback for <Future at 0x7fb376545610 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback:
...
RuntimeError: invalid value in pickle

I traced the problem to the call "receptor_representatives = pmap_multi(get_receptor, zip(rec_paths, ligs), n_jobs=self.n_jobs, cutoff=self.chain_radius, desc='Get receptors')". With further testing, I found that the problem only occurs for "3m1s". I think the real problem is that joblib Parallel tries to pickle the ligand, which is an RDKit Mol object.

If you can process "3m1s" without a problem, it might be due to a difference in our RDKit versions.
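
The idea of locating the one input that cannot be shipped to a worker can be checked in the main process by round-tripping each argument through pickle before the parallel call. The following is a minimal sketch, not code from the repository: find_unpicklable and the BrokenOnLoad stand-in are hypothetical names, and a real check would pass (complex name, RDKit ligand) pairs instead of the demo objects.

```python
import pickle


def find_unpicklable(named_objects):
    """Return (name, error) pairs for objects that fail a pickle round trip."""
    failures = []
    for name, obj in named_objects:
        try:
            pickle.loads(pickle.dumps(obj))
        except Exception as exc:  # pickling can fail with many exception types
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures


def _raise_on_load():
    # Hypothetical helper that mimics a failure at unpickling time.
    raise RuntimeError("invalid value in pickle")


class BrokenOnLoad:
    """Stand-in for an object that can be dumped but not loaded again."""

    def __reduce__(self):
        return (_raise_on_load, ())


# "1abc" is a placeholder name; "3m1s" is the complex reported in this thread.
demo = [("1abc", {"coords": [0.0, 1.0]}), ("3m1s", BrokenOnLoad())]
failures = find_unpicklable(demo)
failed_names = [name for name, _ in failures]
print(failed_names)
```

Running such a check over all ligands before the pmap_multi call would list every affected complex at once, instead of stopping at the first broken worker pool.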

HannesStark commented 2 years ago

Understood, thanks for the clarification. However, the get_receptor function only reads the receptor files. Can you maybe check if the ligand that is returned by rdkit is None?

luwei0917 commented 2 years ago

Yes, I checked by adding print('hello') right after "def get_receptor(rec_path, lig, cutoff):". It prints 'hello' as expected until it reaches "3m1s". So I think joblib Parallel tries to pickle the input, and that causes the error.

HannesStark commented 2 years ago

I understand, but could you check with the script that you wrote above whether the object returned by the function read_molecule is None?

luwei0917 commented 2 years ago

It's not None. RDKit can read the mol2 file.

HannesStark commented 2 years ago

Okay, then I think it is not a problem with RDKit or the RDKit version.

Could you let me know the full error message? Usually it prints more than just the final "joblib.externals.loky.process_executor._RemoteTraceback: ... RuntimeError: invalid value in pickle", and that information might be helpful.

luwei0917 commented 2 years ago

Sure.

Get receptors: 10340it [25:17, 15.49it/s]exception calling callback for <Future at 0x7fb5adb2ca90 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
RuntimeError: invalid value in pickle
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
    self.parallel.dispatch_next()
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 794, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
    fn, *args, **kwargs)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1115, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Get receptors: 10360it [25:32, 19.78it/s]joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
RuntimeError: invalid value in pickle
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 304, in <module>
    main_function()
  File "train.py", line 295, in main_function
    train_wrapper(args)
  File "train.py", line 141, in train_wrapper
    return train(args, run_dir)
  File "train.py", line 169, in train
    train_data = PDBBind(device=device, complex_names_path=args.train_names,lig_predictions_name=args.train_predictions_name, is_train_data=True, **args.dataset_params)
  File "/gxr/luwei/packages/EquiBind/datasets/pdbbind.py", line 128, in __init__
    self.process()
  File "/gxr/luwei/packages/EquiBind/datasets/pdbbind.py", line 271, in process
    receptor_representatives = pmap_multi(get_receptor, zip(rec_paths, ligs), n_jobs=self.n_jobs, cutoff=self.chain_radius, desc='Get receptors')
  File "/gxr/luwei/packages/EquiBind/commons/utils.py", line 47, in pmap_multi
    delayed(pickleable_fn)(*d, **kwargs) for i, d in tqdm(enumerate(data),desc=desc)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
    self.parallel.dispatch_next()
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 794, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
    fn, *args, **kwargs)
  File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1115, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Get receptors: 10399it [31:33,  5.49it/s]

HannesStark commented 2 years ago

Sorry, I am not sure what the cause is here. One thing that would definitely work is running the get_receptor function without joblib. (Which would require rewriting the line of code)

Also, I found some posts of users circumventing similar problems by using different versions of joblib. Can you maybe try:

conda install -c anaconda joblib=1.1.0
conda install -c anaconda joblib=1.2.0
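
Running get_receptor without joblib could look roughly like the sketch below. It assumes the calling convention visible in the traceback (each item of data is unpacked as positional arguments and the remaining keyword arguments are forwarded). pmap_serial and get_receptor_stub are hypothetical names used only for illustration; they are not part of the repository.

```python
def pmap_serial(fn, data, n_jobs=None, desc=None, **kwargs):
    """Call fn(*item, **kwargs) for every item in data, in the current process.

    n_jobs is accepted only so the call site can stay unchanged; it is ignored.
    Nothing is pickled because no worker processes are involved.
    """
    results = []
    for index, item in enumerate(data):
        try:
            results.append(fn(*item, **kwargs))
        except Exception as exc:
            # Report which input failed so the offending complex is easy to find.
            raise RuntimeError(f"{desc or 'task'} failed at index {index}") from exc
    return results


def get_receptor_stub(rec_path, lig, cutoff):
    # Placeholder for the real get_receptor; it only echoes its inputs.
    return (rec_path, lig, cutoff)


out = pmap_serial(
    get_receptor_stub,
    zip(["a.pdb", "b.pdb"], ["ligA", "ligB"]),
    n_jobs=20,
    cutoff=5.0,
    desc="Get receptors",
)
print(out)  # [('a.pdb', 'ligA', 5.0), ('b.pdb', 'ligB', 5.0)]
```

The serial loop is slower than the parallel version, but a failure then surfaces as an ordinary exception with the index of the input that caused it.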

gaozhangyang commented 2 years ago

Thanks for the wonderful work!

I solved the problem by replacing the following code in process_mol.py:

coords = lig.GetConformer().GetPositions()  ==>  coords = get_rdkit_coords(lig).numpy()

and removing some training complexes:

3zs1 3p3h 4jfv 4acu 3w8o 4mdn 2jld 3m1s 3p3j 4jfw 5tgy 4dcy 3p44 4i60 4jhq

and validation complexes:

4xkc 3p55 3rj7
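
Removing these complexes amounts to filtering the lists of complex names before the dataset is built. A minimal sketch, assuming the names are available as a list of strings with one PDB ID per entry; drop_excluded is a hypothetical helper, and "1abc" and "2xyz" are placeholder IDs:

```python
# Complexes reported above as problematic.
EXCLUDED_TRAIN = {
    "3zs1", "3p3h", "4jfv", "4acu", "3w8o", "4mdn", "2jld", "3m1s",
    "3p3j", "4jfw", "5tgy", "4dcy", "3p44", "4i60", "4jhq",
}
EXCLUDED_VAL = {"4xkc", "3p55", "3rj7"}


def drop_excluded(names, excluded):
    """Return the names not in excluded, ignoring blank entries and whitespace."""
    cleaned = (name.strip() for name in names)
    return [name for name in cleaned if name and name not in excluded]


train_names = ["1abc", "3m1s", "2xyz", "4jfv\n", ""]
kept = drop_excluded(train_names, EXCLUDED_TRAIN)
print(kept)  # ['1abc', '2xyz']
```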

HannesStark commented 2 years ago

Great to hear! Thanks for the information that some versions have issues with the ligands in the mentioned complexes.

lejlot commented 2 years ago

I am hitting an analogous issue. Could you please provide the full list of versions of the libraries you are using? The env file pins versions for only 3 of the many packages used. In particular, the version of rdkit would be helpful.
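
For comparing environments, the versions of a few relevant packages can also be collected from Python itself. This is a minimal sketch using the standard library's importlib.metadata, which requires Python 3.8 or newer (on 3.7 the importlib-metadata backport offers the same interface). installed_versions is a hypothetical helper, and conda-installed packages such as rdkit may not always be registered under the name queried here, in which case None is reported.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if not found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The last name is deliberately bogus to show the "not found" case.
report = installed_versions(["joblib", "rdkit", "definitely-not-installed-pkg-xyz"])
for name, version in report.items():
    print(f"{name}: {version}")
```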

HannesStark commented 2 years ago

Sorry for the late reply, I somehow missed your question @lejlot. Here is a list of versions:

_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
absl-py 1.0.0 pypi_0 pypi
asttokens 2.0.5 pypi_0 pypi
beautifulsoup4 4.10.0 pypi_0 pypi
biopandas 0.2.9 pypi_0 pypi
blas 1.0 mkl
bottleneck 1.3.2 py37heb32a55_1
bzip2 1.0.8 h7b6447c_0
ca-certificates 2021.10.26 h06a4308_2
cachetools 4.2.4 pypi_0 pypi
cairo 1.16.0 hf32fb01_1
certifi 2021.10.8 py37h06a4308_0
charset-normalizer 2.0.7 pypi_0 pypi
cloudpickle 2.0.0 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
cudatoolkit 10.2.89 hfd86e86_1
cycler 0.11.0 pypi_0 pypi
dgl-cuda10.2 0.7.2 py37_0 dglteam
dgllife 0.2.8 pypi_0 pypi
executing 0.8.2 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.4.0 pypi_0 pypi
fontconfig 2.13.1 h6c09931_0
fonttools 4.28.1 pypi_0 pypi
freetype 2.11.0 h70c0345_0
future 0.18.2 pypi_0 pypi
gdown 4.2.0 pypi_0 pypi
giflib 5.2.1 h7b6447c_0
glib 2.69.1 h5202010_0
gmp 6.2.1 h2531618_2
gnutls 3.6.15 he1e5248_0
google-auth 2.3.3 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.42.0 pypi_0 pypi
hyperopt 0.2.7 pypi_0 pypi
icecream 2.1.1 pypi_0 pypi
icu 58.2 he6710b0_3
idna 3.3 pypi_0 pypi
importlib-metadata 4.8.2 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
joblib 1.1.0 pypi_0 pypi
jpeg 9d h7f8727e_0
kiwisolver 1.3.2 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libboost 1.73.0 h3ff78a5_11
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libiconv 1.15 h63c8f33_5
libidn2 2.3.2 h7f8727e_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtasn1 4.16.0 h27cfd23_0
libtiff 4.2.0 h85742a9_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.0.3 h7f8727e_2
libuv 1.40.0 h7b6447c_0
libwebp 1.2.0 h89dd481_0
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.12 h03d6c58_0
lz4-c 1.9.3 h295c915_1
markdown 3.3.6 pypi_0 pypi
matplotlib 3.5.0 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.1 py37hd3c417c_0
mkl_random 1.2.2 py37h51133e4_0
ncurses 6.3 h7f8727e_2
nettle 3.7.3 hbbd107a_1
networkx 2.6.3 pyhd3eb1b0_0
numexpr 2.7.3 py37h22e1b3c_1
numpy 1.21.2 py37h20f2e39_0
numpy-base 1.21.2 py37h79a1101_0
oauthlib 3.1.1 pypi_0 pypi
olefile 0.46 py37_0
openh264 2.1.0 hd408876_0
openssl 1.1.1l h7f8727e_0
packaging 21.3 pypi_0 pypi
pandas 1.3.4 py37h8c16a72_0
pcre 8.45 h295c915_0
pillow 8.4.0 py37h5aabda8_0
pip 21.0.1 py37h06a4308_0
pixman 0.40.0 h7f8727e_1
pot 0.8.0 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
py-boost 1.73.0 py37ha9443f7_11
py4j 0.10.9.2 pypi_0 pypi
pyaml 21.10.1 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyparsing 3.0.6 pypi_0 pypi
pysocks 1.7.1 pypi_0 pypi
python 3.7.11 h12debd9_0
python-dateutil 2.8.2 pyhd3eb1b0_0
pytorch 1.10.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2021.3 pyhd3eb1b0_0
pyyaml 6.0 pypi_0 pypi
rdkit 2020.09.1.0 py37hd50e099_1 rdkit
readline 8.1 h27cfd23_0
requests 2.26.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.7.2 pypi_0 pypi
scikit-learn 1.0.1 pypi_0 pypi
scipy 1.7.1 py37h292c36d_2
setuptools 58.0.4 py37h06a4308_0
setuptools-scm 6.3.2 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_0
soupsieve 2.3.1 pypi_0 pypi
sqlite 3.36.0 hc218d9a_0
tensorboard 2.7.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
threadpoolctl 3.0.0 pypi_0 pypi
tk 8.6.11 h1ccaba5_0
tomli 1.2.2 pypi_0 pypi
torchaudio 0.10.0 py37_cu102 pytorch
torchvision 0.11.1 py37_cu102 pytorch
tqdm 4.62.3 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0
urllib3 1.26.7 pypi_0 pypi
werkzeug 2.0.2 pypi_0 pypi
wheel 0.37.0 pyhd3eb1b0_1
xz 5.2.5 h7b6447c_0
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0