luwei0917 closed this issue 2 years ago.
Hi, thanks for the issue! Since you are directly writing and then reading the pickle again, I am fairly sure that this problem is not related to the repository. It seems to be a general issue with pickle in your specific setup.
Let me know if I am misunderstanding something!
Hello, the problem arises while running "python train.py --config=configs_clean/RDKitCoords_flexible_self_docking.yml":
Get receptors: 4980it [11:42, 1.40it/s]
[Parallel(n_jobs=20)]: Done 4960 tasks | elapsed: 11.7min
Get receptors: 5280it [12:01, 14.33it/s]
exception calling callback for <Future at 0x7fb376545610 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback: ... RuntimeError: invalid value in pickle
I traced the problem to "receptor_representatives = pmap_multi(get_receptor, zip(rec_paths, ligs), n_jobs=self.n_jobs, cutoff=self.chain_radius, desc='Get receptors')". With further testing, I found that the problem only occurs for "3m1s". I think the real problem is that joblib's Parallel needs to pickle the ligand, which is an RDKit Mol object.
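If the suspicion is that the RDKit ligand cannot survive a pickle round trip, one way to test it outside joblib is to pickle and unpickle each ligand directly, which roughly mirrors what a process-based backend does with task arguments (serialize in the parent, un-serialize in the worker). A minimal sketch, assuming the ligands can be paired with some identifier; `find_unpicklable` is a hypothetical helper, not part of EquiBind:

```python
import pickle
from typing import Any, Iterable, List, Tuple


def find_unpicklable(named_items: Iterable[Tuple[str, Any]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: round-trip each object through pickle and
    return (name, error) for every object that fails to dump or load."""
    failures = []
    for name, obj in named_items:
        try:
            # dumps mirrors the parent side, loads mirrors the worker side
            # (where "RuntimeError: invalid value in pickle" was raised).
            pickle.loads(pickle.dumps(obj))
        except Exception as exc:
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures
```

For example, `find_unpicklable(zip(names, ligs))`, where `names` is a hypothetical list of complex IDs aligned with `ligs`, should list "3m1s" if that ligand alone is the problem. joblib's default loky backend does not use plain `pickle.dumps` verbatim, so treat a clean result as suggestive rather than conclusive.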
If you can run through "3m1s" without problems, it might be due to a difference in our RDKit versions.
Understood, thanks for the clarification. However, the get_receptor function only reads the receptor files. Can you maybe check whether the ligand returned by RDKit is None?
Yes, I checked by adding print('hello') right after "def get_receptor(rec_path, lig, cutoff):". It prints 'hello' as expected until it reaches "3m1s", so I think joblib's Parallel pickles the input and that causes the error.
I understand, but can you maybe check, with the script that you wrote above, whether the object returned by the function read_molecule is None?
It's not None. RDKit can read the mol2 file.
Okay, then I think it is not a problem with RDKit or the RDKit version.
Could you maybe send me the full error message? It usually prints more than just the final "joblib.externals.loky.process_executor._RemoteTraceback: ... RuntimeError: invalid value in pickle", and that information might be helpful.
Sure.
Get receptors: 10340it [25:17, 15.49it/s]
exception calling callback for <Future at 0x7fb5adb2ca90 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
RuntimeError: invalid value in pickle
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 794, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
fn, *args, **kwargs)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1115, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Get receptors: 10360it [25:32, 19.78it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
RuntimeError: invalid value in pickle
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train.py", line 304, in <module>
main_function()
File "train.py", line 295, in main_function
train_wrapper(args)
File "train.py", line 141, in train_wrapper
return train(args, run_dir)
File "train.py", line 169, in train
train_data = PDBBind(device=device, complex_names_path=args.train_names,lig_predictions_name=args.train_predictions_name, is_train_data=True, **args.dataset_params)
File "/gxr/luwei/packages/EquiBind/datasets/pdbbind.py", line 128, in __init__
self.process()
File "/gxr/luwei/packages/EquiBind/datasets/pdbbind.py", line 271, in process
receptor_representatives = pmap_multi(get_receptor, zip(rec_paths, ligs), n_jobs=self.n_jobs, cutoff=self.chain_radius, desc='Get receptors')
File "/gxr/luwei/packages/EquiBind/commons/utils.py", line 47, in pmap_multi
delayed(pickleable_fn)(*d, **kwargs) for i, d in tqdm(enumerate(data),desc=desc)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
self.parallel.dispatch_next()
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 794, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
fn, *args, **kwargs)
File "/gxr/luwei/anaconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1115, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Get receptors: 10399it [31:33, 5.49it/s]
Sorry, I am not sure what the cause is here. One thing that would definitely work is running the get_receptor function without joblib (which would require rewriting that line of code).
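Running get_receptor without joblib could look roughly like the sketch below: a hypothetical sequential stand-in (`map_multi_sequential` is not part of EquiBind) that accepts the same arguments as the `pmap_multi` call in the traceback but calls the function in the current process, so no arguments are pickled. It assumes `pmap_multi` simply applies `fn(*d, **kwargs)` to each tuple in `data`, as the `delayed(pickleable_fn)(*d, **kwargs)` line in the traceback suggests.

```python
from typing import Any, Callable, Iterable, List, Optional


def map_multi_sequential(fn: Callable[..., Any], data: Iterable[tuple],
                         n_jobs: Optional[int] = None, desc: Optional[str] = None,
                         **kwargs: Any) -> List[Any]:
    """Hypothetical sequential stand-in for pmap_multi.

    Calls fn(*d, **kwargs) for every tuple d in `data` in the current process,
    so no task arguments (e.g. RDKit Mol objects) are pickled. `n_jobs` and
    `desc` are accepted and ignored so an existing call site can stay unchanged.
    """
    return [fn(*d, **kwargs) for d in data]
```

The call in datasets/pdbbind.py would then become `receptor_representatives = map_multi_sequential(get_receptor, zip(rec_paths, ligs), n_jobs=self.n_jobs, cutoff=self.chain_radius, desc='Get receptors')`. This trades speed for robustness. Another option worth testing is joblib's `prefer="threads"` (or `backend="threading"`), which also avoids pickling arguments but may not speed up CPU-bound Python code.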
Also, I found some posts from users circumventing similar problems by using different versions of joblib.
Can you maybe try:
conda install -c anaconda joblib=1.1.0
conda install -c anaconda joblib=1.2.0
Thanks for the wonderful work!
I solved the problem by replacing the following code in process_mol.py:
coords = lig.GetConformer().GetPositions()
with:
coords = get_rdkit_coords(lig).numpy()
and removing some training complexes:
3zs1 3p3h 4jfv 4acu 3w8o 4mdn 2jld 3m1s 3p3j 4jfw 5tgy 4dcy 3p44 4i60 4jhq
and validation complexes:
4xkc 3p55 3rj7
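If you want to reproduce this exclusion, a minimal sketch is to filter the complex-name lists before training. It assumes the names files (such as the one passed as `train_names`) contain one complex ID per line; `filter_complex_names` and `filter_names_file` are hypothetical helpers, not part of EquiBind:

```python
from pathlib import Path
from typing import Iterable, List, Set

# Complexes reported in this thread as failing in some setups.
EXCLUDED_TRAIN = {
    "3zs1", "3p3h", "4jfv", "4acu", "3w8o", "4mdn", "2jld", "3m1s",
    "3p3j", "4jfw", "5tgy", "4dcy", "3p44", "4i60", "4jhq",
}
EXCLUDED_VAL = {"4xkc", "3p55", "3rj7"}


def filter_complex_names(names: Iterable[str], excluded: Set[str]) -> List[str]:
    """Return the stripped, non-empty names that are not in `excluded`."""
    kept = []
    for name in names:
        name = name.strip()
        if name and name not in excluded:
            kept.append(name)
    return kept


def filter_names_file(src: Path, dst: Path, excluded: Set[str]) -> int:
    """Copy a one-ID-per-line names file to `dst`, dropping excluded IDs.
    Returns the number of IDs kept."""
    kept = filter_complex_names(src.read_text().splitlines(), excluded)
    dst.write_text("".join(f"{name}\n" for name in kept))
    return len(kept)
```

For example, `filter_names_file(Path("path/to/train_names"), Path("path/to/train_names_filtered"), EXCLUDED_TRAIN)` (placeholder paths), after which the config's `train_names` can point at the filtered file.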
Great to hear! Thanks for the information that some versions have issues with the ligands in the mentioned complexes.
I am hitting an analogous issue. Could you please provide the full list of library versions you are using? The env file pins versions for only 3 of the many libraries used. In particular, the RDKit version could be helpful.
Sorry for the late reply, I somehow missed your question, @lejlot.
Here is a list of versions:
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
absl-py 1.0.0 pypi_0 pypi
asttokens 2.0.5 pypi_0 pypi
beautifulsoup4 4.10.0 pypi_0 pypi
biopandas 0.2.9 pypi_0 pypi
blas 1.0 mkl
bottleneck 1.3.2 py37heb32a55_1
bzip2 1.0.8 h7b6447c_0
ca-certificates 2021.10.26 h06a4308_2
cachetools 4.2.4 pypi_0 pypi
cairo 1.16.0 hf32fb01_1
certifi 2021.10.8 py37h06a4308_0
charset-normalizer 2.0.7 pypi_0 pypi
cloudpickle 2.0.0 pypi_0 pypi
colorama 0.4.4 pypi_0 pypi
cudatoolkit 10.2.89 hfd86e86_1
cycler 0.11.0 pypi_0 pypi
dgl-cuda10.2 0.7.2 py37_0 dglteam
dgllife 0.2.8 pypi_0 pypi
executing 0.8.2 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.4.0 pypi_0 pypi
fontconfig 2.13.1 h6c09931_0
fonttools 4.28.1 pypi_0 pypi
freetype 2.11.0 h70c0345_0
future 0.18.2 pypi_0 pypi
gdown 4.2.0 pypi_0 pypi
giflib 5.2.1 h7b6447c_0
glib 2.69.1 h5202010_0
gmp 6.2.1 h2531618_2
gnutls 3.6.15 he1e5248_0
google-auth 2.3.3 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.42.0 pypi_0 pypi
hyperopt 0.2.7 pypi_0 pypi
icecream 2.1.1 pypi_0 pypi
icu 58.2 he6710b0_3
idna 3.3 pypi_0 pypi
importlib-metadata 4.8.2 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
joblib 1.1.0 pypi_0 pypi
jpeg 9d h7f8727e_0
kiwisolver 1.3.2 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libboost 1.73.0 h3ff78a5_11
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libiconv 1.15 h63c8f33_5
libidn2 2.3.2 h7f8727e_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtasn1 4.16.0 h27cfd23_0
libtiff 4.2.0 h85742a9_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.0.3 h7f8727e_2
libuv 1.40.0 h7b6447c_0
libwebp 1.2.0 h89dd481_0
libwebp-base 1.2.0 h27cfd23_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.12 h03d6c58_0
lz4-c 1.9.3 h295c915_1
markdown 3.3.6 pypi_0 pypi
matplotlib 3.5.0 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.1 py37hd3c417c_0
mkl_random 1.2.2 py37h51133e4_0
ncurses 6.3 h7f8727e_2
nettle 3.7.3 hbbd107a_1
networkx 2.6.3 pyhd3eb1b0_0
numexpr 2.7.3 py37h22e1b3c_1
numpy 1.21.2 py37h20f2e39_0
numpy-base 1.21.2 py37h79a1101_0
oauthlib 3.1.1 pypi_0 pypi
olefile 0.46 py37_0
openh264 2.1.0 hd408876_0
openssl 1.1.1l h7f8727e_0
packaging 21.3 pypi_0 pypi
pandas 1.3.4 py37h8c16a72_0
pcre 8.45 h295c915_0
pillow 8.4.0 py37h5aabda8_0
pip 21.0.1 py37h06a4308_0
pixman 0.40.0 h7f8727e_1
pot 0.8.0 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
py-boost 1.73.0 py37ha9443f7_11
py4j 0.10.9.2 pypi_0 pypi
pyaml 21.10.1 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.10.0 pypi_0 pypi
pyparsing 3.0.6 pypi_0 pypi
pysocks 1.7.1 pypi_0 pypi
python 3.7.11 h12debd9_0
python-dateutil 2.8.2 pyhd3eb1b0_0
pytorch 1.10.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2021.3 pyhd3eb1b0_0
pyyaml 6.0 pypi_0 pypi
rdkit 2020.09.1.0 py37hd50e099_1 rdkit
readline 8.1 h27cfd23_0
requests 2.26.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.7.2 pypi_0 pypi
scikit-learn 1.0.1 pypi_0 pypi
scipy 1.7.1 py37h292c36d_2
setuptools 58.0.4 py37h06a4308_0
setuptools-scm 6.3.2 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_0
soupsieve 2.3.1 pypi_0 pypi
sqlite 3.36.0 hc218d9a_0
tensorboard 2.7.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
threadpoolctl 3.0.0 pypi_0 pypi
tk 8.6.11 h1ccaba5_0
tomli 1.2.2 pypi_0 pypi
torchaudio 0.10.0 py37_cu102 pytorch
torchvision 0.11.1 py37_cu102 pytorch
tqdm 4.62.3 pypi_0 pypi
typing_extensions 3.10.0.2 pyh06a4308_0
urllib3 1.26.7 pypi_0 pypi
werkzeug 2.0.2 pypi_0 pypi
wheel 0.37.0 pyhd3eb1b0_1
xz 5.2.5 h7b6447c_0
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
It could be a problem with RDKit. My version is "rdkit 2021.09.4".
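To compare environments like these, a quick way to report the versions of the packages most relevant to this error is sketched below. It assumes Python 3.8+ for `importlib.metadata` (on Python 3.7, the `importlib_metadata` backport listed above provides the same functions). A conda build of RDKit may not be registered under the distribution name `rdkit`, in which case `import rdkit; print(rdkit.__version__)` is the more direct check. `package_versions` is a hypothetical helper.

```python
from importlib import metadata  # Python 3.8+
from typing import Dict, Iterable


def package_versions(names: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for pkg, ver in package_versions(["rdkit", "joblib", "numpy"]).items():
        print(f"{pkg:10s} {ver}")
```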
RuntimeError: invalid value in pickle