HannesStark / EquiBind

EquiBind: geometric deep learning for fast predictions of the 3D structure in which a small molecule binds to a protein
MIT License

invalid value in pickle #39

Closed yufengwhy closed 2 years ago

yufengwhy commented 2 years ago

loading ligands: 100%|##########| 16379/16379 [00:07<00:00, 2120.18it/s]
[2022-06-20 17:45:06.850970] Get receptors, filter chains, and get its coordinates
Get receptors: 0it [00:00, ?it/s][Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
Get receptors: 40it [00:02, 15.94it/s][Parallel(n_jobs=20)]: Done 10 tasks | elapsed: 3.3s
Get receptors: 180it [00:18, 9.37it/s][Parallel(n_jobs=20)]: Done 160 tasks | elapsed: 19.3s
Get receptors: 440it [00:48, 7.08it/s][Parallel(n_jobs=20)]: Done 410 tasks | elapsed: 49.5s
Get receptors: 780it [01:33, 6.45it/s][Parallel(n_jobs=20)]: Done 760 tasks | elapsed: 1.6min
Get receptors: 1240it [02:34, 6.44it/s][Parallel(n_jobs=20)]: Done 1210 tasks | elapsed: 2.6min
Get receptors: 1780it [03:40, 9.20it/s][Parallel(n_jobs=20)]: Done 1760 tasks | elapsed: 3.7min
Get receptors: 2440it [05:06, 11.94it/s][Parallel(n_jobs=20)]: Done 2410 tasks | elapsed: 5.1min
Get receptors: 3180it [06:55, 3.39it/s][Parallel(n_jobs=20)]: Done 3160 tasks | elapsed: 6.9min
Get receptors: 4040it [08:31, 11.87it/s][Parallel(n_jobs=20)]: Done 4010 tasks | elapsed: 8.5min
Get receptors: 4980it [10:24, 13.37it/s][Parallel(n_jobs=20)]: Done 4960 tasks | elapsed: 10.4min
Get receptors: 5280it [11:34, 12.19it/s]exception calling callback for <Future at 0x7f9e7b119c50 state=finished raised BrokenProcessPool>
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
RuntimeError: invalid value in pickle
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
    self.parallel.dispatch_next()
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 794, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
    fn, *args, **kwargs)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1115, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Get receptors: 5300it [11:35, 14.89it/s]joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
RuntimeError: invalid value in pickle
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 304, in <module>
    main_function()
  File "train.py", line 295, in main_function
    train_wrapper(args)
  File "train.py", line 141, in train_wrapper
    return train(args, run_dir)
  File "train.py", line 169, in train
    train_data = PDBBind(device=device, complex_names_path=args.train_names,lig_predictions_name=args.train_predictions_name, is_train_data=True, **args.dataset_params)
  File "/root/code/EquiBind/datasets/pdbbind.py", line 127, in __init__
    self.process()
  File "/root/code/EquiBind/datasets/pdbbind.py", line 254, in process
    receptor_representatives = pmap_multi(get_receptor, zip(rec_paths, ligs), n_jobs=self.n_jobs, cutoff=self.chain_radius, desc='Get receptors')
  File "/root/code/EquiBind/commons/utils.py", line 47, in pmap_multi
    delayed(pickleable_fn)(*d, **kwargs) for i, d in tqdm(enumerate(data),desc=desc)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/opt/miniconda3/envs/equibind/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
    callback(self)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 359, in __call__
    self.parallel.dispatch_next()
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 794, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 531, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
    fn, *args, **kwargs)
  File "/opt/miniconda3/envs/equibind/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1115, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Get receptors: 5339it [13:02, 6.82it/s]

HannesStark commented 2 years ago

Hi, I am not sure whether I can help debugging this. I can only say from experience that I sometimes encountered something like this when running the code from an IDE and something similar when running out of RAM. So maybe monitoring your RAM can also provide insights. If you do not have enough RAM for preprocessing the data, you can split it up into batches instead of preprocessing everything at once and having all the data in memory simultaneously.
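For illustration, the batching idea could look roughly like the sketch below. It is not code from the EquiBind repository: `chunked`, `preprocess_in_batches`, `process_one`, and the `batch_XXXXX.pkl` file naming are all hypothetical, and it uses only the standard library. Each batch is processed and written to disk before the next one starts, so only one batch of results is held in memory at a time.

```python
import pickle
import tempfile
from itertools import islice
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator, List


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def preprocess_in_batches(
    items: Iterable[Any],
    process_one: Callable[[Any], Any],  # hypothetical per-item preprocessing function
    out_dir: Path,
    batch_size: int,
) -> List[Path]:
    """Process one batch at a time and pickle each batch's results to disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(chunked(items, batch_size)):
        results = [process_one(x) for x in batch]
        path = out_dir / f"batch_{i:05d}.pkl"  # hypothetical file naming
        with path.open("wb") as f:
            pickle.dump(results, f)
        paths.append(path)
        # `results` is rebound on the next iteration, so the previous batch can be freed.
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = preprocess_in_batches(range(10), lambda x: x * x, Path(tmp), batch_size=4)
        print([p.name for p in written])
```

The per-batch loop here is serial on purpose, which also makes it easier to see which input fails; a parallel map could be swapped in per batch once the data is known to be fine.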

yufengwhy commented 2 years ago

It was an rdkit version problem. After changing to version 2020.09.1, it worked fine. Install command: conda install -c rdkit rdkit (https://anaconda.org/rdkit/rdkit)
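A quick way to check which rdkit version an environment actually has is sketched below. The helper names `installed_rdkit_version` and `version_matches` are made up for this example; the only rdkit feature used is the package's `__version__` attribute, and the expected value is simply the version reported to work in this thread.

```python
from typing import Optional

EXPECTED_RDKIT = "2020.09.1"  # version reported to work in this thread


def installed_rdkit_version() -> Optional[str]:
    """Return the installed rdkit version string, or None if rdkit cannot be imported."""
    try:
        import rdkit
    except ImportError:
        return None
    return getattr(rdkit, "__version__", None)


def version_matches(found: Optional[str], expected: str = EXPECTED_RDKIT) -> bool:
    """True only when a version was found and it equals the expected one exactly."""
    return found is not None and found.strip() == expected


if __name__ == "__main__":
    found = installed_rdkit_version()
    if version_matches(found):
        print(f"rdkit {found} matches the expected version")
    else:
        print(f"rdkit version is {found!r}, expected {EXPECTED_RDKIT!r}")
```

Running this inside the same conda environment that runs train.py shows whether the environment picked up a different rdkit build than intended.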

HannesStark commented 2 years ago

fantastic, thanks for the update!