lijiashan2020 closed this issue 2 years ago
Hi, @lijiashan2020. My first question is, are you able to confirm that the directory the prune_pairs.py script is referring to is in fact already created and populated with .pkl files from running make_dataset.py?
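For anyone wanting a quick way to verify this, here is a minimal sketch (the directory path is a placeholder assumption; substitute your own interim output location):

```python
# Sanity-check sketch: count the .pkl files make_dataset.py should have
# produced. The path passed in below is a placeholder, not the project's
# actual layout; point it at your own output directory.
import pathlib

def count_pkl_files(directory: str) -> int:
    """Recursively count .pkl files under `directory`."""
    return sum(1 for _ in pathlib.Path(directory).rglob("*.pkl"))

if __name__ == "__main__":
    print(count_pkl_files("project/datasets/DIPS/interim"))
```

If this prints 0, the downstream script has nothing to read and will appear to hang or fail.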
Thank you for your reply! I reconfirmed that the directory is already created and populated with .pkl files, but the deadlock still happens. Following another user's workaround, I split up the work for the make_dataset.py script: I first created six different folders, moved the files into them, and ran make_dataset.py on each separately. Some new files that did not exist before were generated in the 'pairs' and 'complexes' folders, but then a new error appeared, as follows:
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/multiprocess/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/multiprocess/pool.py", line 48, in mapstar
return list(map(*args))
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/parallel.py", line 85, in submit_helper
raise e
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/parallel.py", line 79, in submit_helper
return function(*inputs)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/atom3/pair.py", line 98, in complex_to_pairs
pairs, num_subunits = get_pairs(complex)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/atom3/pair.py", line 141, in get_pairs_param
return get_pairs(neighbor_def, complex, type, unbound, nb_fn, full)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/atom3/pair.py", line 156, in get_pairs
_get_rcsb_pairs(neighbor_def, complex, unbound, nb_fn, full)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/atom3/pair.py", line 190, in _get_rcsb_pairs
df = pd.read_pickle(pkl_filename)
File "/home/jiashan/.local/lib/python3.9/site-packages/pandas/io/pickle.py", line 222, in read_pickle
return pc.load(handles.handle, encoding=None)
File "/home/jiashan/.local/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 274, in load
return up.load()
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/pickle.py", line 1210, in load
dispatch[key[0]](self)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/pickle.py", line 1535, in load_stack_global
self.append(self.find_class(module, name))
File "/home/jiashan/.local/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 206, in find_class
return super().find_class(module, name)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/pickle.py", line 1579, in find_class
return _getattribute(sys.modules[module], name)[0]
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/pickle.py", line 331, in _getattribute
raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from '/home/jiashan/.local/lib/python3.9/site-packages/pandas/_libs/internals.cpython-39-x86_64-linux-gnu.so'>
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/extendplus/jiashan/DIPS_plus/project/datasets/builder/make_dataset.py", line 54, in <module>
main()
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/extendplus/jiashan/DIPS_plus/project/datasets/builder/make_dataset.py", line 47, in main
pair.all_complex_to_pairs(complexes, source_type, get_pairs, pairs_dir, num_cpus)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/atom3/pair.py", line 82, in all_complex_to_pairs
par.submit_jobs(complex_to_pairs, inputs, num_cpus)
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/parallel.py", line 60, in submit_jobs
out = res.get()
File "/extendplus/app/miniconda3/envs/py39/lib/python3.9/site-packages/multiprocess/pool.py", line 771, in get
raise self._value
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from '/home/jiashan/.local/lib/python3.9/site-packages/pandas/_libs/internals.cpython-39-x86_64-linux-gnu.so'>
Could you please help me with this error? Thanks!
@lijiashan2020,
I'm glad to hear you were able to make further progress on this issue! I believe the error you are seeing comes from a version incompatibility between Pandas and your Python version's pickle module (each version of Python ships its own pickle module). I would first check that you are using a version of Pandas compatible with Python 3.9 (see this post for reference: https://stackoverflow.com/a/71090354). You may need to downgrade Pandas a few versions so that the pickle module Pandas uses can correctly load the pickle files Python is creating in make_dataset.py.
@lijiashan2020,
In particular, I would recommend downgrading Pandas to version 1.2.4, to see if this fixes what I believe to be an incompatibility between Python 3.9's pickle module and the latest version of Pandas.
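To illustrate the failure mode: a .pkl written by one Pandas version can reference internal helpers (such as pandas._libs.internals._unpickle_block in the traceback above) that do not exist in another version, so the reading side must run a compatible Pandas. Within a single environment the round trip works, as this sketch shows:

```python
# Round-trip sketch: pickling and unpickling a DataFrame succeeds within one
# environment. The AttributeError above appears only when the .pkl was
# written by a different (newer) Pandas than the one reading it, because the
# pickle stream embeds references to version-specific internal classes.
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"chain": ["A", "B"], "residues": [120, 95]})

with tempfile.TemporaryDirectory() as d:
    pkl_path = os.path.join(d, "pair.pkl")
    df.to_pickle(pkl_path)           # writes version-specific block objects
    restored = pd.read_pickle(pkl_path)

assert restored.equals(df)
```

This is why regenerating the .pkl files and reading them with the same Pandas version (or pinning one compatible version throughout) resolves the error.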
Thank you very much for your reply! I will reinstall the environment with the module versions you recommended. Your open-source work has benefited me a lot. Thank you again for your help!
@lijiashan2020,
Let me know if this works for you. I am happy to help where I can!
I'm very happy to tell you that after downgrading the version as you recommended, the program no longer reports that error! But just when it is about to finish successfully, the program suddenly aborts:
2022-04-06 22:38:28,877 INFO 86949: For complex 5pm8.pdb1 found 0 pairs out of 1 chains
2022-04-06 22:38:28,877 INFO 86949: Working on 5pmn.pdb1
2022-04-06 22:38:28,888 INFO 86949: For complex 5pmn.pdb1 found 0 pairs out of 1 chains
2022-04-06 22:38:28,888 INFO 86949: Working on 5pmj.pdb1
2022-04-06 22:38:28,894 INFO 86949: For complex 5pmj.pdb1 found 0 pairs out of 1 chains
Aborted!
I am very sorry to trouble you with so many problems; I will also do my best to solve them myself. Thanks!
@lijiashan2020,
This sounds like a core dump occurred somewhere during your Python script's execution. I also noticed that your script is processing pairs for single amino acid chains, which at first glance does not make sense to me. Typically, as I recall, this script looks for at least one pair within each collection of chains (two or more chains). Seeing it find only one chain in your "complexes" makes me suspect that some previous data processing did not complete successfully. I would recommend, with your new version of Pandas, rerunning the entire data processing pipeline (if possible) to ensure that the Pandas version you used before did not result in unexpected (incorrect) processing of each RCSB protein complex. I hope this information helps.
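As a concrete sketch of that suggestion, one could clear the stale interim outputs before rerunning (the directory names here are assumptions based on this thread, not the project's documented layout):

```python
# Cleanup sketch: remove previously generated interim outputs so a rerun of
# make_dataset.py starts fresh. Subdirectory names are assumptions taken
# from this thread ('pairs' and 'complexes'); adjust to your own layout.
import pathlib
import shutil

def reset_dirs(base: str, subdirs=("pairs", "complexes")) -> None:
    """Delete and recreate each interim output subdirectory under `base`."""
    for sub in subdirs:
        d = pathlib.Path(base) / sub
        if d.exists():
            shutil.rmtree(d)      # drop stale results from the earlier run
        d.mkdir(parents=True, exist_ok=True)
```

Stale files from a run under the old Pandas version would otherwise be picked up and silently invalidate the rerun.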
Thank you very much for your recent help! I found that I had not deleted the previously generated result files before rerunning, which invalidated the rerun even though I had divided the data into several parts. I can run it successfully now!
@lijiashan2020,
I am glad to hear it!
I have a similar problem. When I run the command, it seems to run successfully and generates a .pkl file for every PDB entry; however, it then stops making progress and no error is reported. This command seems to need to generate project/datasets/DIPS/interim/pairs, which will be used by the next command. When I run that next command, the result is as follows:
What can I do about this?
Thanks