deepchem / deepchem

Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology
https://deepchem.io/
MIT License

Use code to extract protein features, code error #1650

Closed zhouhao-learning closed 4 years ago

zhouhao-learning commented 5 years ago

I get an unexpected error when extracting protein features. My code is as follows:

import deepchem as dc
pdbbind_tasks, pdbbind_datasets, transformers = dc.molnet.load_pdbbind(featurizer="grid", split="r")

Partway through featurization, I got the following error:

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/deepchem/feat/base_classes.py", line 16, in _featurize_complex
    return featurizer._featurize_complex(mol_pdb_file, protein_pdb_file)
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/deepchem/feat/rdkit_grid_featurizer.py", line 1243, in _featurize_complex
    protein_pdb_file, calc_charges=True, sanitize=self.sanitize)
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/deepchem/utils/rdkit_util.py", line 128, in load_molecule
    my_mol = add_hydrogens_to_mol(my_mol)
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/deepchem/utils/rdkit_util.py", line 52, in add_hydrogens_to_mol
    fixer.addMissingHydrogens(7.4)
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/pdbfixer/pdbfixer.py", line 1019, in addMissingHydrogens
    modeller.addHydrogens(pH=pH)
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/simtk/openmm/app/modeller.py", line 901, in addHydrogens
    LocalEnergyMinimizer.minimize(context, 1.0, 50)
  File "/home/zh/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/simtk/openmm/openmm.py", line 2631, in minimize
    return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
Exception: Particle coordinate is nan
"""

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
<ipython-input-20-499bae175249> in <module>
----> 1 pdbbind_tasks, pdbbind_datasets, transformers = dc.molnet.load_pdbbind(featurizer="grid", split="r")

~/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/deepchem/molnet/load_function/pdbbind_datasets.py in load_pdbbind(featurizer, load_binding_pocket, split, subset, reload)
    276   print("Featurizing Complexes")
    277   features, failures = featurizer.featurize_complexes(ligand_files,
--> 278                                                       protein_files)
    279   # Delete labels for failing elements
    280   labels = np.delete(labels, failures)

~/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/deepchem/feat/base_classes.py in featurize_complexes(self, mol_files, protein_pdbs)
     51     failures = []
     52     for ind, result in enumerate(results):
---> 53       new_features = result.get()
     54       # Handle loading failures which return None
     55       if new_features is not None:

~/sda3/Anaconda3/envs/deep2.0.0/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

Exception: Particle coordinate is nan

Is this a problem with your code or with the data you provided? How should I solve it? Any guidance would be appreciated, thank you.

peastman commented 5 years ago

Does this happen consistently? Or if you rerun it does it work correctly?

Here's what the stack trace shows. After loading one of the molecules, it called PDBFixer to add hydrogens that were missing from the PDB file. It does that by first adding the hydrogens at more or less random positions, then doing an energy minimization to move them into more physically realistic positions. If two hydrogens randomly happened to get put at almost exactly the same position, that might potentially lead the minimization to produce a nan. If so, this should just be a rare, random event. But if it happens consistently, we need to investigate further.
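One way to check whether the failure is consistent and tied to a particular structure is to repeat the hydrogen-addition step on each protein file serially, outside the multiprocessing pool, and record which files raise. The following is a minimal sketch under stated assumptions, not DeepChem's own code: the helper names `add_hydrogens` and `find_failing_files` are hypothetical, the pH of 7.4 is taken from the traceback above, and `pdbfixer` is imported lazily so the helper can also be exercised with a stand-in function.

```python
def add_hydrogens(pdb_path):
    """Repeat the step that fails in the traceback (hypothetical helper).

    Assumes pdbfixer is installed; the import is deferred so that
    find_failing_files can be used with another callable without it.
    """
    from pdbfixer import PDBFixer

    fixer = PDBFixer(filename=pdb_path)
    # Same call and pH as in deepchem/utils/rdkit_util.py in the traceback.
    fixer.addMissingHydrogens(7.4)
    return fixer


def find_failing_files(paths, process=add_hydrogens):
    """Run `process` on each path and collect (path, error message) pairs."""
    failures = []
    for path in paths:
        try:
            process(path)
        except Exception as exc:  # e.g. "Particle coordinate is nan"
            failures.append((path, str(exc)))
    return failures
```

If the same file shows up in the returned list on every run, the problem is probably specific to that structure rather than a rare random overlap of hydrogens.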

zhouhao-learning commented 5 years ago

@peastman Sorry, I ran it again, but I still got this error.

zhouhao-learning commented 5 years ago

@peastman Can you solve this problem? Should I update the code?

peastman commented 5 years ago

I just tried running this, and everything worked correctly for me. Note that there's an error in your code: split="r" is not a legal option. I assume you meant split="random"? Anyway, it ran for me and successfully loaded and featurized the datasets.

That's with the most recent code for DeepChem, PDBFixer, and OpenMM. What versions do you have? It's conceivable the problem you're seeing is caused by a bug in one of them that has since been fixed.
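A quick way to collect the installed versions is to import each package and read its `__version__` attribute. This is only a sketch: the helper name `report_versions` is hypothetical, the module name `simtk.openmm` is taken from the traceback paths above, and not every package defines `__version__`, in which case the helper reports "unknown" and the environment's package listing is the more reliable source.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__, or a placeholder string."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Packages that do not define __version__ are reported as "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `report_versions(["deepchem", "pdbfixer", "simtk.openmm"])` returns a dictionary that can be pasted into the issue.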

zhouhao-learning commented 5 years ago

My DeepChem version is 2.1.1. I don't know why, but no matter how many times I run the code, I always get this error.

peastman commented 5 years ago

And what about PDBFixer and OpenMM?

Try upgrading all of them to the latest versions.

zhouhao-learning commented 5 years ago

@peastman I updated PDBFixer and OpenMM to PDBFixer==1.5 and OpenMM==7.3.1, both of which are the latest versions. I then ran the code again and still got this error.

rbharath commented 4 years ago

If you're still having issues, check out the updated tutorials. I've got a tutorial up with pdbbind and grid features now. I'm going to close this issue, but please re-open it if you're still facing issues!