chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0

Get the following error when I try and run a fingerprint calculation #138

Closed HannibaltheArrow closed 1 year ago

HannibaltheArrow commented 1 year ago

When running the following code:

os.chdir(self.workingDirectory_path)

prot = mda.Universe(PDB_ID + "_cleanV2_H.pdb", guess_bonds=True)
prot = plf.Molecule.from_mda(prot, NoImplicit=False)
prot.n_residues

path = str(PDB_ID + '_lig_H_ledock_out.sdf')
lig_suppl = plf.sdf_supplier(path)
fp = plf.Fingerprint()
fp.run_from_iterable(lig_suppl, prot)
df = fp.to_dataframe()

I get this error message.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/KalenJosifovski/Docking_program/Molecular_Docking.py", line 409, in <module>
    anal.interaction_fingerprint(PDB_ID)
  File "/Users/KalenJosifovski/Docking_program/Molecular_Docking.py", line 238, in interaction_fingerprint
    fp.run_from_iterable(lig_suppl, prot)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/site-packages/prolif/fingerprint.py", line 593, in run_from_iterable
    return self._run_iter_parallel(
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/site-packages/prolif/fingerprint.py", line 628, in _run_iter_parallel
    with mp.Pool(
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
cbouy commented 1 year ago

@HannibaltheArrow

In your script, right after you're done importing the libraries you need, add the following if __name__ == '__main__': line like so:

import prolif as plf
import MDAnalysis as mda
# and any other imports you need...

if __name__ == "__main__":
    # rest of your code goes here

Without the if __name__ == "__main__" guard, every worker process that the prolif parallel code spawns re-imports your script and runs its top-level code again, including the call that starts the workers, resulting in the error you see here.
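
For readers who want to see the idiom in isolation, here is a minimal sketch that uses only the standard library. The names `square` and `main` are made up for illustration and stand in for the real fingerprint work; nothing here is ProLIF code.

```python
import multiprocessing as mp


def square(x):
    """Toy stand-in for the work done by each worker process."""
    return x * x


def main():
    # Under the "spawn" start method (the default on macOS and Windows),
    # each worker starts a fresh interpreter and re-imports this file.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Only the process that was launched directly runs this block.
    # Workers import the file under a different module name and skip it,
    # so they never try to create a pool of their own.
    print(main())
```

Moving the `print(main())` call out of the guard and to the top level of the file should reproduce the RuntimeError above on platforms that use the spawn start method.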

Not directly related, but is there a reason why you're using NoImplicit=False when reading the protein? This disables the bond order and formal charge inference on the MDAnalysis side, which means most of the interactions that require proper bond order or charge assignment (HBond, pi-stacking, etc.) will not be detected. This might defeat the purpose of the interaction fingerprint analysis. I'd suggest protonating your protein structures beforehand using something like PypKa or PropKa (also available as command-line tools). This way you can keep the default NoImplicit=True.

HannibaltheArrow commented 1 year ago

@cbouy

Thank you for your advice. I implemented the changes you recommended and managed to compute the interaction fingerprint as required. I had NoImplicit=False because the protein structures I am working with have already been pre-processed appropriately.

I am now trying to visualise the interaction fingerprint, but I get the following error message:

Exception has occurred: ArgumentError
Python argument types in
    rdkit.Chem.rdmolops.SanitizeMol(mol2_supplier, SanitizeFlags)
did not match C++ signature:
    SanitizeMol(RDKit::ROMol {lvalue} mol, unsigned long long sanitizeOps=rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_ALL, bool catchErrors=False)
  File "/Users/KalenJosifovski/Omega_Dock/main.py", line 108, in <module>
    net = LigNetwork.from_ifp(df, lig, kind="aggregate", threshold=0.3, rotation=270)
Boost.Python.ArgumentError: Python argument types in
    rdkit.Chem.rdmolops.SanitizeMol(mol2_supplier, SanitizeFlags)
did not match C++ signature:
    SanitizeMol(RDKit::ROMol {lvalue} mol, unsigned long long sanitizeOps=rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_ALL, bool catchErrors=False)

The code I am running is as follows:

def interaction_fp():
    prot = mda.Universe('1W6O_prot.mol2')

    # replace aromatic bonds with single bonds
    for i, bond_order in enumerate(prot._topology.bonds.order):
        # you may need to replace double bonds ("2") as well
        if bond_order == "ar":
            prot._topology.bonds.order[i] = 1
    # clear the bond cache, just in case
    prot._topology.bonds._cache.pop("bd", None)

    # infer bond orders again
    prot = plf.Molecule.from_mda(prot)
    prot.n_residues
    lig = plf.mol2_supplier('LAT_lig.mol2')
    fp = plf.Fingerprint()
    fp.run_from_iterable(lig, prot)
    df = fp.to_dataframe(return_atoms=True)
    fp.to_pickle("1W6O-LAT_fingerprint.pkl")
    return df, lig

if __name__ == "__main__":
    df, lig = interaction_fp()
    print(df)
    net = LigNetwork.from_ifp(df, lig, kind="aggregate", threshold=0.3, rotation=270)
    net.display()

After some investigation, it appears that the object passed to the .from_ifp() class method is of type 'prolif.molecule.mol2_supplier'; is this the cause of the issue? Can you not visualise using docking poses from a mol2 file?

cbouy commented 1 year ago

Hi again,

plf.mol2_supplier returns an iterable of molecules but LigNetwork.from_ifp expects a single molecule, so you can just do

net = LigNetwork.from_ifp(df, lig[0], kind="aggregate", threshold=0.3)

if you have multiple poses in your mol2 file and want to aggregate the interactions.

If you want to visualize a specific pose:

index = 0  # index of the pose in your file
net = LigNetwork.from_ifp(df, lig[index], kind="frame", frame=index)
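
The distinction between the supplier and a single pose can be made explicit with a small guard. This is only a sketch: `select_pose` is a hypothetical helper, not part of ProLIF, and it assumes nothing about the supplier beyond support for `len()` and integer indexing.

```python
def select_pose(poses, index=0):
    """Return a single pose from an indexable collection of poses.

    Hypothetical helper, not part of ProLIF. It only assumes that
    `poses` supports len() and integer indexing.
    """
    if not hasattr(poses, "__getitem__") or not hasattr(poses, "__len__"):
        raise TypeError(
            f"expected an indexable collection of poses, got {type(poses).__name__}"
        )
    n_poses = len(poses)
    if not 0 <= index < n_poses:
        raise IndexError(f"pose index {index} out of range for {n_poses} pose(s)")
    return poses[index]
```

With such a helper, the call above could be written as LigNetwork.from_ifp(df, select_pose(lig, index), kind="frame", frame=index), so that a wrong index fails with a readable message before anything reaches RDKit.
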
HannibaltheArrow commented 1 year ago

@cbouy

Huh, I implemented your suggestion of specifying the index with lig[0] and got a different error instead:

Traceback (most recent call last):
  File "/Users/KalenJosifovski/Omega_Dock/main.py", line 108, in <module>
    net = LigNetwork.from_ifp(df, lig[0], kind="aggregate", threshold=0.3, rotation=270)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/site-packages/prolif/plotting/network.py", line 319, in from_ifp
    return cls(data, lig, **kwargs)
  File "/Users/KalenJosifovski/anaconda3/envs/CondaTorchDrug2/lib/python3.9/site-packages/prolif/plotting/network.py", line 214, in __init__
    rdDepictor.GenerateDepictionMatching3DStructure(mol, lig_mol)
IndexError: map::at: key not found

The file I want to specify the interactions for is not exactly a docking file with multiple conformations but the ligand file from the PDB crystal structure.

cbouy commented 1 year ago

Mmh that's odd, what does print(lig[0], lig[0].GetNumAtoms()) show?

You can also try adding match3D=False in LigNetwork.from_ifp: it will generate 2D coordinates for the depiction of the ligand from scratch instead of trying to create ones that match the 3D space. That said, it looks like there's something else going on here and I'm not sure what yet.

HannibaltheArrow commented 1 year ago

print(lig[0], lig[0].GetNumAtoms())
<prolif.molecule.Molecule with 1 residues and 51 atoms at 0x156baa9f0> 51

Ok let me see what specifying match3D=False in LigNetwork.from_ifp will output.