coleygroup / pyscreener

pythonic interface to virtual screening software
MIT License
85 stars 32 forks source link

handle errors during virtual screen? #35

Closed likun1212 closed 2 years ago

likun1212 commented 2 years ago

i am using ps to do vs, (scores = vs(virtual_lib)),

however, when vs meets some molecular that can't be processed correctly (e.g., rdkit can't generate conformer), the vs process will crash rather than pass/handle those molecular.

I am wandering is there any error handle procedure in ps?

thanks

davidegraff commented 2 years ago

Could you include a stack trace of an example of this? I think the proper response should be to return None for the molecule, what about you?

likun1212 commented 2 years ago

i did all the test in a ipython.

1) md = ps.build_metadata("vina",dict(exhaustiveness=8))

2) vs = ps.virtual_screen("vina", ["4idv_glide_xp_lifechem.pdb"], (16.8,13.8,87.3), (25,25,25), md, ncpu=8)

3) s = vs('CCCCooC') # here I feed an invlid smiles

below are the emg:

Docking: 0%| | 0/1 [00:00<?, ?ligand/s](prepare_and_run pid=1036778) [20:19:41] non-ring atom 4 marked aromatic

RayTaskError(ArgumentError) Traceback (most recent call last) Input In [12], in <cell line: 1>() ----> 1 s = vs('CCCCooC')

File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:166, in DockingVirtualScreen.call(self, smiles, sources) 163 sources = list(chain(([s] if isinstance(s, str) else s for s in sources))) 165 planned_simulationsss = self.plan(sources, smiles) --> 166 completed_simulationsss = self.run(planned_simulationsss) 168 self.completed_simulationsss.extend(completed_simulationsss) 169 S = np.array( 170 [ 171 [[s.result.score for s in sims] for sims in simss] (...) 174 dtype=float, 175 )

File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:275, in DockingVirtualScreen.run(self, planned_simulationsss) 268 def run( 269 self, planned_simulationsss: List[List[List[CalculationData]]] 270 ) -> List[List[List[CalculationData]]]: 271 refsss = [ 272 [[self.prepare_and_run.remote(s) for s in sims] for sims in simss] 273 for simss in planned_simulationsss 274 ] --> 275 return [ 276 [ray.get(refs) for refs in refss] 277 for refss in tqdm(refsss, desc="Docking", unit="ligand", smoothing=0.0) 278 ]

File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:276, in (.0) 268 def run( 269 self, planned_simulationsss: List[List[List[CalculationData]]] 270 ) -> List[List[List[CalculationData]]]: 271 refsss = [ 272 [[self.prepare_and_run.remote(s) for s in sims] for sims in simss] 273 for simss in planned_simulationsss 274 ] 275 return [ --> 276 [ray.get(refs) for refs in refss] 277 for refss in tqdm(refsss, desc="Docking", unit="ligand", smoothing=0.0) 278 ]

File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:276, in (.0) 268 def run( 269 self, planned_simulationsss: List[List[List[CalculationData]]] 270 ) -> List[List[List[CalculationData]]]: 271 refsss = [ 272 [[self.prepare_and_run.remote(s) for s in sims] for sims in simss] 273 for simss in planned_simulationsss 274 ] 275 return [ --> 276 [ray.get(refs) for refs in refss] 277 for refss in tqdm(refsss, desc="Docking", unit="ligand", smoothing=0.0) 278 ]

File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook..wrapper(*args, kwargs) 103 if func.name != "init" or is_client_mode_enabled_by_default: 104 return getattr(ray, func.name)(*args, *kwargs) --> 105 return func(args, kwargs)

File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/ray/worker.py:1733, in get(object_refs, timeout) 1731 worker.core_worker.dump_object_store_memory_usage() 1732 if isinstance(value, RayTaskError): -> 1733 raise value.as_instanceof_cause() 1734 else: 1735 raise value

RayTaskError(ArgumentError): ray::prepare_and_run() (pid=1036778, ip=192.168.2.25) File "/home/yanglikun/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/vina/runner.py", line 67, in prepare_and_run VinaRunner.prepare_ligand(data) File "/home/yanglikun/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/vina/runner.py", line 75, in prepare_ligand VinaRunner.prepare_from_smi(data) File "/home/yanglikun/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/vina/runner.py", line 95, in prepare_from_smi mol = Chem.AddHs(Chem.MolFromSmiles(data.smi)) Boost.Python.ArgumentError: Python argument types in rdkit.Chem.rdmolops.AddHs(NoneType) did not match C++ signature: AddHs(RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False, boost::python::api::object onlyOnAtoms=None, bool addResidueInfo=False)

davidegraff commented 2 years ago

in this case, the input is an invalid molecule. I could add some code to catch invalid molecules and return nan values for these simulations, but I don't that's logically correct. A nan logically means that the simulation failed for some uncontrollable reason. Here, the simulation failed because the input wasn't a valid molecule. I'm of the mindset that we shouldn't be silently suppressing errors like that in pyscreener. I'm of the opinion that it's the responsibility of the client code to ensure that all molecules passed to pyscreener are valid.

The way I see it, we have two chocies:

  1. do nothing and keep things as is. Invalid molecules will cause pyscreener to crash, so 1 bad apple will ruin the (potentially massive) bunch. This seems suboptimal
  2. silently handle invalid molecules and report nan for their simulation values. In this case, it's impossible to differentiate in the resulting scores array whether a nan value is due to random simulation failure (meaning you can just resubmit and potentially fix it) or an invalid molecule (which will never work no matter how many times you try)
likun1212 commented 2 years ago

there are always some bad apples in a large library, i don't think crash is a good way to handle that error("ruin the bunch").

at least some breakpoint protection mechanisms should be introduced.

davidegraff commented 2 years ago

By the same token, why is it the responsibility of pyscreener to ensure valid SMILES strings are passed? If you pass in an invalid array to numpy functions, it doesn’t silently handle the error and return None. Instead, it raises some form of a ValueError. I get that catching these bad molecules could be convenient for your use-case, but I want to maintain the integrity of what a nan means in the context of the returned scores array.