Closed likun1212 closed 2 years ago
Could you include a stack trace of an example of this? I think the proper response should be to return None
for the molecule, what about you?
i did all the test in a ipython.
1) md = ps.build_metadata("vina",dict(exhaustiveness=8))
2) vs = ps.virtual_screen("vina", ["4idv_glide_xp_lifechem.pdb"], (16.8,13.8,87.3), (25,25,25), md, ncpu=8)
3) s = vs('CCCCooC') # here I feed an invlid smiles
RayTaskError(ArgumentError) Traceback (most recent call last) Input In [12], in <cell line: 1>() ----> 1 s = vs('CCCCooC')
File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:166, in DockingVirtualScreen.call(self, smiles, sources) 163 sources = list(chain(([s] if isinstance(s, str) else s for s in sources))) 165 planned_simulationsss = self.plan(sources, smiles) --> 166 completed_simulationsss = self.run(planned_simulationsss) 168 self.completed_simulationsss.extend(completed_simulationsss) 169 S = np.array( 170 [ 171 [[s.result.score for s in sims] for sims in simss] (...) 174 dtype=float, 175 )
File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:275, in DockingVirtualScreen.run(self, planned_simulationsss) 268 def run( 269 self, planned_simulationsss: List[List[List[CalculationData]]] 270 ) -> List[List[List[CalculationData]]]: 271 refsss = [ 272 [[self.prepare_and_run.remote(s) for s in sims] for sims in simss] 273 for simss in planned_simulationsss 274 ] --> 275 return [ 276 [ray.get(refs) for refs in refss] 277 for refss in tqdm(refsss, desc="Docking", unit="ligand", smoothing=0.0) 278 ]
File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:276, in
File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/screen.py:276, in
File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.
File ~/anaconda3/envs/molpal/lib/python3.8/site-packages/ray/worker.py:1733, in get(object_refs, timeout) 1731 worker.core_worker.dump_object_store_memory_usage() 1732 if isinstance(value, RayTaskError): -> 1733 raise value.as_instanceof_cause() 1734 else: 1735 raise value
RayTaskError(ArgumentError): ray::prepare_and_run() (pid=1036778, ip=192.168.2.25) File "/home/yanglikun/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/vina/runner.py", line 67, in prepare_and_run VinaRunner.prepare_ligand(data) File "/home/yanglikun/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/vina/runner.py", line 75, in prepare_ligand VinaRunner.prepare_from_smi(data) File "/home/yanglikun/anaconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/vina/runner.py", line 95, in prepare_from_smi mol = Chem.AddHs(Chem.MolFromSmiles(data.smi)) Boost.Python.ArgumentError: Python argument types in rdkit.Chem.rdmolops.AddHs(NoneType) did not match C++ signature: AddHs(RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False, boost::python::api::object onlyOnAtoms=None, bool addResidueInfo=False)
in this case, the input is an invalid molecule. I could add some code to catch invalid molecules and return nan
values for these simulations, but I don't that's logically correct. A nan
logically means that the simulation failed for some uncontrollable reason. Here, the simulation failed because the input wasn't a valid molecule. I'm of the mindset that we shouldn't be silently suppressing errors like that in pyscreener
. I'm of the opinion that it's the responsibility of the client code to ensure that all molecules passed to pyscreener
are valid.
The way I see it, we have two chocies:
pyscreener
to crash, so 1 bad apple will ruin the (potentially massive) bunch. This seems suboptimalnan
for their simulation values. In this case, it's impossible to differentiate in the resulting scores array whether a nan
value is due to random simulation failure (meaning you can just resubmit and potentially fix it) or an invalid molecule (which will never work no matter how many times you try)there are always some bad apples in a large library, i don't think crash is a good way to handle that error("ruin the bunch").
at least some breakpoint protection mechanisms should be introduced.
By the same token, why is it the responsibility of pyscreener
to ensure valid SMILES strings are passed? If you pass in an invalid array to numpy
functions, it doesn’t silently handle the error and return None
. Instead, it raises some form of a ValueError
. I get that catching these bad molecules could be convenient for your use-case, but I want to maintain the integrity of what a nan
means in the context of the returned scores array.
i am using ps to do vs, (scores = vs(virtual_lib)),
however, when vs meets some molecular that can't be processed correctly (e.g., rdkit can't generate conformer), the vs process will crash rather than pass/handle those molecular.
I am wandering is there any error handle procedure in ps?
thanks