asapdiscovery / asapdiscovery

Toolkit for open antiviral drug discovery by the ASAP Discovery Consortium
https://asapdiscovery.org
MIT License

`asap-docking` Type checking in `set_SD_data` raises an error because we aren't checking that Complexes have ligands in them #990

Open apayne97 opened 5 months ago

apayne97 commented 5 months ago

This is a fun one. Here's the error:

Traceback (most recent call last):
  File "/home/paynea/miniforge3/envs/asapdiscovery/bin/asap-docking", line 8, in <module>
    sys.exit(docking())
  File "/lila/home/paynea/miniforge3/envs/asapdiscovery/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/lila/home/paynea/miniforge3/envs/asapdiscovery/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/lila/home/paynea/miniforge3/envs/asapdiscovery/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/lila/home/paynea/miniforge3/envs/asapdiscovery/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/lila/home/paynea/miniforge3/envs/asapdiscovery/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-workflows/asapdiscovery/workflows/docking_workflows/cli.py", line 269, in cross_docking
    cross_docking_workflow(inputs)
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-workflows/asapdiscovery/workflows/docking_workflows/cross_docking.py", line 149, in cross_docking_workflow
    prepped_complexes = prepper.prep(
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-modeling/asapdiscovery/modeling/protein_prep.py", line 127, in prep
    inputs, cached_outputs = ProteinPrepperBase._gather_new_tasks(
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-modeling/asapdiscovery/modeling/protein_prep.py", line 76, in _gather_new_tasks
    cached_outputs = [
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-modeling/asapdiscovery/modeling/protein_prep.py", line 79, in <listcomp>
    if inp.hash in cached_by_hash
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-data/asapdiscovery/data/schema/complex.py", line 101, in hash
    return f"{self.target.hash}+{self.ligand.fixed_inchikey}"
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-data/asapdiscovery/data/schema/ligand.py", line 380, in fixed_inchikey
    mol = self.to_oemol()
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-data/asapdiscovery/data/schema/ligand.py", line 260, in to_oemol
    mol = set_SD_data(mol, data)
  File "/lila/data/chodera/paynea/asapdiscovery/asapdiscovery-data/asapdiscovery/data/backend/openeye.py", line 856, in set_SD_data
    raise TypeError(
TypeError: Expected an OpenEye OEMol, OEGraphMol, or OEConf, but got <class 'NoneType'>

A change I made in #692 added type checking to set_SD_data: https://github.com/choderalab/asapdiscovery/blob/9f73bb2d99b60176a7b7ff754aa08fc78bdb40e9/asapdiscovery-data/asapdiscovery/data/backend/openeye.py#L855-L858

That check is exposing a separate problem: when we load complexes and pass them to the prep object, we never actually confirm that they have ligands: https://github.com/choderalab/asapdiscovery/blob/9f73bb2d99b60176a7b7ff754aa08fc78bdb40e9/asapdiscovery-workflows/asapdiscovery/workflows/docking_workflows/cross_docking.py#L130-L156

So when we try to get their hash here: https://github.com/choderalab/asapdiscovery/blob/9f73bb2d99b60176a7b7ff754aa08fc78bdb40e9/asapdiscovery-modeling/asapdiscovery/modeling/protein_prep.py#L76-L80

there isn't actually a ligand there, so the mol that to_oemol() passes to set_SD_data is None and the new check raises the TypeError above.
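One way to surface this earlier would be to check for ligands before anything asks for a hash. Below is a minimal sketch of that idea. It uses stand-in classes rather than the real asapdiscovery schema: FakeLigand, FakeComplex, and split_by_ligand are hypothetical names invented for illustration, and the real Complex and Ligand objects have more fields and different constructors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeLigand:
    """Stand-in for a ligand; `data` mimics an SDF string (hypothetical)."""

    data: str = ""

    @property
    def is_empty(self) -> bool:
        # An empty or whitespace-only SDF string means there is no molecule.
        return not self.data.strip()


@dataclass
class FakeComplex:
    """Stand-in for a complex: a target name plus an optional ligand."""

    target_name: str
    ligand: Optional[FakeLigand] = None


def split_by_ligand(complexes):
    """Separate complexes that have a usable ligand from those that do not."""
    with_ligand, without_ligand = [], []
    for c in complexes:
        if c.ligand is None or c.ligand.is_empty:
            without_ligand.append(c)
        else:
            with_ligand.append(c)
    return with_ligand, without_ligand


complexes = [
    FakeComplex("target_a", FakeLigand("fake sdf block")),
    FakeComplex("target_b", FakeLigand("")),
    FakeComplex("target_c", None),
]
ok, missing = split_by_ligand(complexes)
if missing:
    names = ", ".join(c.target_name for c in missing)
    print(f"Complexes without ligands: {names}")
```

Whether the workflow should skip such complexes, warn, or fail outright is a design decision this sketch does not try to settle.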


apayne97 commented 5 months ago

I made a gist to demonstrate where the problem arises. In #692, by changing the sdf_string_to_oemol function, I (accidentally) made to_oemol() return None instead of an OEGraphMol when the ligand is empty.

If we prefer returning an empty OEMol, I can make Ligand.to_oemol() or sdf_string_to_oemol do that.

Incidentally, this is the cause of all these warnings: Warning: OECreateInChI: InChI only supports molecules with between 1 and 1023 atoms! (note: large molecule support is experimental)

https://gist.github.com/apayne97/8d2c8a13f6045ad5da3bea13d04aa262
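The difference between the two return behaviours discussed above can be sketched without OpenEye installed. Everything here is a stand-in: FakeMol, parse_or_none, parse_or_empty, and set_sd_data_checked are hypothetical names, and the parsing is a toy, not what sdf_string_to_oemol actually does. The point is only how a type-checked setter reacts to None versus an empty molecule.

```python
class FakeMol:
    """Stand-in for an OpenEye molecule (hypothetical, no OpenEye needed)."""

    def __init__(self, atoms=()):
        self.atoms = list(atoms)
        self.sd_data = {}

    def num_atoms(self):
        return len(self.atoms)


def parse_or_none(sdf_str):
    """Mimics the behaviour described above: empty input gives None."""
    if not sdf_str.strip():
        return None
    return FakeMol(atoms=sdf_str.split())


def parse_or_empty(sdf_str):
    """Alternative behaviour: empty input gives an empty molecule."""
    return FakeMol(atoms=sdf_str.split())


def set_sd_data_checked(mol, data):
    """Type-checked setter, loosely modelled on the check in the traceback."""
    if not isinstance(mol, FakeMol):
        raise TypeError(f"Expected a FakeMol, but got {type(mol)}")
    mol.sd_data.update(data)
    return mol


# Returning None trips the type check.
try:
    set_sd_data_checked(parse_or_none(""), {"name": "x"})
    error_message = None
except TypeError as err:
    error_message = str(err)

# Returning an empty molecule passes the type check.
empty = set_sd_data_checked(parse_or_empty(""), {"name": "x"})
```

Note that the empty-molecule route only gets past the type check. Anything downstream that needs real atoms, such as InChI key generation, still receives a molecule with zero atoms, so an explicit ligand check may still be worthwhile.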

apayne97 commented 5 months ago

@hmacdope made a quickfix to have sdf_string_to_oemol return an empty OEMol in #1001