OpenFreeEnergy / gufe

grand unified free energy by OpenFE
https://gufe.readthedocs.io
MIT License
30 stars 9 forks source link

LigandNetwork unique molecule checks #371

Open jthorton opened 17 hours ago

jthorton commented 17 hours ago

LigandNetwork should enforce that each ligand is unique in the network to avoid issues with transformations between the same ligands which are wasted effort and might cause issues downstream if we have DDG estimates of 0.0 and lead to broken networks.

IAlibay commented 17 hours ago

unique in the network

What kind of uniqueness are we looking for here? I think as long as we're talking about SMC uniqueness (i.e. technically it could be the same molecule but different coordinates / charges / etc..) then I would agree that this would make sense.

DG estimates of 0.0

We do have to remember that LigandNetwork != AlchemicalNetwork. Technically you could consider a case where you have same ligand + different protein as nodes. That being said, LigandNetwork is inherently flawed for that use case - indeed it might not be something we want to keep around in the long run.

broken networks

Just raising here that broken networks are a Protocol specific issue, so I wouldn't spend too much time thinking about them - i.e. it's ok for a network to be broken, it's up to the Protocol to make sure they have the right tooling to handle that case.

jthorton commented 15 hours ago

What kind of uniqueness are we looking for here? I think as long as we're talking about SMC uniqueness (i.e. technically it could be the same molecule but different coordinates / charges / etc..) then I would agree that this would make sense.

So my user story is from ASAP: we get a large list of molecules from the medchem team which would often have repeats in, we can use OpenFE to plan a network and run rbfe calculations but due to the network having transformations between the same ligand we get edges with DDG of 0 which breaks MLE and results in broken networks. We now have to filter the ligands before hand but checking for users might be good generally.

I would say emit a warning if you have the same ligand in the network more than once by chemical identity but different coords/charges.

Raise an error if we have duplicated ligands with the same identity, coords and charges.

We do have to remember that LigandNetwork != AlchemicalNetwork. Technically you could consider a case where you have same ligand + different protein as nodes. That being said, LigandNetwork is inherently flawed for that use case - indeed it might not be something we want to keep around in the long run.

This could be an issue for the error case above, would having a flag to turn on this behaviour be a good idea like check_unique when creating a LigandNetwork? I am sure there will be a lot of other things users want to do with LigandNetwork so I don't want this check to stop them.