choderalab / espaloma_charge

Standalone charge assignment from Espaloma framework.
MIT License
35 stars 5 forks source link

Is there any way to generate equivalent charges for resonance hybrids like carboxylate? #16

Closed ljmartin closed 1 year ago

ljmartin commented 1 year ago

Hi all, thank you for making this package available!

I tested this out on some molecules and found that, for what we might think of as equivalent atoms, partial charges are different.

Example (see charges on oxygens):

from rdkit import Chem; from espaloma_charge import charge
from rdkit.Chem.Draw import IPythonConsole, rdMolDraw2D
import numpy as np
from IPython.display import SVG

m = Chem.MolFromSmiles('OC(c1ccccc1)=O')
ch = charge(m, total_charge=-1)

for i in range(m.GetNumAtoms()):
    m.GetAtomWithIdx(i).SetProp("atomNote",str(np.around(ch[i],3)))

d2d = rdMolDraw2D.MolDraw2DSVG(350,300)
d2d.DrawMolecule(m)
d2d.FinishDrawing()
SVG(d2d.GetDrawingText())
Screenshot 2023-02-24 at 6 33 54 pm

This may be expected: Section 8.5 of the manuscript says Ensuring full chemical equivalence is nontrivial:

the non-uniqueness of formal charge assignment (obvious in molecules such as guanidinium, where resonance forms locate the formal charge on different atoms) does not guarantee the assigned parameters will respect chemical equivalence (a form of invariance)

I see why it's hard - SMILES requires a kekule form rather than a resonance form. But is it possible to get around this? One might simply average the carboxylate charges, for instance, but I wonder if there's a more principled way?

Thanks Lewis

jchodera commented 1 year ago

@ljmartin: Thanks for raising this issue!

EspalomaCharge should only be using atom features that are invariant for resonance forms, so it should handle the carboxylic acid case here and assign equivalent charges for the oxygens.

What I suspect is happening here is that we never anticipated EspalomaCharge receiving molecules with implicit hydrogens, and the molecule create with

m = Chem.MolFromSmiles('OC(c1ccccc1)=O')

and pass in to

ch = charge(m, total_charge=-1)

is missing key hydrogens needed to actually produce useful charges.

Could you try this again by (1) specifying the carboxylate SMILES, and (2) telling RDKit to explicitly protonate the molecule before you pass it to charge()? You should not need to specify the total_charge.

@yuanqing-wang : It looks like we have two dangerous issues here we should address:

  1. We need to check to make sure the molecules we are charging have all hydrogens explicitly represented, and raise an Exception if not
  2. We should consider eliminating total_charge since the explicit hydrogen form of the molecule from RDKit should already provide all relevant information about the net charge. Allowing someone to specify a non-matching total charge will just cause bad charges to be produced.
ljmartin commented 1 year ago

Thanks @jchodera!

It makes good sense that explicit hydrogens are required. Here's the same code but with the charge specified and explicit hydrogens added, i.e.:

m = Chem.MolFromSmiles('[O-]C(c1ccccc1)=O')
m = Chem.AddHs(m)

The result is definitely much closer - though not exactly the same, which one might expect if they were considered equivalent:

Screenshot 2023-02-25 at 9 21 29 pm

One thing to note is that the charges don't sum to -1. The charges are the same whether or not I supply total_charge=-1):

ch.sum()

output: 1.1920929e-07

I couldn't find a __version__ method, but I installed it yesterday from pip if that means anything!

Edit: looking at EEM Charges and XTB charges, neither gives equivalent charges on the oxygens. Perhaps this is normal?

jchodera commented 1 year ago

Thanks for trying this!

OK, now this is puzzling for two reasons:

  1. I don't understand why the oxygens would be featurized in different ways and receive different charges---they should be the same with the features we're putting in
  2. I don't understand why the charge sum is not correct

@yuanqing-wang : Can you investigate?

yuanqing-wang commented 1 year ago

Hi @ljmartin! It seems that, when Chem.AddHs, RDKit would by default add all hydrogens to COO-, making it a COOH. I'm thinking about what's the best way to circumvent this.

ljmartin commented 1 year ago

Thanks @yuanqing-wang - as far as I can tell, though, hydrogens are not added to the carboxylate. The picture above was created after adding hydrogens. One can verify that hydrogens aren't added to the carboxylate by saving a PDB structure and visualizing, or just counting up the atoms:

m = Chem.MolFromSmiles('[O-]C(c1ccccc1)=O')
m = Chem.AddHs(m)
print(m.GetNumAtoms())

output: 14. That's 9 heavy atoms (7 carbons, 2 oxygens) and 5 hydrogens (around the benzene ring).

RDKit version is 2022.09.4

yuanqing-wang commented 1 year ago

Sorry that was my mistake.

It seems that there was a keyword mismatch in the deployment code. I'm drafting a fix here: https://github.com/choderalab/espaloma_charge/pull/17

I'll test and pull in a minute.

yuanqing-wang commented 1 year ago

@jchodera @ljmartin it also seems that atom.GetTotalValence() for the oxygen atoms would show 2 and 1, respectively. Maybe we should get rid of this in the input feature as well.

jchodera commented 1 year ago

@yuanqing-wang : Yes, we need to replace that with atom.GetDegree().

ljmartin commented 1 year ago

thanks for looking into this, @yuanqing-wang

@jchodera - could I trouble you to point me to the code in espaloma or espaloma_charge that handles resonance forms? It's something I've been looking to do, too.

jchodera commented 1 year ago

@ljmartin : There is no special code in espaloma_charge or espaloma to handle resonance forms. In espaloma_charge (and in future updates of espaloma) our plan is to only use atom (and possibly bond) features that are invariant with resonance forms, but as noted above, we accidentally used atom.GetValence() (which is not resonance form invariant) instead of atom.GetDegree(). We'll fix that in a retraining and update in the next few days.

For more sophisticated approaches to handling resonance---such as averaging over localized resonance forms (including of biopolymers), you might check out the https://github.com/openforcefield/openff-nagl package, which has a much more sophisticated way to tackle this.

ljmartin commented 1 year ago

Amazing, thanks @jchodera. Looks like this is wrapped up, so I'll close it.

jchodera commented 1 year ago

Thanks again for raising this issue and helping us track down what went wrong!