Closed ljmartin closed 1 year ago
@ljmartin: Thanks for raising this issue!
EspalomaCharge should only be using atom features that are invariant for resonance forms, so it should handle the carboxylic acid case here and assign equivalent charges for the oxygens.
What I suspect is happening here is that we never anticipated EspalomaCharge receiving molecules with implicit hydrogens, and the molecule create with
m = Chem.MolFromSmiles('OC(c1ccccc1)=O')
and pass in to
ch = charge(m, total_charge=-1)
is missing key hydrogens needed to actually produce useful charges.
Could you try this again by (1) specifying the carboxylate SMILES, and (2) telling RDKit to explicitly protonate the molecule before you pass it to charge()
? You should not need to specify the total_charge
.
@yuanqing-wang : It looks like we have two dangerous issues here we should address:
total_charge
since the explicit hydrogen form of the molecule from RDKit should already provide all relevant information about the net charge. Allowing someone to specify a non-matching total charge will just cause bad charges to be produced.Thanks @jchodera!
It makes good sense that explicit hydrogens are required. Here's the same code but with the charge specified and explicit hydrogens added, i.e.:
m = Chem.MolFromSmiles('[O-]C(c1ccccc1)=O')
m = Chem.AddHs(m)
The result is definitely much closer - though not exactly the same, which one might expect if they were considered equivalent:
One thing to note is that the charges don't sum to -1. The charges are the same whether or not I supply total_charge=-1
):
ch.sum()
output: 1.1920929e-07
I couldn't find a __version__
method, but I installed it yesterday from pip if that means anything!
Edit: looking at EEM Charges and XTB charges, neither gives equivalent charges on the oxygens. Perhaps this is normal?
Thanks for trying this!
OK, now this is puzzling for two reasons:
@yuanqing-wang : Can you investigate?
Hi @ljmartin! It seems that, when Chem.AddHs
, RDKit would by default add all hydrogens to COO-, making it a COOH. I'm thinking about what's the best way to circumvent this.
Thanks @yuanqing-wang - as far as I can tell, though, hydrogens are not added to the carboxylate. The picture above was created after adding hydrogens. One can verify that hydrogens aren't added to the carboxylate by saving a PDB structure and visualizing, or just counting up the atoms:
m = Chem.MolFromSmiles('[O-]C(c1ccccc1)=O')
m = Chem.AddHs(m)
print(m.GetNumAtoms())
output: 14
. That's 9 heavy atoms (7 carbons, 2 oxygens) and 5 hydrogens (around the benzene ring).
RDKit version is 2022.09.4
Sorry that was my mistake.
It seems that there was a keyword mismatch in the deployment code. I'm drafting a fix here: https://github.com/choderalab/espaloma_charge/pull/17
I'll test and pull in a minute.
@jchodera @ljmartin it also seems that atom.GetTotalValence()
for the oxygen atoms would show 2 and 1, respectively. Maybe we should get rid of this in the input feature as well.
@yuanqing-wang : Yes, we need to replace that with atom.GetDegree()
.
thanks for looking into this, @yuanqing-wang
@jchodera - could I trouble you to point me to the code in espaloma or espaloma_charge that handles resonance forms? It's something I've been looking to do, too.
@ljmartin : There is no special code in espaloma_charge
or espaloma
to handle resonance forms. In espaloma_charge
(and in future updates of espaloma
) our plan is to only use atom (and possibly bond) features that are invariant with resonance forms, but as noted above, we accidentally used atom.GetValence()
(which is not resonance form invariant) instead of atom.GetDegree()
. We'll fix that in a retraining and update in the next few days.
For more sophisticated approaches to handling resonance---such as averaging over localized resonance forms (including of biopolymers), you might check out the https://github.com/openforcefield/openff-nagl package, which has a much more sophisticated way to tackle this.
Amazing, thanks @jchodera. Looks like this is wrapped up, so I'll close it.
Thanks again for raising this issue and helping us track down what went wrong!
Hi all, thank you for making this package available!
I tested this out on some molecules and found that, for what we might think of as equivalent atoms, partial charges are different.
Example (see charges on oxygens):
This may be expected: Section 8.5 of the manuscript says
Ensuring full chemical equivalence is nontrivial
:I see why it's hard - SMILES requires a kekule form rather than a resonance form. But is it possible to get around this? One might simply average the carboxylate charges, for instance, but I wonder if there's a more principled way?
Thanks Lewis