Closed zpincus closed 1 year ago
Thank you. I applied your fix in v2022.12.9.
parse charges of the form [CHNOP]+2
Is that format widely used? I think for now the IUPAC order is good enough.
Hmm, rdkit.Chem.rdMolDescriptors.CalcMolFormula produces CHNOP+2, which is a horror but there you go. Not sure that I've actually seen the bracketed form in the wild, now that you mention it. (I forgot that rdkit made the un-bracketed version.)
rdkit.Chem.rdMolDescriptors.CalcMolFormula produces CHNOP+2
Molmass v2023.4.10 supports rdkit-style ionic charges.
>>> Formula('CHNOP+2').charge
2
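For readers who only need the charge from an rdkit-style string and do not want to depend on a particular molmass release, here is a minimal sketch in plain Python. The function name `split_trailing_charge` and its regular expression are hypothetical illustrations, not part of molmass or rdkit; the sketch assumes the un-bracketed convention shown above (formula, then a sign, then an optional magnitude).

```python
import re

# Hypothetical helper, not the molmass implementation.
# Assumes the formula body contains only letters and digits,
# followed by an optional sign and an optional magnitude.
_TRAILING_CHARGE = re.compile(r"^(?P<body>[A-Za-z0-9]+?)(?:(?P<sign>[+-])(?P<n>\d*))?$")


def split_trailing_charge(text):
    """Split 'CHNOP+2' into ('CHNOP', 2); a missing charge gives 0."""
    match = _TRAILING_CHARGE.match(text)
    if match is None:
        raise ValueError(f"unrecognized formula: {text!r}")
    sign = match.group("sign")
    if sign is None:
        return match.group("body"), 0
    # A bare sign such as 'C6H5O-' is read as a single charge.
    magnitude = int(match.group("n")) if match.group("n") else 1
    return match.group("body"), magnitude if sign == "+" else -magnitude
```

The lazy `+?` on the body only matters for readability here, since signs cannot appear inside the body; it is not meant to handle bracketed or nested formulas.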
Actual behavior:
Expected behavior:
Problem:
molmass.py:1813:
where the first {1,} allows one or more of ] or _ to act as the delimiter. Changing this line to:
resolves this issue (at the cost of disallowing multiple underscores to act as a delimiter; if this is desired, the regexp could of course be expanded).
molmass --test and molmass --doctest both pass with this change.
Another related issue is the inability to parse charges of the form [CHNOP]+2 instead of [CHNOP]2+, which could also be remedied (if desired) by adding a second regexp similar to above, a la:
This latter change also does not cause any current tests to fail, but perhaps there are other dark corner-cases...
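To make the two charge orderings concrete, the following is a rough standalone sketch of a pattern that accepts both [CHNOP]2+ and [CHNOP]+2. It is not the regexp from molmass.py; the names `BRACKETED_CHARGE` and `parse_bracketed_charge` are made up for illustration, and the sketch assumes a single, non-nested bracketed group.

```python
import re

# Hypothetical pattern, not the regexp used in molmass.py.
BRACKETED_CHARGE = re.compile(
    r"""^\[(?P<body>[^\[\]]+)\]       # bracketed formula body, no nesting
        (?:
            (?P<n1>\d*)(?P<s1>[+-])   # IUPAC order: magnitude then sign, e.g. 2+
          |
            (?P<s2>[+-])(?P<n2>\d+)   # reversed order: sign then magnitude, e.g. +2
        )$""",
    re.VERBOSE,
)


def parse_bracketed_charge(text):
    """Return (body, charge) for strings like '[CHNOP]2+' or '[CHNOP]+2'."""
    match = BRACKETED_CHARGE.match(text)
    if match is None:
        raise ValueError(f"unrecognized ionic formula: {text!r}")
    if match.group("s1"):
        sign, digits = match.group("s1"), match.group("n1")
    else:
        sign, digits = match.group("s2"), match.group("n2")
    # A bare sign, as in '[CHNOP]+', is read as a single charge.
    magnitude = int(digits) if digits else 1
    charge = magnitude if sign == "+" else -magnitude
    return match.group("body"), charge
```

Trying the magnitude-then-sign branch first keeps a bare `+` or `-` in the IUPAC branch; the reversed branch requires at least one digit, so the two alternatives cannot both match the same input.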
Version is the current release, btw: