CDK-R / cdkr

Integrating R and the CDK
https://cdk-r.github.io/cdkr/

Problems with isotope pattern of charged molecules #38

Closed michaelwitting closed 7 months ago

michaelwitting commented 7 years ago

Dear rcdk-Developers,

I'm using rcdk to predict isotope patterns for mass spectrometric analysis. I noticed some problems when working with charged formulas: the function get.isotopes.pattern returns the masses without correcting for the charge.

Please find an example for the [M+Na]+ adduct of glucose below.

Best regards,

Michael

library(rcdk)

glucose <- "C6H12O6"
glucoseFormula <- get.formula(glucose, charge = 0)

sodium <- "Na"
sodiumFormula <- get.formula(sodium, charge = 1)

glucoseFormula@mass + sodiumFormula@mass

glucoseSodium <- "C6H12O6Na"
glucoseSodiumFormula <- get.formula(glucoseSodium, charge = 1)

glucoseSodiumFormula@mass

get.isotopes.pattern(glucoseSodiumFormula, minAbund = 0.001)[1]
get.formula(glucoseSodium, charge = 0)@mass

The results are:

glucoseFormula@mass + sodiumFormula@mass
[1] 203.0526

glucoseSodiumFormula@mass
[1] 203.0526

get.isotopes.pattern(glucoseSodiumFormula, minAbund = 0.001)[1]
[1] 203.0532

get.formula(glucoseSodium, charge = 0)@mass
[1] 203.0532
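
The gap between the two reported values is about one electron mass, which a few lines of base R can confirm. This is only a sketch: the electron mass is a rounded CODATA value supplied here rather than anything read from rcdk, and the variable names are illustrative.

```r
# Masses as printed in the report above (4 decimal places).
mass_charged   <- 203.0526  # get.formula(..., charge = 1)@mass
mass_uncharged <- 203.0532  # first isotope pattern peak, same as charge = 0

# Electron rest mass in unified atomic mass units (rounded CODATA value).
electron_mass <- 0.000548579909

difference <- mass_uncharged - mass_charged

# Each printed mass is rounded to 1e-4, so the difference can only be
# compared with the electron mass to about that precision.
within_rounding <- abs(difference - electron_mass) <= 1e-4
print(within_rounding)
```

So the isotope pattern appears to be computed for the neutral formula, while the formula's own mass slot already accounts for the missing electron.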

rajarshi commented 7 years ago

This looks like a bug in the CDK. Once that gets fixed, this should be resolved. Unfortunately, I don't have a timeline.

johnmay commented 7 years ago

@michaelwitting formulas don't work like that... try generating the MF from SMILES, the adduct is C6H11O6- not C6H12O6.

OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O MF=C6H12O6
[Na+] MF=[Na]+
OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1[O-] MF=C6H11O6
OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1[O-].[Na+] MF=C6H11O6Na

Need more? Check out PubChem: phenolate is C6H5O-, not C6H6O-:

https://pubchem.ncbi.nlm.nih.gov/compound/phenolate#section=Top

johnmay commented 7 years ago

It's a misconception you can "correct" for charge:

[image: c1cccc[n+]1C]

johnmay commented 7 years ago

After some thought... I realised it's reasonable to have a (de)protonate method on formulas. However this will likely be something you should do after the initialization.

glucose <- get.formula("C6H12O6")
glucose.adjustProtonation(-1)  # e.g. (proposed method, does not exist yet)
# C6H11O6

The charge param is redundant BTW @rajarshi; the parser handles the charge, it just needs square brackets:

hcl <- get.formula("HCl")
chloride <- get.formula("[Cl]-")

... which actually makes it clearer what was wrong. I think that's a moderately new feature though, which explains why the charge is a parameter here.

michaelwitting commented 7 years ago

Hi there... Sorry for the late answer.

What I would like to use CDK/rcdk for is predicting the isotopic pattern of adducts in mass spectrometry, e.g. [M+H]+ adducts. This would include not only protonation and deprotonation but also sodiation ([M+Na]+) and chloride adducts in negative mode ([M+Cl]-).

To stay with the glucose example: glucose detected as the [M+Na]+ adduct has the formula [C6H12O6Na]+, and I would like to get the isotopic pattern for this sodiated molecule.
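
As a rough illustration of the adduct masses in question, the monoisotopic ion mass can be worked out by hand in base R. This is a sketch under stated assumptions: the atomic masses and the electron mass are rounded standard values typed in here, and `adducts` and `adduct_ion_mass` are hypothetical names, not part of rcdk or the CDK.

```r
# Rounded standard values in unified atomic mass units.
ELECTRON_MASS <- 0.000548579909
monoisotopic  <- c(H = 1.007825, Na = 22.989769, Cl = 34.968853)

# Hypothetical adduct table: atom added to M and the resulting net charge.
adducts <- data.frame(
  name   = c("[M+H]+", "[M+Na]+", "[M+Cl]-"),
  atom   = c("H", "Na", "Cl"),
  charge = c(1, 1, -1),
  stringsAsFactors = FALSE
)

# Ion mass = neutral M + neutral adduct atom - charge * electron mass.
# A positive charge means electrons were removed, a negative one that
# electrons were added.
adduct_ion_mass <- function(neutral_mass, atom, charge) {
  neutral_mass + unname(monoisotopic[atom]) - charge * ELECTRON_MASS
}

glucose_mass <- 180.063388  # monoisotopic mass of C6H12O6
adducts$ion_mass <- adduct_ion_mass(glucose_mass, adducts$atom, adducts$charge)
print(adducts)
```

All three ions are singly charged, so the ion mass equals m/z. The [M+Na]+ row comes out at about 203.0526, matching the value reported at the top of the thread for the charged formula.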

Might this be possible?

johnmay commented 7 years ago

Hmm, I don't quite understand but shouldn't [C6H12O6Na]+ and C6H12O6Na have the same mass?

michaelwitting commented 7 years ago

No, because in [C6H12O6Na]+ there is a Na+ attached, while in C6H12O6Na it is a neutral Na. The mass of an electron is the difference between the two.

johnmay commented 7 years ago

That does make sense... so the other way round, for C6H12O5 and [C6H12O5]-, you want the negatively charged one to weigh one electron more. This is in theory possible, but the simplest fix here sounds like you can just add/remove the e- weight yourself, right?
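
A minimal sketch of that manual workaround in base R follows. It assumes the pattern is a two-column numeric matrix with masses in the first column, which is the shape get.isotopes.pattern is documented to return; `correct_pattern_for_charge` is a hypothetical helper, not an rcdk function, and the electron mass is a rounded CODATA value.

```r
# Electron rest mass in unified atomic mass units (rounded CODATA value).
ELECTRON_MASS <- 0.000548579909

# Hypothetical helper (not part of rcdk): shift every mass in an isotope
# pattern by the mass of the electrons gained or lost.
# `pattern`: numeric matrix, column 1 = mass, column 2 = abundance.
# `charge`:  net charge of the ion, e.g. 1 for [M+Na]+ or -1 for [M+Cl]-.
correct_pattern_for_charge <- function(pattern, charge) {
  stopifnot(is.matrix(pattern), ncol(pattern) >= 2, is.numeric(charge))
  corrected <- pattern
  # Cations have lost electrons and are lighter; anions are heavier.
  corrected[, 1] <- pattern[, 1] - charge * ELECTRON_MASS
  corrected
}

# Stand-in input: only the first peak from the report above, with a
# placeholder abundance of 1 (not real rcdk output).
example_pattern <- cbind(mass = 203.0532, abund = 1)
corrected_pattern <- correct_pattern_for_charge(example_pattern, charge = 1)
print(corrected_pattern)
```

With rcdk loaded, the same helper could presumably be applied to the real pattern, e.g. correct_pattern_for_charge(get.isotopes.pattern(glucoseSodiumFormula, minAbund = 0.001), charge = 1). Abundances are left untouched, since removing or adding an electron shifts all peaks by the same amount.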

johnmay commented 7 years ago

Okay.. now I see the actual problem... urgh the IsotopePattern class is a pain to patch (has been abandoned in the code base).

michaelwitting commented 7 years ago

Yes. C6H12O5- is C6H12O6 - H+, so basically it is C6H12O5 + e-.