CDK-R / cdkr

Integrating R and the CDK
https://cdk-r.github.io/cdkr/

get.exact.mass(): NullPointerException #73

Closed sneumann closed 4 months ago

sneumann commented 5 years ago

Hi, I can confirm @schymane's problem with rcdk-3.4.9 (see https://github.com/MassBank/RMassBank/issues/199). I haven't checked in detail yet, but running the example from the get.exact.mass man page does not work:

       m <- parse.smiles('c1ccccc1')[[1]]

       ## Need to configure the molecule
       do.aromaticity(m)
       do.typing(m)
       do.isotopes(m)

       get.exact.mass(m)

>        get.exact.mass(m)
[1] "Java-Object{java.lang.NullPointerException}"
Error in get.exact.mass(m) : 
  Couldn't get exact mass. Maybe you have not performed aromaticity, atom type or isotope configuration?

So this is either an issue with the rcdk code, my environment, or the documentation.

Yours, Steffen

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcdk_3.4.9   rcdklibs_2.2 rJava_0.9-10

loaded via a namespace (and not attached):
[1] compiler_3.4.4    tools_3.4.4       parallel_3.4.4    fingerprint_3.5.7
[5] iterators_1.0.10  itertools_0.1-3   png_0.1-7   

and

java --version
openjdk 10.0.1 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 10.0.1+10-Ubuntu-3ubuntu1, mixed mode)

schymane commented 5 years ago

@sneumann can you confirm this? If it is the same for you, I think this is a historic documentation error, as @meowcat handled this differently in RMassBank: https://github.com/MassBank/RMassBank/blob/master/R/leCsvAccess.R#L315 In that case, the fix is to correct the docs. The tests below were still run on my old, working rcdk 3.4.3 (about to test on the new one).

#this DOESN'T WORK
smiles <- c("c1ccccc1", "CCC")
m <- parse.smiles(smiles[1])[[1]]
do.aromaticity(m)
do.typing(m)
do.isotopes(m)
convert.implicit.to.explicit(m)
get.mol2formula(m)
get.exact.mass(m) #<= THIS FAILS - and ONLY THIS!
get.natural.mass(m)

m <- parse.smiles(smiles[2])[[1]]
do.aromaticity(m)
do.typing(m)
do.isotopes(m)
convert.implicit.to.explicit(m)
get.mol2formula(m)
get.exact.mass(m) #<= THIS ALSO FAILS! With non-aromatic SMILES
get.natural.mass(m)

#This now works
m <- parse.smiles(smiles[2])[[1]]
do.aromaticity(m)
convert.implicit.to.explicit(m)
do.aromaticity(m) #<= NOTE THE ORDER! THE FUNCTION STILL RETURNS FALSE BUT ... 
do.typing(m)
do.isotopes(m)
get.mol2formula(m)
get.exact.mass(m) #<= THIS NOW WORKS!
get.natural.mass(m)

#This worked all along:
library(RMassBank)
mol <- getMolecule(smiles[1])
get.exact.mass(mol)

schymane commented 5 years ago

With the latest rcdk this order dependency is gone.

other attached packages:
[1] rcdk_3.4.9.1 rcdklibs_2.2 rJava_0.9-8 Rcpp_0.12.17

If I run the "broken" code from above:

smiles <- c("c1ccccc1", "CCC")
m <- parse.smiles(smiles[1])[[1]]
do.aromaticity(m)
do.typing(m)
do.isotopes(m)
convert.implicit.to.explicit(m)
get.mol2formula(m)
get.exact.mass(m) #<= THIS FAILS - and ONLY THIS!
get.natural.mass(m)

m <- parse.smiles(smiles[2])[[1]]
do.aromaticity(m)
do.typing(m)
do.isotopes(m)
convert.implicit.to.explicit(m)
get.mol2formula(m)
get.exact.mass(m) #<= THIS ALSO FAILS! With non-aromatic SMILES
get.natural.mass(m)

then the functions that failed above now return correct values.

> mol <- getMolecule(smiles[1])
> get.exact.mass(mol)
[1] 78.04695
> smiles <- c("c1ccccc1", "CCC")
> m <- parse.smiles(smiles[1])[[1]]
> do.aromaticity(m)
[1] TRUE
> do.typing(m)
> do.isotopes(m)
> convert.implicit.to.explicit(m)
> get.mol2formula(m)
cdkFormula:  C6H6 , mass =  78.04695 , charge =  0 
> get.exact.mass(m) #<= THIS FAILS - and ONLY THIS!
[1] 78.04695
> get.natural.mass(m)
[1] 78.11206
> 
> 
> m <- parse.smiles(smiles[2])[[1]]
> do.aromaticity(m)
[1] FALSE
> do.typing(m)
> do.isotopes(m)
> convert.implicit.to.explicit(m)
> get.mol2formula(m)
cdkFormula:  C3H8 , mass =  44.0626 , charge =  0 
> get.exact.mass(m) #<= THIS ALSO FAILS! With non-aromatic SMILES
[1] 44.0626
> get.natural.mass(m)
[1] 44.09573

@sneumann can you confirm? Seems that would solve this issue for now?

schymane commented 5 years ago

For the record from John:

MolecularFormulaManipulator
  getTotalExactMass(mf); => for each isotope adds up all the exact mass, if an isotope is not specified (null) the major isotope will be used
  getNaturalExactMass(mf); => for each isotope adds up the natural mass, ignoring any specified isotopes.
  getTotalMassNumber(mf); => adds up the major isotope mass number of each isotope element, ignoring any specified isotope
  getMajorIsotopeMass(mf); => adds up the major isotope mass of each isotope element, ignoring any specified isotope

AtomContainerManipulator
  getTotalExactMass(IAtomContainer); => for each atom adds up all the exact mass, if isotopes are not specified you'll get a NPE. ImplH count is multiplied with 1H mass
  getNaturalExactMass(IAtomContainer); => for each atom adds up the natural mass, ignoring any specified isotopes. ImplH count is multiplied with H natural mass.
  getMolecularWeight(IAtomContainer); => for each atom adds the exact mass if specified, or the natural mass if unspecified (null). ImplH count is multiplied with H natural mass.
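
To make the difference between these behaviours concrete, here is a toy model in plain R. It is not the CDK implementation: the atom representation, the helper names (total.exact, mol.weight, configure.isotopes) and the rounded mass values are all assumptions made for illustration, and implicit hydrogens are left out. It only mimics the null handling described above: the exact-mass sum fails when an atom has no isotope mass, while the molecular-weight sum falls back to the natural mass.

```r
# Toy model only; names and rounded masses are illustrative assumptions.
natural.mass <- c(C = 12.011, H = 1.008)     # rounded natural (average) masses
major.mass   <- c(C = 12.000, H = 1.007825)  # 12C and 1H, rounded

# Mimics getTotalExactMass(IAtomContainer): fails if any exact mass is missing.
total.exact <- function(atoms) {
  sum(vapply(atoms, function(a) {
    if (is.null(a$exact)) stop("isotope not configured for ", a$symbol)
    a$exact
  }, numeric(1)))
}

# Mimics getMolecularWeight(IAtomContainer): exact mass if set, else natural mass.
mol.weight <- function(atoms) {
  sum(vapply(atoms, function(a) {
    if (is.null(a$exact)) natural.mass[[a$symbol]] else a$exact
  }, numeric(1)))
}

# Stand-in for an isotope configuration step: fill in the major isotope mass.
configure.isotopes <- function(atoms) {
  lapply(atoms, function(a) {
    if (is.null(a$exact)) a$exact <- major.mass[[a$symbol]]
    a
  })
}

# One configured carbon plus one hydrogen that has no isotope set.
atoms <- list(list(symbol = "C", exact = 12), list(symbol = "H", exact = NULL))
mol.weight(atoms)                       # 12 + 1.008
total.exact(configure.isotopes(atoms))  # 12 + 1.007825
# total.exact(atoms) would stop with an error, analogous to the NPE
```

In this picture, any atom that lacks isotope information at the time the exact mass is requested is enough to trigger the failure, whichever step left it unconfigured.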

schymane commented 5 years ago

Adding bits & pieces from email conversations here for the record while we look into this:

Natural masses are from BODR (Blue Obelisk Data Repository):
https://github.com/cdk/cdk-build-util/blob/master/src/main/resources/net/sf/cdk/tools/bodr/chemicalElements.xml
https://sourceforge.net/projects/bodr/
Open question: natural mass == average mass == natural abundance? [to check with Chris/Egon]
Data from IUPAC 2009: https://github.com/cdk/cdk/blob/cdk-1.4.x/src/main/org/openscience/cdk/config/data/isotopes.xml

Also think about: get.mass(mol, type = c('total.exact', 'natural.exact', 'mol.weight')) rather than 3 separate methods. You could even combine this with the two methods that take an MF as the first argument.
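
A minimal sketch of what such a wrapper could look like, purely as a starting point for discussion. get.mass does not exist in rcdk; the mapping of 'total.exact' to get.exact.mass and 'natural.exact' to get.natural.mass is an assumption based on the function names used in this thread, and 'mol.weight' is left unimplemented because no matching rcdk accessor is assumed here.

```r
# Hypothetical wrapper, not part of rcdk. Assumes rcdk is attached
# (library(rcdk)) so that get.exact.mass() and get.natural.mass() can be
# found; they are only looked up when the matching branch is evaluated.
get.mass <- function(mol, type = c("total.exact", "natural.exact", "mol.weight")) {
  type <- match.arg(type)  # validates 'type'; defaults to "total.exact"
  switch(type,
    total.exact   = get.exact.mass(mol),    # assumed mapping
    natural.exact = get.natural.mass(mol),  # assumed mapping
    mol.weight    = stop("'mol.weight' is not implemented in this sketch")
  )
}
```

With a configured molecule m, get.mass(m) and get.mass(m, "natural.exact") would then stand in for the two separate calls, and an unknown type is rejected by match.arg before any Java call is made.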

schymane commented 5 years ago

Further bits:

> m <- parse.smiles("CCNC")[[1]]
> do.aromaticity(m)
[1] FALSE
> do.typing(m)
> do.isotopes(m)
> 
> get.exact.mass(m)
[1] 59.0735
> m <- parse.smiles("CCNC")[[1]]
> do.aromaticity(m)
[1] FALSE
> do.typing(m)
> do.isotopes(m)
> convert.implicit.to.explicit(m)
> get.exact.mass(m)
[1] "Java-Object{java.lang.NullPointerException}"
Error in get.exact.mass(m) : 
  Couldn't get exact mass. Maybe you have not performed aromaticity, atom type or isotope configuration?
> m <- parse.smiles("CCNC")[[1]]
> get.exact.mass(m)
[1] "Java-Object{java.lang.NullPointerException}"
Error in get.exact.mass(m) : 
  Couldn't get exact mass. Maybe you have not performed aromaticity, atom type or isotope configuration?

other attached packages:
[1] RUnit_0.4.31    rcdk_3.4.3      rcdklibs_2.0    rJava_0.9-10    devtools_1.13.3 RMassBank_2.9.1 Rcpp_0.12.19