CDK-R / cdkr

Integrating R and the CDK
https://cdk-r.github.io/cdkr/
42 stars 27 forks source link

Only "0" returned for AromaticAtomsCountDescriptor and AromaticBondsCountDescriptor #147

Open bsedio opened 7 months ago

bsedio commented 7 months ago

Hello rcdk,

I am querying the CDK chemical properties using smiles generated using the CSI:FingerID module of Sirius. For some reason, I get "0" for number of aromatic atoms and bonds for every metabolite in my dataset, which includes many aromatic compounds (flavonoids, alkaloids from plant extracts).

Here is a small example:

dn.arom = c("org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor")
mol.aromtest = parse.smiles("C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4")
mol.aromtest = parse.smiles(c("C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4","CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC"))
desc.arom.test = eval.desc(mol.aromtest, dn.arom)
head(desc.arom.test)
                                                                    nAromBond naAromAtom
C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4         0          0
CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC                      0          0

I am using R version 4.3.1 "Beagle Scouts" on Mac OS 12.6 Monterey and java version "1.8.0_391" Java(TM) SE Runtime Environment (build 1.8.0_391-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)

sessionInfo() R version 4.3.1 (2023-06-16) Platform: x86_64-apple-darwin20 (64-bit) Running under: macOS Monterey 12.6

Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago tzcode source: internal

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] rcdk_3.8.1 rcdklibs_2.8 rJava_1.0-6

loaded via a namespace (and not attached): [1] compiler_4.3.1 parallel_4.3.1 fingerprint_3.5.7 iterators_1.0.14 itertools_0.1-3 png_0.1-8

Thank you very much for any assistance, Brian

zachcp commented 3 months ago

Hi Brian,

Sorry for the slow reply. You have to set/find aromaticity explicitly. I followed the CDK docs to set this explicitly and then Aromatic features are calculated as you would expect.


library(rcdk)

mols <- parse.smiles("C1=CC=CC=C1")
descriptor1 <- .jnew('org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor')
descriptor1$initialise(get.chem.object.builder())
val <- descriptor1$calculate(mols[[1]])
val$getNames()
val$getValue()
# zero

# set aromaticity explicitly
# follow the CDK text
# http://cdk.github.io/cdk/2.9/docs/api/org/openscience/cdk/aromaticity/Aromaticity.html

electron_donation <- J('org.openscience.cdk.aromaticity.ElectronDonation')
aromaticity <- J('org.openscience.cdk.aromaticity.Aromaticity')
cycles <- J('org.openscience.cdk.graph.Cycles')
model  <- electron_donation$daylight()
cyc    <- cycles$or(cycles$all(), cycles$all(as.integer(6)))

aroma  <- new(aromaticity, model, cyc)
aroma$apply(mols)

descriptor3 <- .jnew('org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor')
descriptor3$initialise(get.chem.object.builder())
val3 <- descriptor2$calculate(mols)
val3$getNames()
val3$getValue()
# 6
zachcp commented 3 months ago

So something like this could work:

set_aromatic <- function(molecules) {
  electron_donation <- J('org.openscience.cdk.aromaticity.ElectronDonation')
  aromaticity <- J('org.openscience.cdk.aromaticity.Aromaticity')
  cycles <- J('org.openscience.cdk.graph.Cycles')
  model  <- electron_donation$daylight()
  cyc    <- cycles$or(cycles$all(), cycles$all(as.integer(6)))

  aroma  <- new(aromaticity, model, cyc)

  for (mol in mols) {
    aroma$apply(mols)  
  }

  molecules  

}
zachcp commented 3 months ago

another workaround is using lowercase in SMILES to denote aromaticity:

c1=cc=cc=c1 vs: C1=CC=CC=1

zachcp commented 3 months ago

related: https://github.com/CDK-R/cdkr/issues/111