Integrating R and the CDK
Only "0" returned for AromaticAtomsCountDescriptor and AromaticBondsCountDescriptor #147

Open bsedio opened 7 months ago

bsedio commented 7 months ago

Hello rcdk,

I am computing CDK descriptors from SMILES generated by the CSI:FingerID module of SIRIUS. For some reason, I get "0" for the number of aromatic atoms and bonds for every metabolite in my dataset, which includes many aromatic compounds (flavonoids and alkaloids from plant extracts).

Here is a small example:

dn.arom = c("org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor")
mol.aromtest = parse.smiles("C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4")
mol.aromtest = parse.smiles(c("C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4","CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC"))
desc.arom.test = eval.desc(mol.aromtest, dn.arom)
                                                                    nAromBond naAromAtom
C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4         0          0
CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC                      0          0

I am using R version 4.3.1 "Beagle Scouts" on Mac OS 12.6 Monterey and java version "1.8.0_391" Java(TM) SE Runtime Environment (build 1.8.0_391-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)

R version 4.3.1 (2023-06-16) Platform: x86_64-apple-darwin20 (64-bit) Running under: macOS Monterey 12.6

Thank you very much for any assistance, Brian

zachcp commented 3 months ago

Hi Brian,

Sorry for the slow reply. You have to perceive (set) aromaticity explicitly before computing these descriptors. I followed the CDK docs to do this, and the aromatic features are then calculated as you would expect.


library(rcdk)
library(rJava)

mols <- parse.smiles("C1=CC=CC=C1")
descriptor1 <- .jnew('org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor')
val <- descriptor1$calculate(mols[[1]])
# zero

# set aromaticity explicitly
# follow the CDK text

electron_donation <- J('org.openscience.cdk.aromaticity.ElectronDonation')
aromaticity <- J('org.openscience.cdk.aromaticity.Aromaticity')
cycles <- J('org.openscience.cdk.graph.Cycles')
model  <- electron_donation$daylight()
cyc    <- cycles$or(cycles$all(), cycles$all(as.integer(6)))

aroma  <- new(aromaticity, model, cyc)
# apply the model to the molecule; this sets the aromatic flags in place
aroma$apply(mols[[1]])

descriptor2 <- .jnew('org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor')
val2 <- descriptor2$calculate(mols[[1]])
# 6

zachcp commented 3 months ago

So something like this could work:

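set_aromatic <- function(molecules) {
  electron_donation <- J('org.openscience.cdk.aromaticity.ElectronDonation')
  aromaticity <- J('org.openscience.cdk.aromaticity.Aromaticity')
  cycles <- J('org.openscience.cdk.graph.Cycles')
  model  <- electron_donation$daylight()
  cyc    <- cycles$or(cycles$all(), cycles$all(as.integer(6)))

  aroma  <- new(aromaticity, model, cyc)

  # The original snippet is cut off after the loop header; the body below is
  # an assumed completion that applies the model to each molecule in place.
  for (mol in molecules) {
    aroma$apply(mol)
  }
  invisible(molecules)
}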


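As a rough illustration of how this could be applied to the two SMILES from the original report, here is a minimal sketch. It assumes rcdk and rJava are installed with a working JVM, reuses the CDK classes named in the comments above, and has not been checked against a particular rcdk version, so treat it as a starting point rather than a confirmed fix.

```r
library(rcdk)
library(rJava)

# The two SMILES from the original report.
smiles <- c(
  "C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4",
  "CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC"
)
mols <- parse.smiles(smiles)

# Build the aromaticity model from the same CDK classes used in the thread.
Cycles <- J("org.openscience.cdk.graph.Cycles")
ElectronDonation <- J("org.openscience.cdk.aromaticity.ElectronDonation")
aroma <- new(
  J("org.openscience.cdk.aromaticity.Aromaticity"),
  ElectronDonation$daylight(),
  Cycles$or(Cycles$all(), Cycles$all(6L))
)

# apply() sets the aromatic flags on each underlying Java molecule in place.
for (mol in mols) {
  aroma$apply(mol)
}

dn.arom <- c(
  "org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor",
  "org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor"
)

# Evaluate the descriptors after aromaticity has been perceived.
desc.arom <- eval.desc(mols, dn.arom)
print(desc.arom)
```

If the counts are still zero after this step, checking that every entry of `mols` is non-NULL (that is, that each SMILES parsed) would be a reasonable first thing to look at.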
zachcp commented 3 months ago

Another workaround is to use lowercase atom symbols in the SMILES to denote aromaticity:

c1ccccc1 vs. C1=CC=CC=C1
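To see what the lowercase workaround does in practice, the following sketch parses benzene in both notations and runs the same two descriptors on each. It assumes rcdk is installed with a working JVM; the names `smiles`, `mols`, and `res` are illustrative only, and no particular output is claimed here since it may depend on the rcdk and CDK versions in use.

```r
library(rcdk)

# Benzene written in Kekule form (uppercase, explicit double bonds)
# and in aromatic form (lowercase atom symbols).
smiles <- c("C1=CC=CC=C1", "c1ccccc1")
mols <- parse.smiles(smiles)

dn.arom <- c(
  "org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor",
  "org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor"
)

# One row per input SMILES, one column per descriptor value.
res <- eval.desc(mols, dn.arom)
print(res)
```

Comparing the two rows shows whether the aromatic flags carried by the lowercase form are picked up by the descriptors without a separate aromaticity step.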

zachcp commented 3 months ago
