Open zachcp opened 4 years ago
Further tests and input/output files are attached. The problem seems to be that implicit H typing is lost in the write.molecules() process.
library("rcdk")
Loading required package: rcdklibs
Loading required package: rJava
load structure from downloaded pubchem .sdf
mols <- load.molecules("Structure2D_CID_338.sdf")
plot structure
img <- view.image.2d(mols[[1]])
plot.new()
rasterImage(img, 0, 0, 1, 1)
savePlot("Structure2D_CID_338.png")
dev.off()
null device
          1
write to sdf
write.molecules(mols, "Structure2D_CID_338_cdk.sdf", write.props = TRUE)
load and re-plot
mols2 <- load.molecules("Structure2D_CID_338_cdk.sdf")
img2 <- view.image.2d(mols2[[1]])
Error in .jcall(mi, "[B", "getBytes", as.integer(depictor$getWidth()), :
  java.lang.NullPointerException: One or more atoms had an undefined number of implicit hydrogens
plot.new()
rasterImage(img2, 0, 0, 1, 1)
Error in rasterImage(img2, 0, 0, 1, 1) : object 'img2' not found
savePlot("Structure2D_CID_338_cdk.png")
dev.off()
null device
          1
sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS: /shared/storage/biology/rsrch/tf-PAB/lab/trl1/bin_research0/lib/R/lib/libRblas.so
LAPACK: /shared/storage/biology/rsrch/tf-PAB/lab/trl1/bin_research0/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_GB LC_NUMERIC=C LC_TIME=en_GB
[4] LC_COLLATE=en_GB LC_MONETARY=en_GB LC_MESSAGES=en_GB
[7] LC_PAPER=en_GB LC_NAME=en_GB LC_ADDRESS=en_GB
[10] LC_TELEPHONE=en_GB LC_MEASUREMENT=en_GB LC_IDENTIFICATION=en_GB

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rcdk_3.5.0 rcdklibs_2.3 rJava_0.9-11

loaded via a namespace (and not attached):
[1] compiler_3.6.2 parallel_3.6.2 fingerprint_3.5.7 iterators_1.0.12
[5] itertools_0.1-3 png_0.1-7
@trljcl apologies for the delay
I am looking at this again and agree that the hydrogens are lost. The write command is a wrapper of CDK's SDFWriter. Can you post this issue on the CDK user list? If it is explained/fixed in CDK, I can help get equivalent functionality in rCDK.
# support for your theory
mols[[1]]$getAtomCount()
# 16
mols2[[1]]$getAtomCount()
# 10
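To see which atoms account for the difference between 16 and 10, a per-element tally of both molecules may help. This is only a minimal sketch: it assumes the `mols` and `mols2` objects from the session above are still loaded, and `count_elements` is a hypothetical helper name, not part of rcdk. It relies on the rcdk functions `get.atoms()` and `get.symbol()`.

```r
library(rcdk)

# Hypothetical helper (not part of rcdk): tally atoms by element symbol.
count_elements <- function(mol) {
  symbols <- vapply(get.atoms(mol), get.symbol, character(1))
  table(symbols)
}

# Original PubChem structure vs. the structure after the write/read round trip.
# Assumes `mols` and `mols2` exist as created earlier in this thread.
count_elements(mols[[1]])
count_elements(mols2[[1]])
```

If the tallies differ only in the "H" entry, that would be consistent with the hydrogens being dropped somewhere in the write/read round trip.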
From my email:
Structure2D_CID_338.sdf
Structure2D_CID_338_cdk_openbabel.sdf