Closed Aariq closed 1 year ago
That's indeed not what validSDF
checks in its current form. It only checks the atom and bond blocks (see help doc for ?validSDF
) mainly to determine whether compounds are drug-like enough for structural similarity comparisons. For instance, compounds without carbons or single atom compounds are flagged because most fragment based similarity metrics are not meaningful in those cases. Your use-case is different, and could be added as a different checking mode to the same function. Please also keep in mind that SDFs are quite complex and there are a vast number of validity checks one could come up with. Happy to add this to the to-do list or accept code contributions. A more generic solution might require additional discussion.
For debugging please also note that there is a read.SDFstr
function that creates an intermediate of SDFset
called SDFstr
. Imports into SDFstr
are much more forgiving and very helpful for debugging. After creating a SDFstr
you can coerce to SDFset
with as(sdfstr, "SDFset")
. In fact, internally read.SDFset
does it this way.
This issue will be addressed in a future release. Closing for now.
Got it! I haven't fully dug into the internals of ChemmineR
, but it makes sense that there are too many ways SDFs could go wrong to catch them all. Just for further context, this was motivated by my surprise that I could read in an invalid .mol file (missing a line in the header block) and get no error until later in the workflow when I tried to do something with it (like propOB
). Even then, the error isn't an R error (#14) and doesn't stop ChemmineR
from returning non-sensical values.
You mean non-sensical values for OpenBabel functions. The ChemmineR functions are not affected by this inconsistency in the header block. Only (some of) the OpenBabel functions that are provided via the ChemmineOB interface. This is certainly something we want to address too. To come up with a more generic checker utility that considers the OpenBabel restrictions, we might want to first generate a list of requirements and then draft an implementation that doesn't degrade performance (e.g. too many checks by default). Let us know if you have additional suggestions for this checker list so that we can incorporate them. To indicate problems with OpenBabel operations, one could also trigger a warning when OpenBabel functions return only zero values. In addition, one could return NA instead of zero values in cases where it is 100% clear that zero is a failed prediction rather than something meaningful.
Again thanks for reporting this issue.
Thomas
On Wed, Jul 5, 2023 at 8:03 AM Eric R. Scott @.***> wrote:
Got it! I haven't fully dug into the internals of ChemmineR, but it makes sense that there are too many ways SDFs could go wrong to catch them all. Just for further context, this was motivated by my surprise that I could read in an invalid .mol file (missing a line in the header block) and get no error until later in the workflow when I tried to do something with it (like propOB). Even then, the error isn't an R error (#14 https://github.com/girke-lab/ChemmineR/issues/14) and doesn't stop ChemmineR from returning non-sensical values.
— Reply to this email directly, view it on GitHub https://github.com/girke-lab/ChemmineR/issues/16#issuecomment-1621946875, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKGMVH3RXE5L73WEB3QYSLXOV62RANCNFSM6AAAAAAZ2LPA6Y . You are receiving this because you commented.Message ID: @.***>
-- Thomas Girke, Ph.D. Professor of Bioinformatics Director of High-Performance Computing Center (HPCC) Director of Graduate Program in Genetics, Genomics and Bioinformatics (GGB) 1207F Genomics Building University of California Riverside, CA 92521
E-mail: @.*** URL: https://girke.bioinformatics.ucr.edu Phone/Cell/Text: 951-732-7072 Fax: 951-827-4437
I noticed that
validSDF()
doesn't check the header block (which is maybe why the error I described in #14 is getting through pastread.SDFset()
). Here's a reprex and suggestion.Created on 2023-06-30 with reprex v2.0.2
Session info
``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.2.2 (2022-10-31) #> os macOS Big Sur ... 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Los_Angeles #> date 2023-06-30 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.2.0) #> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.2.0) #> ChemmineOB * 1.36.0 2022-11-01 [1] Bioconductor #> ChemmineR * 3.49.2 2022-12-25 [1] https://girke-lab.r-universe.dev (R 4.2.2) #> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.0) #> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.0) #> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0) #> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.0) #> dplyr 1.1.0 2023-01-29 [1] CRAN (R 4.2.0) #> DT 0.27 2023-01-17 [1] CRAN (R 4.2.0) #> evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.0) #> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.0) #> fs 1.6.1 2023-02-06 [1] CRAN (R 4.2.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0) #> ggplot2 3.4.2 2023-04-03 [1] CRAN (R 4.2.0) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0) #> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.2.0) #> gtable 0.3.3 2023-03-21 [1] CRAN (R 4.2.0) #> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.2.0) #> htmlwidgets 1.6.1 2023-01-07 [1] CRAN (R 4.2.0) #> knitr 1.42 2023-01-25 [1] CRAN (R 4.2.0) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0) #> png 0.1-8 2022-11-29 [1] CRAN (R 4.2.0) #> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.0) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.0) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0) #> Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.2.0) #> RCurl 1.98-1.10 2023-01-27 [1] CRAN (R 4.2.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0) #> rjson 0.2.21 2022-01-09 [1] CRAN (R 4.2.0) #> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.2.0) #> rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.0) #> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0) #> rsvg 2.4.0 2022-11-21 [1] CRAN (R 4.2.0) #> scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0) #> styler 1.9.1 2023-03-04 [1] CRAN (R 4.2.0) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.2.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0) #> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2) #> vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.2.0) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0) #> xfun 0.37 2023-01-31 [1] CRAN (R 4.2.0) #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.0) #> zlibbioc 1.44.0 2022-11-01 [1] Bioconductor #> #> [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```