`Rsamtools::indexTabix`: Capture console output

bschilder commented 2 years ago

I recently posted an error I've been encountering with seqminer (here) but realized it seems to trace all the way back to tabix itself.

Regardless, in cases where I get the following kind of error...

[E::hts_idx_push] Unsorted positions on sequence #2: 71999802 followed by 7

...I'd like to wrap the function in atryCatch and capture the messages so that I rerun it with the offending lines in the skip argument. Something like:

out <- tryCatch({  
               Rsamtools::indexTabix(file = bgz_file,
                                     seq = 2,
                                     start = 3,
                                     end = 3, 
                                     comment ="SNP"
               ) 
           })

However, I can't seem to get the output into R. I've tried:

tryCatch (above)
capture.output
sink()

Do you know a way to extract this info?

Thanks, Brian

Session info

``` R version 4.1.0 (2021-05-18) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 11.4 Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] arrow_7.0.0 ggimage_0.3.0 ggplot2_3.3.5 dplyr_1.0.8 hexSticker_0.4.9 [6] echotabix_0.99.3 loaded via a namespace (and not attached): [1] AnnotationHub_3.2.2 BiocFileCache_2.2.1 systemfonts_1.0.4 [4] igraph_1.2.11 BiocParallel_1.28.3 GenomeInfoDb_1.30.1 [7] digest_0.6.29 yulab.utils_0.0.4 htmltools_0.5.2 [10] magick_2.7.3 fansi_1.0.2 magrittr_2.0.2 [13] memoise_2.0.1 BSgenome_1.62.0 echoverseTemplate_0.99.0 [16] ontologyPlot_1.6 openxlsx_4.2.5 Biostrings_2.62.0 [19] matrixStats_0.61.0 R.utils_2.11.0 sysfonts_0.8.5 [22] prettyunits_1.1.1 colorspace_2.0-3 blob_1.2.2 [25] rappdirs_0.3.3 textshaping_0.3.6 xfun_0.30 [28] crayon_1.5.0 RCurl_1.98-1.6 echodata_0.99.6 [31] jsonlite_1.8.0 hexbin_1.28.2 graph_1.72.0 [34] VariantAnnotation_1.40.0 glue_1.6.2 gtable_0.3.0 [37] zlibbioc_1.40.0 XVector_0.34.0 DelayedArray_0.20.0 [40] Rgraphviz_2.38.0 BiocGenerics_0.40.0 scales_1.1.1 [43] DBI_1.1.2 Rcpp_1.0.8.2 showtextdb_3.0 [46] xtable_1.8-4 progress_1.2.2 gridGraphics_0.5-1 [49] bit_4.0.4 clisymbols_1.2.0 stats4_4.1.0 [52] DT_0.21 htmlwidgets_1.5.4 httr_1.4.2 [55] ontologyIndex_2.7 ellipsis_0.3.2 pkgconfig_2.0.3 [58] XML_3.99-0.9 R.methodsS3_1.8.1 farver_2.1.0 [61] seqminer_8.4 dbplyr_2.1.1 utf8_1.2.2 [64] here_1.0.1 ggplotify_0.1.0 tidyselect_1.1.2 [67] labeling_0.4.2 rlang_1.0.2 later_1.3.0 [70] AnnotationDbi_1.56.2 BiocVersion_3.14.0 munsell_0.5.0 [73] tools_4.1.0 cachem_1.0.6 cli_3.2.0 [76] generics_0.1.2 RSQLite_2.2.10 evaluate_0.15 [79] stringr_1.4.0 fastmap_1.1.0 yaml_2.3.5 [82] ragg_1.2.2 knitr_1.37 bit64_4.0.5 [85] fs_1.5.2 zip_2.2.0 purrr_0.3.4 [88] KEGGREST_1.34.0 gh_1.3.0 showtext_0.9-5 [91] mime_0.12 R.oo_1.24.0 xml2_1.3.3 [94] biomaRt_2.50.3 brio_1.1.3 compiler_4.1.0 [97] rstudioapi_0.13 interactiveDisplayBase_1.32.0 filelock_1.0.2 [100] curl_4.3.2 png_0.1-7 testthat_3.1.2 [103] paintmap_1.0 tibble_3.1.6 stringi_1.7.6 [106] GenomicFeatures_1.46.5 desc_1.4.1 lattice_0.20-45 [109] Matrix_1.4-0 vctrs_0.3.8 pillar_1.7.0 [112] lifecycle_1.0.1 BiocManager_1.30.16 data.table_1.14.2 [115] bitops_1.0-7 httpuv_1.6.5 rtracklayer_1.54.0 [118] GenomicRanges_1.46.1 R6_2.5.1 BiocIO_1.4.0 [121] promises_1.2.0.1 IRanges_2.28.0 ontoProc_1.16.0 [124] assertthat_0.2.1 pkgload_1.2.4 SummarizedExperiment_1.24.0 [127] rprojroot_2.0.2 rjson_0.2.21 withr_2.5.0 [130] GenomicAlignments_1.30.0 Rsamtools_2.10.0 S4Vectors_0.32.3 [133] GenomeInfoDbData_1.2.7 parallel_4.1.0 hms_1.1.1 [136] grid_4.1.0 ggfun_0.0.5 tidyr_1.2.0 [139] rmarkdown_2.13 MatrixGenerics_1.6.0 piggyback_0.1.1 [142] Biobase_2.54.0 shiny_1.7.1 lubridate_1.8.0 [145] restfulr_0.0.13 ```

mtmorgan commented 2 years ago

These errors are written to C language 'stderr' but unfortunately this is not directly available within R. From the command line, on Linux, one could run a script script.R and redirect stderr to a file

Rscript -f script.R 2> stderr.log

but probably your intention is something different.

Previously, Rsamtools used a 'hack' to remap fprintf(stderr, ...) to R's error stream, with something like

DFLAGS = \
  -Dfprintf=_samtools_fprintf \
  -Dexit=_samtools_exit \
  -Dabort=_samtools_abort

This would have captured samtools errors as R warnings, but unfortunately this functionality was lost when we shifted to using Rhtslib quite some time ago; perhaps @hpages has additional comment.

I don't think there's an easy way to capture what you'd like to do.

hpages commented 2 years ago

Hi Martin,

HTSlib uses a more sophisticated mechanism than Samtools for logging events: now everything goes thru the hts_log() function defined in header file hts_log.h. The function handles various levels of event severity: error, warning, debug, and trace. There's a macro defined for each level (hts_log_error(), hts_log_warning(), etc...). Each macro is a simple wrapper around hts_log().

Note that hts_log() itself is not a macro, it's a real function, and it's not an inline function either, so I'm not sure why it's defined in a header file.

Anyways, this new mechanism and its implementation differ substantially from the plain fprintf(stderr, ...) calls used all over the place in Samtools. I didn't investigate so I don't know how easy/hard it would be to come up with a hack similar to yours where all the events logged thru hts_log_error() get captured as R warnings.

Ideas, suggestions, patches are welcome.

Thanks, H.

hpages commented 2 years ago

Personally I'm not planning to tweak Rhtslib to make the low-level C code in htslib spit output that is capturable from R but maybe someone else wants to give it a shot? If not, we can close this.

Thanks, H.

Bioconductor / Rsamtools

`Rsamtools::indexTabix`: Capture console output #32

Session info