Shians / NanoMethViz

create_tabix_file output mixed up. #11

Closed cstill3928 closed 2 years ago

cstill3928 commented 2 years ago

Hi,

I recently ran the create_tabix_file function on a dataframe that looks like:

[Screenshot of the input dataframe, 2021-09-10]

This is a modified version of the Megalodon per_read_modified_base_calls.tsv file (I added the motif column to satisfy the requirements of the create_tabix_file function for recognizing Megalodon output). I was able to run create_tabix_file; however, the function puts the read_id under the chromosome column and shows nothing for the strand, statistic, or read_name columns. See below:

[Screenshot of the create_tabix_file output, 2021-09-10]

This does not resemble the file I'd expect from your example. I understand that you are starting from Nanopolish while I'm starting from Megalodon but that shouldn't cause this effect. Any help would be greatly appreciated! Thanks!

sessionInfo():

R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[3] org.Hs.eg.db_3.13.0 GO.db_3.13.0
[5] OrganismDbi_1.34.0 GenomicFeatures_1.44.2
[7] GenomicRanges_1.44.0 GenomeInfoDb_1.28.4
[9] AnnotationDbi_1.54.1 IRanges_2.26.0
[11] S4Vectors_0.30.0 Biobase_2.52.0
[13] BiocGenerics_0.38.0 NanoMethViz_1.2.0
[15] ggplot2_3.3.5

loaded via a namespace (and not attached):
[1] colorspace_2.0-2 bsseq_1.28.0 rjson_0.2.20 ellipsis_0.3.2
[5] XVector_0.32.0 fs_1.5.0 rstudioapi_0.13 scico_1.2.0
[9] bit64_4.0.5 fansi_0.5.0 xml2_1.3.2 R.methodsS3_1.8.1
[13] sparseMatrixStats_1.4.2 cachem_1.0.6 knitr_1.34 Rsamtools_2.8.0
[17] dbplyr_2.1.1 png_0.1-7 R.oo_1.24.0 graph_1.70.0
[21] HDF5Array_1.20.0 BiocManager_1.30.16 readr_2.0.1 compiler_4.1.1
[25] httr_1.4.2 assertthat_0.2.1 Matrix_1.3-4 fastmap_1.1.0
[29] limma_3.48.3 cli_3.0.1 BiocSingular_1.8.1 htmltools_0.5.2
[33] prettyunits_1.1.1 tools_4.1.1 rsvd_1.0.5 gtable_0.3.0
[37] glue_1.4.2 GenomeInfoDbData_1.2.6 dplyr_1.0.7 rappdirs_0.3.3
[41] Rcpp_1.0.7 vctrs_0.3.8 Biostrings_2.60.2 rhdf5filters_1.4.0
[45] rtracklayer_1.52.1 DelayedMatrixStats_1.14.3 xfun_0.25 stringr_1.4.0
[49] beachmat_2.8.1 lifecycle_1.0.0 irlba_2.3.3 restfulr_0.0.13
[53] gtools_3.9.2 XML_3.99-0.7 zlibbioc_1.38.0 scales_1.1.1
[57] vroom_1.5.4 BSgenome_1.60.0 hms_1.1.0 MatrixGenerics_1.4.3
[61] RBGL_1.68.0 SummarizedExperiment_1.22.0 rhdf5_2.36.0 curl_4.3.2
[65] yaml_2.2.1 memoise_2.0.0 biomaRt_2.48.3 stringi_1.7.4
[69] RSQLite_2.2.8 BiocIO_1.2.0 ScaledMatrix_1.0.0 permute_0.9-5
[73] filelock_1.0.2 BiocParallel_1.26.2 cpp11_0.3.1 rlang_0.4.11
[77] pkgconfig_2.0.3 matrixStats_0.60.1 bitops_1.0-7 evaluate_0.14
[81] lattice_0.20-44 purrr_0.3.4 Rhdf5lib_1.14.2 GenomicAlignments_1.28.0
[85] patchwork_1.1.1 bit_4.0.4 tidyselect_1.1.1 magrittr_2.0.1
[89] R6_2.5.1 generics_0.1.0 DelayedArray_0.18.0 DBI_1.1.1
[93] pillar_1.6.2 withr_2.4.2 KEGGREST_1.32.0 RCurl_1.98-1.4
[97] tibble_3.1.4 crayon_1.4.1 utf8_1.2.2 BiocFileCache_2.0.0
[101] tzdb_0.1.2 rmarkdown_2.10 progress_1.2.2 locfit_1.5-9.4
[105] grid_4.1.1 data.table_1.14.0 blob_1.2.2 forcats_0.5.1
[109] digest_0.6.27 tidyr_1.1.3 R.utils_2.10.1 munsell_0.5.0

Shians commented 2 years ago

Thanks for the bug report. Could you try installing the development version using the following?

remotes::install_github("shians/NanoMethViz")

I've updated megalodon importing a few times since the release version.

cstill3928 commented 2 years ago

Hi Shians, oddly, even though my sessionInfo showed NanoMethViz as 1.2.0, that was actually the development version I was using there. When I went back to the non-dev version, the problem persisted and got worse: the position has shifted to the strand column and there is only an asterisk in the position column. The statistic and read_name columns are still empty. I did get a warning, though, that I need to look into further. The warning and table output are shown below:

[E::get_intv] Failed to parse TBX_GENERIC, was wrong -p [type] used? The offending line was: "Examp_data fff8c78c-60c4-46ff-b807-9654b3c5126d * 169899"

[Screenshot of the table output, 2021-09-13]

Either way, it seems that both versions present a problem with importing megalodon data. I didn't have any issues with the Nanopolish data provided as sample data in your package. Do you have any sample data processed by Megalodon that has worked well and you could share for me to test?

Thanks, Chris

cstill3928 commented 2 years ago

Hi Shians, it appears that the eighth column I added to satisfy the earlier version's column requirements for Megalodon data ended up causing the new issue in the development version. After I removed that eighth column (the motif column), everything worked great! Sorry for all the trouble and thanks for your time.
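
For anyone hitting the same thing, here is a minimal sketch of the fix in base R: read the per-read file, drop the manually added motif column if it is there, and write the table back out. The toy data, the file paths, and the column names (read_id, chrm, strand, pos, mod_log_prob, can_log_prob, mod_base) are assumptions for illustration, not taken from the file in this issue, so adjust them to match your own file.

```r
# Toy stand-in for a Megalodon per-read file with an extra "motif" column.
# The column names below are assumed for illustration; check your own header.
toy <- data.frame(
  read_id = c("read_a", "read_b"),
  chrm = c("chr1", "chr1"),
  strand = c("+", "-"),
  pos = c(100L, 250L),
  mod_log_prob = c(-0.05, -2.3),
  can_log_prob = c(-3.0, -0.1),
  mod_base = c("m", "m"),
  motif = c("CG", "CG"),
  stringsAsFactors = FALSE
)
in_path <- tempfile(fileext = ".tsv")
write.table(toy, in_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back, keeping the column names exactly as written.
calls <- read.delim(in_path, stringsAsFactors = FALSE, check.names = FALSE)

# Drop the manually added column, if present, leaving the other columns as-is.
calls <- calls[, setdiff(names(calls), "motif"), drop = FALSE]

# Write the cleaned table to a new tab-separated file.
out_path <- tempfile(fileext = ".tsv")
write.table(calls, out_path, sep = "\t", quote = FALSE, row.names = FALSE)

ncol(calls)
```

The cleaned file at out_path (a hypothetical temporary path here) is what I would then pass to create_tabix_file in place of the eight-column version.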