JCounts Error - ASpli - Githubissues

JesusMHU commented 1 year ago

Estimados, espero se encuentre bien. He estado teniendo un problema recurrente con ASpli que no he sabido solucionar.

Había estado analizando muchas librerias sin problemas, pero en tres ocasiones he tenido problemas con este mismo error cuando esta corriendo la funcion JCounts

El error:

asd <- jCounts(counts=gbcounts, features=features, minReadLength=18, libType="SE", strandMode=2) Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a methError ASpli.txtod for function 'sort': 'start' or 'end' cannot contain NAs In addition: Warning messages: 1: In matrix(unlist(strsplit(jnames, "[.]")), byrow = TRUE, ncol = 3) : data length [422626] is not a sub-multiple or multiple of the number of rows [140876] 2: In .createGRangesExpJunctions(junctionNames) : NAs introduced by coercion

Cosas que he intentado (solo para probar): -Cambiar el libtype y strandmode -He rehecho los BAMs e incluso los SAMs -Reanalicé utilizando otro programa de trimming. Antes estaba usando Trimgalore y pase a Trimmomatic. Trimmomatic me solucionó el problema para algunos datasets, pero hay estos tres en especifico que me siguen dando ese mismo error. Uno de ellos me interesa mucho porque es de los pocos que hay con raiz de arabidopsis en estres salino.

Dejo las accession numbers y mi sessioninfo.

Input

BAMFiles <- c("Control_1.bam","Control_2.bam", "NaCl_1.bam", "NaCl_2.bam")
targets <- data.frame(row.names = paste0('Sample',c(1:4)), bam = BAMFiles[1:4], f1 = c( 'control','control','control','treatment','treatment'), stringsAsFactors = FALSE); targets mBAMs <- data.frame( bam = c("Control_sorted.bam", "NaCl_sorted.bam"), condition = c("control","treatment")); mBAMs gbcounts <- gbCounts(features=features, targets=targets, minReadLength = 18, maxISize = 50000, libType="SE", strandMode=1) asd <- jCounts(counts=gbcounts, features=features, minReadLength=18, libType="SE", strandMode=1) #stranded gb <- gbDUreport(gbcounts, contrast = c(-1,1)) jdur <- jDUreport(asd, contrast=c(-1,1)) sr <- splicingReport(gb, jdur, counts=gbcounts) is <- integrateSignals(sr,asd, bin.FC=1) exportSplicingReports(sr, output.dir="19.sr") exportIntegratedSignals(is,sr=sr, output.dir = "19.SplicingReport", counts=gbcounts,features=features,asd=asd, mergedBams = mBAMs)

#######Accession numbers SRR8424695 Control_1 SRR8424694 Control_2 SRR8424692 NaCl_1 SRR8424691 NaCl_2

#######My sessioninfo

sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 20.04.4 LTS

Matrix products: default BLAS/LAPACK: /home/jesusm/anaconda3/envs/env1/lib/libopenblasp-r0.3.21.so

locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] statmod_1.4.37 GenomicFeatures_1.50.2 GenomicRanges_1.50.0 [4] GenomeInfoDb_1.34.1 ASpli_2.8.0 AnnotationDbi_1.60.0 [7] IRanges_2.32.0 S4Vectors_0.36.0 Biobase_2.58.0 [10] BiocGenerics_0.44.0 edgeR_3.40.0 limma_3.54.0

loaded via a namespace (and not attached): [1] colorspace_2.0-3 rjson_0.2.21 [3] deldir_1.0-6 ellipsis_0.3.2 [5] biovizBase_1.46.0 htmlTable_2.4.1 [7] XVector_0.38.0 base64enc_0.1-3 [9] dichromat_2.0-0.1 rstudioapi_0.14 [11] DT_0.26 bit64_4.0.5 [13] fansi_1.0.3 xml2_1.3.3 [15] codetools_0.2-18 splines_4.2.2 [17] cachem_1.0.6 knitr_1.41 [19] Formula_1.2-4 Rsamtools_2.14.0 [21] cluster_2.1.4 dbplyr_2.2.1 [23] png_0.1-8 BiocManager_1.30.19 [25] compiler_4.2.2 httr_1.4.4 [27] backports_1.4.1 lazyeval_0.2.2 [29] assertthat_0.2.1 Matrix_1.5-3 [31] fastmap_1.1.0 cli_3.5.0 [33] htmltools_0.5.4 prettyunits_1.1.1 [35] tools_4.2.2 igraph_1.3.5 [37] gtable_0.3.1 glue_1.6.2 [39] GenomeInfoDbData_1.2.9 dplyr_1.0.10 [41] rappdirs_0.3.3 Rcpp_1.0.9 [43] vctrs_0.5.1 Biostrings_2.66.0 [45] rtracklayer_1.58.0 xfun_0.36 [47] stringr_1.5.0 lifecycle_1.0.3 [49] ensembldb_2.22.0 restfulr_0.0.15 [51] XML_3.99-0.13 zlibbioc_1.44.0 [53] MASS_7.3-58.1 scales_1.2.1 [55] BiocStyle_2.26.0 BSgenome_1.66.1 [57] VariantAnnotation_1.44.0 ProtGenerics_1.30.0 [59] hms_1.1.2 MatrixGenerics_1.10.0 [61] SummarizedExperiment_1.28.0 AnnotationFilter_1.22.0 [63] RColorBrewer_1.1-3 yaml_2.3.6 [65] curl_4.3.3 memoise_2.0.1 [67] gridExtra_2.3 ggplot2_3.4.0 [69] UpSetR_1.4.0 biomaRt_2.54.0 [71] rpart_4.1.19 latticeExtra_0.6-30 [73] stringi_1.7.8 RSQLite_2.2.20 [75] BiocIO_1.8.0 checkmate_2.1.0 [77] filelock_1.0.2 BiocParallel_1.32.0 [79] rlang_1.0.6 pkgconfig_2.0.3 [81] matrixStats_0.63.0 bitops_1.0-7 [83] evaluate_0.19 lattice_0.20-45 [85] purrr_1.0.0 htmlwidgets_1.6.0 [87] GenomicAlignments_1.34.0 bit_4.0.5 [89] tidyselect_1.2.0 plyr_1.8.8 [91] magrittr_2.0.3 R6_2.5.1 [93] generics_0.1.3 Hmisc_4.7-2 [95] DelayedArray_0.24.0 DBI_1.1.3 [97] pillar_1.8.1 foreign_0.8-84 [99] survival_3.4-0 KEGGREST_1.38.0 [101] RCurl_1.98-1.9 nnet_7.3-18 [103] tibble_3.1.8 crayon_1.5.2 [105] interp_1.1-3 utf8_1.2.2 [107] BiocFileCache_2.6.0 rmarkdown_2.19 [109] jpeg_0.1-10 progress_1.2.2 [111] locfit_1.5-9.6 grid_4.2.2 [113] data.table_1.14.6 blob_1.2.3 [115] digest_0.6.31 pbmcapply_1.5.1 [117] tidyr_1.2.1 munsell_0.5.0 [119] Gviz_1.42.0

Saludos! Error ASpli.txt

JesusMHU commented 1 year ago

Otra info: -He hecho todos los analisis con la misma anotación. -Probé la devel version de ASpli 2.9 y aun me sale el error con estas librerias -He intentado cambiar el strandmode a unstranded, forward stranded y reverse stranded y genera el mismo error. -Mi read length es de 18. Probe cambiarlo a 50 y aun genera el mismo error. -Samtools view de dos de los BAMs problematicos:

(env1) jesusm@LAPTOP-S6C7NUN7:/mnt/e/WDSS/(19) PRJNA514040 - SE$ samtools view -H NaCl_1.bam @HD VN:1.0 SO:unsorted @SQ SN:Chr1 LN:30427671 @SQ SN:Chr2 LN:19698289 @SQ SN:Chr3 LN:23459830 @SQ SN:Chr4 LN:18585056 @SQ SN:Chr5 LN:26975502 @SQ SN:ChrC LN:154478 @SQ SN:ChrM LN:367808 @PG ID:hisat2 PN:hisat2 VN:2.1.0 CL:"/mnt/e/WDSS/tools/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 --rna-strandness F -x /mnt/e/WDSS/TAIR10_Chr.all -S SRR8424692.fastq.gz.trim.fil.gz.sam -U /tmp/1818.unp"

(env1) jesusm@LAPTOP-S6C7NUN7:/mnt/e/WDSS/(19) PRJNA514040 - SE$ samtools view -H Control_1.bam @HD VN:1.0 SO:unsorted @SQ SN:Chr1 LN:30427671 @SQ SN:Chr2 LN:19698289 @SQ SN:Chr3 LN:23459830 @SQ SN:Chr4 LN:18585056 @SQ SN:Chr5 LN:26975502 @SQ SN:ChrC LN:154478 @SQ SN:ChrM LN:367808 @PG ID:hisat2 PN:hisat2 VN:2.1.0 CL:"/mnt/e/WDSS/tools/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 --rna-strandness F -x /mnt/e/WDSS/TAIR10_Chr.all -S SRR8424695.fastq.gz.trim.fil.gz.sam -U /tmp/1859.unp"

estepi commented 1 year ago

Hola Jesus, gracias por reportar el issue y por proveer toda la info

De todo lo que me has mandado solo 2 cosas me resuenan: -El read length es de 18. Tienes reads de 18 pb?

En el comando de HISAT le dices que el strandness es Forward. Intentaría forzarlo a "modo unstradnded" en ASpli. Entiendo lo que dices. 0 o 1 deberían ser la opción, pero lo rechequeare.

espero noticias

rfarouni commented 1 year ago

Hi,

The jCounts function scrambles the order of the three columns once it encounters a rowname with NA in it. This happens during unlisting. I was able to solve the problem by filtering out those rows from the junction.counts object.

gb_counts@junction.counts[-which(rownames(gb_counts@junction.counts)=="NA"),]

I am not sure why I had an NA row in the first place.

Best, Rick

JesusMHU commented 1 year ago

Estimada Estefania, muchas gracias por tu respuesta. Te dejo en adjunto un par de fotos para que veas la read length distribution de uno de los BAMs problemáticos. Con 18bp y con 80bp. Trimming_18bp Trimming_80bp

Sin embargo, por lo que me sugeriste, probé hacer el trimming con 80 bp y el error se eliminó y pudo terminar el análisis. Pero me detectaba pocos eventos de splicing porque el trimming removia muchas lecturas. Así que probé el método que comentó arriba Rick de eliminar las columnas que tuviesen valores NA dentro del gb_counts@junction.counts y pude tambien terminar el analisis sin el error y con un read length un poco mas permisivo de 50bp. Así que agradezco a ambos por la ayuda.

Por otra parte, sigo un poco con la duda del strand specificity. Esto en vista de que el output de ASpli no pareciera que esta diferenciando el strand al que corresponden las lecturas. Tengo datasets stranded que he tratado como tal tanto en el mapeo como en ASpli y no veo diferencias entre el output cuando uso la strans specifity vs. cuando es unstranded. No se si sea porque en efecto no lo estoy haciendo bien, por esta confusion de que en el manual de ASpli dice que unstranded es 1 versus lo que dice en el manual que indica al hacer ?strandMode: 0 unstranded, 1 forward, 2 reverse.

Quedo atento, y agradecido de antemano.

Saludos!

JesusMHU commented 1 year ago

Hi,

The jCounts function scrambles the order of the three columns once it encounters a rowname with NA in it. This happens during unlisting. I was able to solve the problem by filtering out those rows from the junction.counts object.

gb_counts@junction.counts[-which(rownames(gb_counts@junction.counts)=="NA"),]

I am not sure why I had an NA row in the first place.

Best, Rick

Thank you so much Rick. I was able to solve the problem by removing the NA columns. In all cases it turned out to be only one. Regards,

Jesus

estepi commented 1 year ago

Thanks all of you for your great contribution and long live to splicing !

chernolab / ASpli

JCounts Error - ASpli #13

Input