Closed mahabubp closed 2 years ago
Hm, yeah, think I know what it is, will add some tests to avoid this in the future, will let you know when I push to github
First, could you do:
devtools::install_github("Roleren/ORFik") then restart R and try again, I think this is a bug on main branch and that I fixed it earlier.
Okay. I tried it but it didn't work. Its showing the same error.
Ok, does the example work for you?
library(ORFik)
gtf_file <- system.file("extdata/Danio_rerio_sample", "annotations.gtf", package = "ORFik")
riboSeq_file <- system.file("extdata/Danio_rerio_sample", "ribo-seq.bam", package = "ORFik")
footprints <- readBam(riboSeq_file)
shifts <- detectRibosomeShifts(footprints, gtf_file)
shiftedReads <- shiftFootprints(footprints, shifts)
yes, your example did work for me. I wonder why it doesn't work for my files.
Ok, print me these 5 things from what you wrote on top (show me consol output), then I should know what it is:
reads
seqinfo(reads)
txdb
seqinfo(txdb)
sessionInfo()
Okay. here it is.
library(ORFik) gtf_file <- "~/Homo_sapiens.GRCh38.107.gtf" riboSeq_file <- "/media/lrge/32305f08-3e34-4bbb-94db-3f17f7760b86/lrge/Pasha/STAR/S2620_7_R1/S2620_7_R1_Aligned.sortedByCoord.out.bam" footprints <- readBam(riboSeq_file) shifts <- detectRibosomeShifts(footprints, gtf_file) Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand) : Seqlengths of x contains NA values! In addition: Warning message: In .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. shiftedReads <- shiftFootprints(footprints, shifts) Error in is(shifts, "data.frame") : object 'shifts' not found
Ok, from what you wrote print me:
footprints
seqinfo(footprints)
txdb <- loadTxdb(gtf_file)
txdb
seqinfo(txdb)
sessionInfo()
Here you go..
print(footprints) GAlignments object with 22307236 alignments and 0 metadata columns: seqnames strand cigar qwidth start end width njunc
[1] 1 - 66M 66 14694 14759 66 0 [2] 1 - 2S67M 69 14695 14761 67 0 [3] 1 - 61M 61 16071 16131 61 0 [4] 1 - 61M 61 16071 16131 61 0 [5] 1 - 33M296N29M 62 16278 16635 358 1 ... ... ... ... ... ... ... ... ... [22307232] KI270752.1 - 15M 15 8029 8043 15 0 [22307233] KI270752.1 + 19M1S 20 22153 22171 19 0 [22307234] KI270752.1 + 19M1S 20 22153 22171 19 0 [22307235] KI270512.1 + 2S19M 21 20511 20529 19 0 [22307236] KI270466.1 - 63M 63 1005 1067 63 0 ------- seqinfo: 194 sequences from an unspecified genome print(seqinfo(footprints)) Seqinfo object with 194 sequences from an unspecified genome: seqnames seqlengths isCircular genome 1 248956422 10 133797422 11 135086622 12 133275309 13 114364328 ... ... ... ... KI270539.1 993 KI270385.1 990 KI270423.1 981 KI270392.1 971 KI270394.1 970 print(txdb <- loadTxdb(gtf_file)) Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK TxDb object: # Db type: TxDb # Supporting package: GenomicFeatures # Data source: ~/Homo_sapiens.GRCh38.107.gtf # Organism: NA # Taxonomy ID: NA # miRBase build ID: NA # Genome: NA # Nb of transcripts: 251121 # Db created by: GenomicFeatures package from Bioconductor # Creation time: 2022-11-04 13:41:34 +0100 (Fri, 04 Nov 2022) # GenomicFeatures version at creation time: 1.50.1 # RSQLite version at creation time: 2.2.18 # DBSCHEMAVERSION: 1.2 Warning message: In .get_cds_IDX(mcols0$type, mcols0$phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. print(txdb) TxDb object: # Db type: TxDb # Supporting package: GenomicFeatures # Data source: ~/Homo_sapiens.GRCh38.107.gtf # Organism: NA # Taxonomy ID: NA # miRBase build ID: NA # Genome: NA # Nb of transcripts: 251121 # Db created by: GenomicFeatures package from Bioconductor # Creation time: 2022-11-04 13:41:34 +0100 (Fri, 04 Nov 2022) # GenomicFeatures version at creation time: 1.50.1 # RSQLite version at creation time: 2.2.18 # DBSCHEMAVERSION: 1.2 print(seqinfo(txdb)) Seqinfo object with 47 sequences (1 circular) from an unspecified genome; no seqlengths: seqnames seqlengths isCircular genome 1 2 3 4 5 ... ... ... ... KI270731.1 KI270733.1 KI270734.1 KI270744.1 KI270750.1 print(sessionInfo()) R version 4.2.1 (2022-06-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 22.04.1 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.4 ORFik_1.17.8 GenomicAlignments_1.34.0 Rsamtools_2.14.0
[5] Biostrings_2.66.0 XVector_0.38.0 SummarizedExperiment_1.28.0 MatrixGenerics_1.10.0
[9] matrixStats_0.62.0 GenomicFeatures_1.50.1 AnnotationDbi_1.60.0 Biobase_2.58.0
[13] GenomicRanges_1.50.0 GenomeInfoDb_1.34.0 IRanges_2.32.0 S4Vectors_0.36.0
[17] BiocGenerics_0.44.0
loaded via a namespace (and not attached):
[1] biomartr_1.0.2 bitops_1.0-7 bit64_4.0.5 filelock_1.0.2 RColorBrewer_1.1-3
[6] progress_1.2.2 httr_1.4.4 tools_4.2.1 utf8_1.2.2 R6_2.5.1
[11] DBI_1.1.3.9003 colorspace_2.0-3 gridExtra_2.3 tidyselect_1.2.0 prettyunits_1.1.1
[16] DESeq2_1.38.0 bit_4.0.4 curl_4.3.3 compiler_4.2.1 cli_3.4.1
[21] xml2_1.3.3 DelayedArray_0.24.0 fstcore_0.9.12 rtracklayer_1.58.0 scales_1.2.1
[26] genefilter_1.80.0 rappdirs_0.3.3 stringr_1.4.1 digest_0.6.30 R.utils_2.12.1
[31] pkgconfig_2.0.3 fst_0.9.8 dbplyr_2.2.1 fastmap_1.1.0 BSgenome_1.66.0
[36] rlang_1.0.6 rstudioapi_0.14 RSQLite_2.2.18 BiocIO_1.8.0 generics_0.1.3
[41] jsonlite_1.8.3 BiocParallel_1.32.0 dplyr_1.0.10 R.oo_1.25.0 RCurl_1.98-1.9
[46] magrittr_2.0.3 GenomeInfoDbData_1.2.9 Matrix_1.5-1 Rcpp_1.0.9 munsell_0.5.0
[51] fansi_1.0.3 lifecycle_1.0.3 R.methodsS3_1.8.2 stringi_1.7.8 yaml_2.3.6
[56] zlibbioc_1.44.0 BiocFileCache_2.6.0 grid_4.2.1 blob_1.2.3 parallel_4.2.1
[61] crayon_1.5.2 lattice_0.20-45 cowplot_1.1.1 splines_4.2.1 annotate_1.76.0
[66] hms_1.1.2 KEGGREST_1.38.0 locfit_1.5-9.6 pillar_1.8.1 rjson_0.2.21
[71] geneplotter_1.76.0 codetools_0.2-18 biomaRt_2.54.0 XML_3.99-0.12 glue_1.6.2
[76] png_0.1-7 vctrs_0.5.0 gtable_0.3.1 assertthat_0.2.1 cachem_1.0.6
[81] ggplot2_3.3.6 xtable_1.8-4 restfulr_0.0.15 survival_3.4-0 tibble_3.1.8
[86] memoise_2.0.1 ellipsis_0.3.2
What is going on with your seqinfo values, there should be NA values for seqlength of txdb Not blanks as you have, here is how I print it:
> seqinfo(footprints)
Seqinfo object with 1133 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
MT 16596 <NA> <NA>
Zv9_NA1 22128 <NA> <NA>
Zv9_NA10 48062 <NA> <NA>
Zv9_NA100 26026 <NA> <NA>
Zv9_NA1000 942 <NA> <NA>
... ... ... ...
chr5 75682077 <NA> <NA>
chr6 59938731 <NA> <NA>
chr7 77276063 <NA> <NA>
chr8 56184765 <NA> <NA>
chr9 58232459 <NA> <NA>
Do you not get
print(seqinfo(footprints)) Seqinfo object with 194 sequences from an unspecified genome: seqnames seqlengths isCircular genome 1 248956422
10 133797422 11 135086622 12 133275309 13 114364328 ... ... ... ... KI270539.1 993 KI270385.1 990 KI270423.1 981 KI270392.1 971 KI270394.1 970
you get same as me here ?
> footprints@seqinfo@is_circular[1]
[1] NA
yes
And this?
> class(seqinfo(txdb)@seqlengths[1])
[1] "integer"
yes
And this does not crash for you too ?
> seqinfo(footprints)[names(seqinfo(txdb))]
Seqinfo object with 2 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
chr8 56184765 NA <NA>
chr24 43947580 NA <NA>
yes, it doesn't. I see the similar pattern as yours.
Ok, then this is very mysterious, we need to debug: Run this:
debug(ORFik:::covRleFromGR)
shifts <- detectRibosomeShifts(footprints, txdb)
And when have entered covRleFromGr, print me directly:
seqinfo(x)
Seqinfo object with 2 sequences from an unspecified genome:
seqnames seqlengths isCircular genome
chr24 43947580 NA <NA>
chr8 56184765 NA <NA>
For you then there is most likely some NA in there for seqlengths
seqinfo(x) Seqinfo object with 47 sequences (1 circular) from an unspecified genome: seqnames seqlengths isCircular genome 1 248956422
10 133797422 11 135086622 12 133275309 13 114364328 ... ... ... ... KI270442.1 KI270726.1 KI270733.1 KI270744.1 KI270750.1
Yes, before there were some values for the following, now they disappeared, showing only NA for them KI270442.1 KI270726.1 KI270733.1 KI270744.1 KI270750.1
Aha, we found a bug in GenomicFeatures package then, of bioconductor core, this should be reported, I will see if I have time. Will write up a temp fix for you now, just have to check it works
Hmm okay. Thanks a ton for your time and help.. Hope temp fix would work.
Try this:
undebug(ORFik:::covRleFromGR)
# Now let's make a txdb with valid seqinfo, hopefully footprints have all seqlevels we need
txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_file, organism = "Homo sapiens",
chrominfo = seqinfo(footprints))
shifts <- detectRibosomeShifts(footprints, txdb)
Let me know if txdb step does not work, then we need to get the 100% correct seqinfo from the fasta index of the genome itself. Update! Remember to use txdb in shift function, not the original gtf!
Yes, it worked. Thanks a ton.. Please let me know if I should close the issue?
Yes, good that it worked. This error is a bit dangerous and should be fixed, but at least we have a work around. You can now close this issue :)
Hi Roleren,
When I run shiftedReads <- shiftFootprints(reads, detectRibosomeShifts(reads, txdb)), I get the following error. Could you please look into it? Thanks
Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand) : Seqlengths of x contains NA values!