gmbecker / genbankr

http://bioconductor.org/packages/devel/bioc/html/genbankr.html

GenomicRanges error while loading a specific GenBank file #4

Closed: FelixErnst closed this issue 4 years ago

FelixErnst commented 5 years ago

Hi

The following call ends with an error originating from the GenomicRanges package.

> readGenBank(GBAccession("NR_046235.3"), verbose = TRUE)
Done Parsing raw GenBank file text. [ 0.0299999999988358 seconds ]
2019-02-15 17:29:10 Starting creation of gene GRanges
Annotations don't have 'locus_tag' label, using 'gene' as gene_id column
2019-02-15 17:29:10 Starting creation of CDS GRanges
2019-02-15 17:29:10 Starting creation of exon GRanges
Error in getListElement(x, i, ...) : 
  GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

I am not sure what the problem is, but it seems that the GenomicRanges package is being used in an incompatible way.

> sessionInfo() 
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocStyle_2.10.0                        genbankr_1.10.0                         tRNAscanImport_1.2.0                   
 [4] tRNA_1.0.0                              EnsDb.Hsapiens.v86_2.99.0               ensembldb_2.6.5                        
 [7] AnnotationFilter_1.6.0                  org.Hs.eg.db_3.7.0                      BSgenome.Hsapiens.UCSC.hg38_1.4.1      
[10] BSgenome_1.50.0                         rtracklayer_1.42.1                      Biostrings_2.50.2                      
[13] XVector_0.22.0                          TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0 GenomicFeatures_1.34.3                 
[16] AnnotationDbi_1.44.0                    Biobase_2.42.0                          GenomicRanges_1.34.0                   
[19] GenomeInfoDb_1.18.2                     IRanges_2.16.0                          S4Vectors_0.20.1                       
[22] BiocGenerics_0.28.0                    

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.14.0         bitops_1.0-6                matrixStats_0.54.0          assertive.models_0.0-2     
 [5] bit64_0.9-7                 progress_1.2.0              httr_1.4.0                  assertive.datetimes_0.0-2  
 [9] tools_3.5.2                 R6_2.3.0                    DBI_1.0.0                   lazyeval_0.2.1             
[13] colorspace_1.4-0            assertive.data_0.0-3        assertive.reflection_0.0-4  tidyselect_0.2.5           
[17] prettyunits_1.0.2           bit_1.1-14                  curl_3.3                    compiler_3.5.2             
[21] assertive.properties_0.0-4  DelayedArray_0.8.0          assertive.files_0.0-2       scales_1.0.0               
[25] stringr_1.4.0               digest_0.6.18               Rsamtools_1.34.1            rmarkdown_1.11             
[29] rentrez_1.2.1               assertive.numbers_0.0-2     pkgconfig_2.0.2             htmltools_0.3.6            
[33] rlang_0.3.1                 rstudioapi_0.9.0            RSQLite_2.1.1               assertive_0.3-5            
[37] bindr_0.1.1                 jsonlite_1.6                BiocParallel_1.16.6         dplyr_0.7.8                
[41] VariantAnnotation_1.28.10   RCurl_1.95-4.11             magrittr_1.5                GenomeInfoDbData_1.2.0     
[45] Matrix_1.2-15               Rcpp_1.0.0                  munsell_0.5.0               stringi_1.2.4              
[49] assertive.base_0.0-7        yaml_2.2.0                  SummarizedExperiment_1.12.0 zlibbioc_1.28.0            
[53] plyr_1.8.4                  grid_3.5.2                  blob_1.1.1                  crayon_1.3.4               
[57] lattice_0.20-38             assertive.code_0.0-3        hms_0.4.2                   knitr_1.21                 
[61] pillar_1.3.1                assertive.sets_0.0-3        codetools_0.2-16            biomaRt_2.38.0             
[65] glue_1.3.0                  XML_3.98-1.17               evaluate_0.13               BiocManager_1.30.4         
[69] purrr_0.3.0                 gtable_0.2.0                assertive.strings_0.0-3     assertthat_0.2.0           
[73] ggplot2_3.1.0               xfun_0.4                    assertive.types_0.0-3       assertive.data.uk_0.0-2    
[77] tibble_2.0.1                GenomicAlignments_1.18.1    memoise_1.1.0               bindrcpp_0.2.2             
[81] assertive.matrices_0.0-2    assertive.data.us_0.0-2    

shuyuzheng commented 4 years ago

I ran into the same problem. I found the following changes in the GenomicRanges NEWS file; hopefully they are useful for debugging.

CHANGES IN VERSION 1.34.0

...

DEPRECATED AND DEFUNCT

o Deprecate several RangedData methods: seqinfo, seqinfo<-, seqnames, and
  findOverlaps#RangedData#GenomicRanges

  RangedData objects will be deprecated in BioC 3.9 (their use has been
  discouraged since BioC 2.12, that is, since 2014). Package developers
  that are still using RangedData objects need to migrate their code to
  use GRanges or GRangesList objects instead.

BUG FIXES

o Make [[, as.list(), lapply(), and unlist() fail more graciously on
  a GenomicRanges object.

o Make "show" methods for GenomicRanges and GPos objects robust to
  special metadata column names like "stringsAsFactors".

o Export the "update" method for GRanges objects. This addresses
  https://github.com/Bioconductor/GenomicRanges/issues/7
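
The first BUG FIXES item is where the message in the original report comes from: a GRanges is a vector of ranges rather than a list, so list-style access on it is refused. A common workaround is to group the ranges into a GRangesList with `split()`, which is list-like. The sketch below uses made-up toy ranges and a made-up `transcript_id` column; it illustrates the GenomicRanges idiom and is not genbankr's code.

```r
# Minimal sketch with toy data (not genbankr's code).
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical exon-like ranges carrying a transcript_id metadata column
exons <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(100, 300, 500), width = 50),
    strand = "+",
    transcript_id = c("tx1", "tx1", "tx2")
)

# List-style iteration directly on a GRanges is what the error message in the
# report refers to (GenomicRanges 1.34); tryCatch() records the outcome instead
# of stopping, since the behaviour may differ between versions.
direct <- tryCatch(lapply(exons, width),
                   error = function(e) conditionMessage(e))

# Supported route: split into a GRangesList (one element per transcript),
# which is list-like and works with lapply() and lengths().
by_tx <- split(exons, exons$transcript_id)
n_exons <- lengths(by_tx)                                          # exons per transcript
total_width <- unlist(lapply(by_tx, function(g) sum(width(g))))    # bases per transcript
```
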

FelixErnst commented 4 years ago

Hi @gmbecker

Any chance this might get fixed?

gmbecker commented 4 years ago

I will get this fixed today or tomorrow. Thanks.

On Wed, Mar 18, 2020 at 2:44 AM, Felix Ernst wrote:

Hi @gmbecker

any chance this might get fixed?


gmbecker commented 4 years ago

Well, that didn't happen; apologies. I have pushed a potential fix to the Bioconductor git repo (master branch, i.e. devel) as package version 1.15.2. Please give it a try when you get the chance and check that the output is correct.

BTW, the issue stemmed from the transcript_id annotation being missing for the exons, which may (or may not?) be an error or oversight in the GenBank file.

Hope this works and sorry again for the long delay.
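
The transcript_id problem described above suggests a defensive pattern for grouping exons: check that the annotation exists before splitting on it. This is a minimal sketch under that assumption; `group_exons()` and the `"unknown_transcript"` placeholder are hypothetical names, not genbankr's API, and the actual fix in 1.15.2 may work differently.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical helper, not genbankr's API or its actual fix: group exons by
# transcript_id, substituting a placeholder when the column is missing or NA.
group_exons <- function(exons, placeholder = "unknown_transcript") {
    tx <- mcols(exons)$transcript_id
    if (is.null(tx)) {
        # The annotation is absent altogether (the situation described above)
        tx <- rep(NA_character_, length(exons))
    }
    tx <- as.character(tx)
    tx[is.na(tx)] <- placeholder
    split(exons, tx)   # GRangesList with one element per transcript ID
}

# Exons with no transcript_id column at all
no_tx <- GRanges("chr1", IRanges(c(100, 300), width = 50), strand = "+")
# Exons where only some entries carry a transcript_id
some_tx <- GRanges("chr1", IRanges(c(100, 300), width = 50), strand = "+",
                   transcript_id = c("tx1", NA))

grouped_none <- group_exons(no_tx)
grouped_some <- group_exons(some_tx)
```
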

FelixErnst commented 4 years ago

Thanks for the fix. I haven't had the chance to test it yet, since the changes don't seem to be in this repo (or are they?).

Regarding the version number: do you mean 1.15.2? 1.15 is the current Bioc devel version, and the version number cannot go down.

gmbecker commented 4 years ago

@FelixErnst yes, 1.15.2, sorry. And the package should be available in the devel repo in the next day or two.
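
The version question above rests on Bioconductor's numbering convention: packages in the devel branch carry an odd minor version (1.15.x here) and release packages an even one, and a version number can only increase. A small base-R sketch for checking an installed copy against 1.15.2; `has_fix()` and `is_devel_branch()` are hypothetical helpers, not part of any package.

```r
# Base R only. Hypothetical helpers for reasoning about the version numbers
# discussed above; fix_version is the devel version said to carry the fix.
fix_version <- package_version("1.15.2")

# TRUE if version string v is at or above the fixed version
has_fix <- function(v) package_version(v) >= fix_version

# Bioconductor convention: odd minor version = devel branch, even = release
is_devel_branch <- function(v) {
    minor <- as.integer(strsplit(as.character(v), ".", fixed = TRUE)[[1]][2])
    minor %% 2L == 1L
}

# Typical use on an installed copy:
# v <- as.character(packageVersion("genbankr"))
# has_fix(v); is_devel_branch(v)
```

For example, the genbankr_1.10.0 in the report's sessionInfo() gives FALSE for both checks.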

FelixErnst commented 4 years ago

The fix works. Regarding the missing transcript_id: I think the special nature of this accession (it is a human rRNA) might also play a role. In any case, genbankr is now the only solution I have found so far for retrieving rRNA information in the Bioconductor universe: TxDb doesn't contain it, and Ensembl is also weird and inconsistent.

Thank you, @gmbecker.

I will close this.