GoekeLab / proActiv

Estimation of Promoter Activity from RNA-Seq data
https://goekelab.github.io/proActiv/

getTranscriptRanges() function limits species this can be applied to #23

Closed G-Thomson closed 3 years ago

G-Thomson commented 4 years ago

I would like to use this package to study some data generated from Arabidopsis. However, when I run preparePromoterAnnotation() I get:

Error in extractSeqlevels(species, style) : The style specified by 'UCSC' does not have a compatible entry for the species Arabidopsis_thaliana

Is this because getTranscriptRanges() (and possibly other functions) forces the GenomeInfoDb functions to use the UCSC naming scheme, which does not include Arabidopsis?

Is there a downstream reason UCSC is used, or could the NCBI or Ensembl conventions be used instead?
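
For anyone hitting the same error, a quick way to see which naming styles GenomeInfoDb knows for a species is to inspect the table returned by GenomeInfoDb::genomeStyles(species). The following is a minimal sketch, not part of proActiv: hasSeqlevelsStyle() is a hypothetical helper introduced here for illustration, and the check assumes each style appears as a column of that table alongside the circular, auto and sex columns. The result depends on the installed GenomeInfoDb version.

```r
# Hypothetical helper (not part of proActiv or GenomeInfoDb): report whether a
# naming style appears as a column of a seqlevels style table, such as the one
# returned by GenomeInfoDb::genomeStyles(species).
hasSeqlevelsStyle <- function(styleTable, style) {
  style %in% colnames(styleTable)
}

# Only query GenomeInfoDb if it is installed.
if (requireNamespace("GenomeInfoDb", quietly = TRUE)) {
  species <- "Arabidopsis_thaliana"
  styleTable <- GenomeInfoDb::genomeStyles(species)

  # Columns other than circular/auto/sex are the naming styles on offer.
  print(setdiff(colnames(styleTable), c("circular", "auto", "sex")))

  # FALSE here would be consistent with the extractSeqlevels() error above.
  print(hasSeqlevelsStyle(styleTable, "UCSC"))
}
```

If the style is missing for the species, the error is likely coming from the style lookup rather than from the input annotation file.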

jonathangoeke commented 4 years ago

Hi @G-Thomson, proActiv is now hosted on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/proActiv.html). Could you try installing the Bioconductor version? If you still encounter this error, please post it to the Bioconductor support forum (https://support.bioconductor.org/) and tag proActiv. Ideally, include the output of sessionInfo() and the input data you used so that we can reproduce the error and address it. Thanks!

jleechung commented 3 years ago

Hi @G-Thomson, thanks for raising this issue! I tried creating the annotation object for Arabidopsis and it seems to work for me. The GFF I used can be found at: ftp://ftp.ensemblgenomes.org/pub/plants/release-48/gff3/arabidopsis_thaliana

> file <- "Arabidopsis_thaliana.TAIR10.48.gff3.gz"

> show(names(GenomeInfoDb::genomeStyles()))
 [1] "Arabidopsis_thaliana"     "Caenorhabditis_elegans"   "Canis_familiaris"         "Cyanidioschyzon_merolae" 
 [5] "Drosophila_melanogaster"  "Homo_sapiens"             "Mus_musculus"             "Oryza_sativa"            
 [9] "Populus_trichocarpa"      "Rattus_norvegicus"        "Saccharomyces_cerevisiae" "Zea_mays"                
> species <- names(GenomeInfoDb::genomeStyles())[1]

> annotation <- preparePromoterAnnotation(file = file, species = species)
Parsing input file...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Extract exons by transcripts...
Identify overlapping first exons for each gene...
Prepare mapping between transcripts, tss, promoters and genes...
Prepare annotated intron ranges...
Annotating reduced exon ranges...
Prepare promoter coordinates and first exon ranges...

Session Info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252    LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Singapore.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] proActiv_0.99.2 testthat_2.3.2 

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1            ellipsis_0.3.1              rprojroot_1.3-2            
  [4] biovizBase_1.36.0           htmlTable_2.0.1             XVector_0.28.0             
  [7] GenomicRanges_1.40.0        base64enc_0.1-3             fs_1.4.2                   
 [10] dichromat_2.0-0             rstudioapi_0.11             remotes_2.1.1              
 [13] bit64_0.9-7                 AnnotationDbi_1.50.1        fansi_0.4.1                
 [16] splines_4.0.2               knitr_1.29                  geneplotter_1.66.0         
 [19] pkgload_1.1.0               Formula_1.2-3               Rsamtools_2.4.0            
 [22] annotate_1.66.0             cluster_2.1.0               dbplyr_1.4.4               
 [25] png_0.1-7                   compiler_4.0.2              httr_1.4.1                 
 [28] backports_1.1.7             lazyeval_0.2.2              assertthat_0.2.1           
 [31] Matrix_1.2-18               cli_2.0.2                   htmltools_0.5.0            
 [34] acepack_1.4.1               prettyunits_1.1.1           tools_4.0.2                
 [37] gtable_0.3.0                glue_1.4.1                  GenomeInfoDbData_1.2.3     
 [40] dplyr_1.0.1                 rappdirs_0.3.1              Rcpp_1.0.5                 
 [43] Biobase_2.48.0              vctrs_0.3.2                 Biostrings_2.56.0          
 [46] rtracklayer_1.48.0          xfun_0.15                   stringr_1.4.0              
 [49] ps_1.3.3                    lifecycle_0.2.0             ensembldb_2.12.1           
 [52] devtools_2.3.0              XML_3.99-0.4                zlibbioc_1.34.0            
 [55] scales_1.1.1                BSgenome_1.56.0             VariantAnnotation_1.34.0   
 [58] ProtGenerics_1.20.0         hms_0.5.3                   parallel_4.0.2             
 [61] SummarizedExperiment_1.18.2 AnnotationFilter_1.12.0     RColorBrewer_1.1-2         
 [64] curl_4.3                    memoise_1.1.0               gridExtra_2.3              
 [67] ggplot2_3.3.2               biomaRt_2.44.1              rpart_4.1-15               
 [70] latticeExtra_0.6-29         stringi_1.4.6               RSQLite_2.2.0              
 [73] genefilter_1.70.0           S4Vectors_0.26.1            desc_1.2.0                 
 [76] checkmate_2.0.0             GenomicFeatures_1.40.1      BiocGenerics_0.34.0        
 [79] pkgbuild_1.1.0              BiocParallel_1.22.0         GenomeInfoDb_1.24.2        
 [82] rlang_0.4.7                 pkgconfig_2.0.3             matrixStats_0.56.0         
 [85] bitops_1.0-6                lattice_0.20-41             purrr_0.3.4                
 [88] GenomicAlignments_1.24.0    htmlwidgets_1.5.1           bit_1.1-15.2               
 [91] processx_3.4.3              tidyselect_1.1.0            magrittr_1.5               
 [94] DESeq2_1.28.1               R6_2.4.1                    IRanges_2.22.2             
 [97] generics_0.0.2              Hmisc_4.4-0                 DelayedArray_0.14.1        
[100] DBI_1.1.0                   pillar_1.4.6                foreign_0.8-80             
[103] withr_2.2.0                 survival_3.1-12             RCurl_1.98-1.2             
[106] nnet_7.3-14                 tibble_3.0.3                crayon_1.3.4               
[109] BiocFileCache_1.12.0        jpeg_0.1-8.1                progress_1.2.2             
[112] usethis_1.6.1               locfit_1.5-9.4              grid_4.0.2                 
[115] data.table_1.13.0           blob_1.2.1                  callr_3.4.3                
[118] digest_0.6.25               xtable_1.8-4                openssl_1.4.2              
[121] stats4_4.0.2                munsell_0.5.0               Gviz_1.32.0                
[124] sessioninfo_1.1.1           askpass_1.1

Let me know if this works for you!