Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

Feature request: BSgenomeViews objects accept GPos objects in their 'granges' slot #2

Open hpages opened 5 years ago

hpages commented 5 years ago

Even though the BSgenomeViews() constructor accepts a GPos object:

library(BSgenome)
v <- BSgenomeViews("hg38", GPos("chrY:11001-12000"))
v
# BSgenomeViews object with 1000 views and 0 metadata columns:
#          seqnames    ranges strand            dna
#             <Rle> <IRanges>  <Rle> <DNAStringSet>
#      [1]     chrY     11001      *            [C]
#      [2]     chrY     11002      *            [C]
#      [3]     chrY     11003      *            [A]
#      [4]     chrY     11004      *            [C]
#      [5]     chrY     11005      *            [C]
#      ...      ...       ...    ...            ...
#    [996]     chrY     11996      *            [C]
#    [997]     chrY     11997      *            [A]
#    [998]     chrY     11998      *            [A]
#    [999]     chrY     11999      *            [T]
#   [1000]     chrY     12000      *            [T]
#   -------
#   seqinfo: 455 sequences (1 circular) from hg38 genome

the supplied GPos object is stored in the BSgenomeViews object as a GRanges instance:

class(v@granges)
# [1] "GRanges"
# attr(,"package")
# [1] "GenomicRanges"

This unnecessarily inflates the size of the object and should be avoided.
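
To get a rough sense of the difference, the two representations can be compared directly. This is only a minimal sketch: it assumes the equivalent GRanges can be built by hand with one width-1 range per position, it does not inspect the BSgenomeViews object itself, and the reported sizes will depend on the GenomicRanges version in use.

```r
library(GenomicRanges)

# The same 1000 positions as in the report, once as a GPos object and once
# as a plain GRanges object with one width-1 range per position.
# (The hand-built GRanges is an assumption made for illustration only.)
gpos <- GPos("chrY:11001-12000")
gr <- GRanges("chrY", IRanges(start = 11001:12000, width = 1L))

# Both objects describe the same positions ...
stopifnot(length(gpos) == length(gr), all(start(gpos) == start(gr)))

# ... but their memory footprints can differ.
print(object.size(gpos), units = "Kb")
print(object.size(gr), units = "Kb")
```

Keeping the GPos object as-is in the 'granges' slot would let a BSgenomeViews object benefit from the more compact representation.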

> sessionInfo()
R version 3.6.0 Patched (2019-05-02 r76454)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-3.6.r76454/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.6.r76454/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.53.0                  
 [3] rtracklayer_1.45.1                Biostrings_2.53.0                
 [5] XVector_0.25.0                    GenomicRanges_1.37.7             
 [7] GenomeInfoDb_1.21.1               IRanges_2.19.5                   
 [9] S4Vectors_0.23.3                  BiocGenerics_0.31.2              

loaded via a namespace (and not attached):
 [1] zlibbioc_1.31.0             GenomicAlignments_1.21.2   
 [3] BiocParallel_1.19.0         lattice_0.20-38            
 [5] tools_3.6.0                 SummarizedExperiment_1.15.1
 [7] grid_3.6.0                  Biobase_2.45.0             
 [9] matrixStats_0.54.0          Matrix_1.2-17              
[11] GenomeInfoDbData_1.2.1      bitops_1.0-6               
[13] RCurl_1.95-4.12             DelayedArray_0.11.0        
[15] compiler_3.6.0              Rsamtools_2.1.2            
[17] XML_3.98-1.19