Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

should matchPattern propagate subject genome? #13

Closed vjcitn closed 3 years ago

vjcitn commented 4 years ago
> yeastg
Yeast genome:
# organism: Saccharomyces cerevisiae (Yeast)
# genome: sacCer3
# provider: UCSC
# release date: April 2011
# 17 sequences:
#   chrI    chrII   chrIII  chrIV   chrV    chrVI   chrVII  chrVIII chrIX  
#   chrX    chrXI   chrXII  chrXIII chrXIV  chrXV   chrXVI  chrM           
# (use 'seqnames()' to see all the sequence names, use the '$' or '[[' operator
# to access a given sequence)
> genome(yeastg)
     chrI     chrII    chrIII     chrIV      chrV     chrVI    chrVII   chrVIII 
"sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" 
    chrIX      chrX     chrXI    chrXII   chrXIII    chrXIV     chrXV    chrXVI 
"sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" "sacCer3" 
     chrM 
"sacCer3" 
> vmatchPattern("ATG", yeastg)
GRanges object with 444764 ranges and 0 metadata columns:
           seqnames      ranges strand
              <Rle>   <IRanges>  <Rle>
       [1]     chrI     283-285      +
       [2]     chrI     335-337      +
       [3]     chrI     388-390      +
       [4]     chrI     436-438      +
       [5]     chrI     492-494      +
       ...      ...         ...    ...
  [444760]     chrM 85069-85071      -
  [444761]     chrM 85322-85324      -
  [444762]     chrM 85470-85472      -
  [444763]     chrM 85681-85683      -
  [444764]     chrM 85776-85778      -
  -------
  seqinfo: 17 sequences from an unspecified genome

I think it makes sense for the result to carry the subject's genome, and perhaps its full seqinfo, rather than reporting "an unspecified genome".
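
In the meantime, a minimal workaround sketch is to copy the seqinfo from the genome onto the result by hand. This assumes `yeastg` is the `BSgenome.Scerevisiae.UCSC.sacCer3` object (the session above does not show how it was created) and that the result has the same seqlevels as the genome, as the output above suggests. It is not necessarily what a fix in the package would do.

```r
# Assumption: yeastg comes from this data package; loading it also attaches BSgenome.
library(BSgenome.Scerevisiae.UCSC.sacCer3)
yeastg <- BSgenome.Scerevisiae.UCSC.sacCer3

# Search both strands of every chromosome; the result is a GRanges object.
hits <- vmatchPattern("ATG", yeastg)

# Manually propagate the subject's seqinfo (seqlengths, circularity, genome).
# This relies on hits and yeastg having identical seqlevels.
seqinfo(hits) <- seqinfo(yeastg)

# The genome should now be set for every sequence instead of being NA.
genome(hits)
```

Copying the whole seqinfo rather than only `genome()` also carries over sequence lengths and circularity flags, which is the "perhaps the seqinfo" part of the suggestion.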

LiNk-NY commented 4 years ago

Thanks, Vince (@vjcitn)! We'll wait for feedback from Hervé (@hpages) on the PR.