Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome
9 stars 8 forks source link

inconsistent getSeq and isCircular behavior #17

Open Roleren opened 3 years ago

Roleren commented 3 years ago

Hey,

Referencing discussion on bioconductor support page

So to conclude what is missing are 2 things:

  1. Make sure isCircular works properly for DNAStringSet, as now the isCircular flag is only updated in the metadata and not shown correctly when using isCircular(DNAStringSet) getter.
  2. Make sure getSeq for BSGenome and DNAStringSet both support circular wrapping ranges (that go from end of sequence wrapping to the beginning or reverse)

If you want some help on this I can make a PR, but there is a lot of package dependencies her, with Biostrings, BSgenome, rtracklayer and GenomeInfoDb.

LiNk-NY commented 3 years ago

Hi Hervé @hpages, did you want to look at this? Thank you! Update: Thanks for looking at this Hervé. From your response, it looks like a complicated interaction between several packages. We should layout how we can proceed with making these methods work in harmony.

Roleren commented 3 years ago

So first easy thing we could fix is to make the seqinfo getter to actually use the seqinfo table of the DNAStringSet object, you agree ?

This is easily done by updating the seqinfo(DNAStringSet), to not remake the object, but instead use: metadata(x), if it is defined:

 library(GenomicRanges)
 library(BSgenome.Hsapiens.UCSC.hg19) # Use 38 if you have that instead
 gr <- GRanges("chrMT:16567-16578:+")
 isCircular(gr) <- TRUE
 seqlengths(gr) <- 16569
 # Load sequence
 fa.loaded <- DNAStringSet(BSgenome.Hsapiens.UCSC.hg19$chrMT)

metadata(fa.loaded)
  list() # Output is now empty list

So logic will be, check if metadata() does not return empty list, if it does recreate seqinfo object as before, if it is defined as a valid Seqinfo object, return that directly.

You want me to make a PR for this ? Where should I add tests if needed ?