Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

renaming Seqlevels of a forged BSgenome to match an annotation file #108

Closed hannahvanm closed 6 months ago

hannahvanm commented 6 months ago

Hi there,

I'm trying to get the chromosome names of a forged BSgenome object from NCBI (kPetMar1.pri) to match the seqnames of a peaks GRanges object for use with Signac LinkPeaks. The peaks' annotation is based on the same genome assembly but comes from a GTF that is not from NCBI.

[Two screenshots attached, dated 2024-03-04]

First, I tried to change the seqlevelsStyle of the forged genome to UCSC, but that didn't work.

seqlevelsStyle(BSgenome.Pmarinus.NCBI.kPetMar1.pri) <- "UCSC"
Warning: cannot switch kPetMar1.pri's seqlevels to UCSC style

Now I'm trying to use renameSeqlevels to do this manually, passing the seqnames from the peaks object as a data frame, but I get this error:

renameSeqlevels(BSgenome.Pmarinus.NCBI.kPetMar1.pri, da_peaks_seqnames)
Error in getSeqlevelsReplacementMode(value, seqlevels(x)) : 
  the supplied 'seqlevels' must be a character vector with no NAs and no
  duplicates

So I'm curious about what the format of value should be in renameSeqlevels to correctly change all of the BSgenome chr names to that of the annotation file. Thank you so much for all of your help so far working with this package!

multiome_session_info.txt

hpages commented 6 months ago

Questions about usage of Bioconductor software are best asked on our support site.

But before that, make sure to check the man page for renameSeqlevels (with ?renameSeqlevels). It says:

Usage:
     ...
     renameSeqlevels(x, value)
     ...

Arguments:

       x: Any object having a Seqinfo class in which the seqlevels will
          be kept, dropped or renamed.

   value: A named or unnamed character vector.

          ...

          In the case of ‘renameSeqlevels’, the names are used to map
          new sequence levels to the old (names correspond to the old
          levels). When ‘value’ is unnamed, the replacement vector must
          be the same length and in the same order as the original
          ‘seqlevels(x)’.

Looks like it answers your question :wink:
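
For illustration, here is a minimal sketch of the two forms of `value` described in the man page excerpt above. It uses a toy GRanges rather than the forged BSgenome, and the sequence names are made up. That a BSgenome object accepts the same call is an assumption here, not something tested in this thread.

```r
library(GenomicRanges)  # also attaches GenomeInfoDb, which provides renameSeqlevels()

# Toy object standing in for anything that has seqlevels.
# The accession-like names below are hypothetical.
gr <- GRanges(
  seqnames = c("NC_A.1", "NC_B.1", "NC_A.1"),
  ranges   = IRanges(start = c(1, 50, 200), width = 10)
)

# Named form: the names are the OLD levels, the values are the NEW levels.
gr_named <- renameSeqlevels(gr, c(NC_A.1 = "chr1", NC_B.1 = "chr2"))

# Unnamed form: a plain character vector with the same length and order
# as seqlevels(gr).
gr_unnamed <- renameSeqlevels(gr, c("chr1", "chr2"))

seqlevels(gr_named)
seqlevels(gr_unnamed)
```

In both forms `value` is a character vector. A data frame of seqnames is not, which is consistent with the error reported above.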

hannahvanm commented 6 months ago

Got it, thank you very much, it does answer my question. I suppose I was trying to finagle a way to get a list of the seqlevels from the annotation and use that as input, as there are many non-standard chromosome names in this genome and doing it manually would be a pill.
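
One way to avoid typing the names by hand is to build the named vector from a lookup table that pairs each old name with its new name. The sketch below assumes such a table exists. The `name_map` data frame, its column names, and the sequence names are all hypothetical, and a toy GRanges stands in for the real object.

```r
library(GenomicRanges)  # also attaches GenomeInfoDb

# Hypothetical lookup table: "old" is the name used in the object,
# "new" is the name used in the annotation.
name_map <- data.frame(
  old = c("NC_A.1", "NC_B.1", "NW_C.1"),
  new = c("chr1", "chr2", "scaffold_7"),
  stringsAsFactors = FALSE
)

# Toy object with two of the three sequences in the table.
gr <- GRanges(c("NC_A.1", "NC_B.1"), IRanges(start = c(1, 50), width = 10))

# Keep only rows whose old name is actually present in the object,
# then turn the table into a named character vector (names = old levels).
name_map <- name_map[name_map$old %in% seqlevels(gr), ]
value <- setNames(name_map$new, name_map$old)

# Check the conditions named in the error message: character, no NAs,
# no duplicates.
stopifnot(is.character(value), !anyNA(value), !anyDuplicated(value))

gr_renamed <- renameSeqlevels(gr, value)
seqlevels(gr_renamed)
```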

Unfortunately though, because the NCBI genome and the annotation don't match in the number of unplaced and unnamed scaffolds, I had to cut both the BSgenome and the annotation file down to only the standard chromosomes (chr1-85)...
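
For the GRanges side, restricting an object to the sequences shared with another set of names can be sketched as follows. The object and the names are toy stand-ins, and `genome_levels` is a hypothetical placeholder for the seqlevels of the genome object. Whether a BSgenome object itself accepts dropping levels this way is not established in this thread.

```r
library(GenomicRanges)  # also attaches GenomeInfoDb, which provides keepSeqlevels()

# Toy peaks with one scaffold that has no counterpart in the genome.
peaks <- GRanges(
  c("chr1", "chr2", "scaffold_9"),
  IRanges(start = c(1, 50, 5), width = 10)
)

# Hypothetical stand-in for the seqlevels of the genome object.
genome_levels <- c("chr1", "chr2", "scaffold_3")

# Sequence names present on both sides.
shared <- intersect(seqlevels(peaks), genome_levels)

# pruning.mode = "coarse" removes the ranges located on the dropped levels
# instead of raising an error.
peaks_shared <- keepSeqlevels(peaks, shared, pruning.mode = "coarse")
seqlevels(peaks_shared)
```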

I'll post a follow-up about that on the support site (sorry in advance). See you there and thanks for all of your help!