Bioconductor / GenomeInfoDbData

Arabidopsis thaliana styles not consistent #4

Open lien-brzezniak opened 2 years ago

lien-brzezniak commented 2 years ago

Hello, I found an inconsistency for Arabidopsis thaliana that can cause problems when renaming seqnames. The styles for A. thaliana are defined as:

    GenomeInfoDb::genomeStyles("Arabidopsis_thaliana")
      circular  auto   sex  NCBI TAIR9 Ensembl
    1    FALSE  TRUE FALSE     1  Chr1       1
    2    FALSE  TRUE FALSE     2  Chr2       2
    3    FALSE  TRUE FALSE     3  Chr3       3
    4    FALSE  TRUE FALSE     4  Chr4       4
    5    FALSE  TRUE FALSE     5  Chr5       5
    6     TRUE FALSE FALSE    MT  ChrM      Mt
    7     TRUE FALSE  TRUE  Pltd  ChrC      Pt

But when I load a BSgenome package:

    tair9 <- getBSgenome('BSgenome.Athaliana.TAIR.TAIR9')

I find:

    GenomeInfoDb::seqlevelsStyle(tair9)
    [1] "NCBI"
    seqnames(tair9)
    [1] "Chr1" "Chr2" "Chr3" "Chr4" "Chr5" "ChrM" "ChrC"

According to the GenomeInfoDb style table, these seqnames belong to the 'TAIR9' style, not 'NCBI'. I found that some other packages that rely on BSgenome and GenomeInfoDb to rename chromosomes fail to do so for A. thaliana. I hope you find this useful. Best regards!
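To make the mismatch concrete, here is a minimal base-R sketch that does not call GenomeInfoDb at all. It copies the three style columns and the seqnames from the output above into plain vectors, then checks which style column the seqnames actually match. The helpers `style_match()` and `rename_seqs()` are hypothetical names introduced only for this illustration; they are not part of GenomeInfoDb or BSgenome.

```r
# Style columns copied from the genomeStyles("Arabidopsis_thaliana") output above.
styles <- data.frame(
  NCBI    = c("1", "2", "3", "4", "5", "MT", "Pltd"),
  TAIR9   = c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "ChrM", "ChrC"),
  Ensembl = c("1", "2", "3", "4", "5", "Mt", "Pt"),
  stringsAsFactors = FALSE
)

# Seqnames reported by seqnames(tair9) above.
seqs <- c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "ChrM", "ChrC")

# Hypothetical helper: fraction of seqnames found in each style column.
style_match <- function(seqs, styles) {
  vapply(styles, function(col) mean(seqs %in% col), numeric(1))
}

matches <- style_match(seqs, styles)
print(matches)

# Hypothetical helper: translate seqnames from one style column to another.
# It stops if any seqname is missing from the source style, which is what
# happens when the source style is taken to be "NCBI" for these seqnames.
rename_seqs <- function(seqs, styles, from, to) {
  idx <- match(seqs, styles[[from]])
  if (anyNA(idx)) stop("some seqnames are not in style '", from, "'")
  styles[[to]][idx]
}

ensembl_names <- rename_seqs(seqs, styles, from = "TAIR9", to = "Ensembl")
print(ensembl_names)
```

In this sketch every seqname matches the TAIR9 column and none match the NCBI column, so a rename that starts from the reported 'NCBI' style has nothing to map, while starting from 'TAIR9' works. Whether the downstream packages fail for exactly this reason is an assumption on my part.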