Open Roleren opened 5 years ago
I can't reproduce this :(
library(GenomicRanges)
seqlevelsStyle(GRanges("chrM:1-5"))
# [1] "UCSC"
seqlevelsStyle(GRanges("MT:1-5"))
# [1] "NCBI" "Ensembl"
sessionInfo()
R version 3.6.0 Patched (2019-05-02 r76454)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /home/hpages/R/R-3.6.r76454/lib/libRblas.so
LAPACK: /home/hpages/R/R-3.6.r76454/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicRanges_1.39.1 GenomeInfoDb_1.22.0 IRanges_2.21.1
[4] S4Vectors_0.24.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] zlibbioc_1.32.0 compiler_3.6.0 XVector_0.26.0
[4] GenomeInfoDbData_1.2.2 RCurl_1.95-4.12 bitops_1.0-6
Basic reproduction is this:
# Normal case works
NCBI <- GRanges(seqnames = c("MT", paste0(1:2)), 1)
seqlevelsStyle(NCBI)
# "NCBI" "Ensembl"
seqlevelsStyle(NCBI) <- "UCSC"
seqlevels(NCBI)
# [1] "chrM" "chr1" "chr2"
# This does not ( This is the case for my bam file)
mixed <- GRanges(seqnames = c("MT", paste0("chr", 1:2)), 1)
seqlevelsStyle(mixed)
# "UCSC"
seqlevelsStyle(mixed) <- "UCSC"
seqlevels(mixed)
# [1] "MT" "chr1" "chr2" (<- It should swap MT for chrM here I think when you say "UCSC")
And this one gives a warning
# One of each (this works)
mixed <- GRanges(seqnames = c("MT", paste0("chr", 1)), 1)
seqlevelsStyle(mixed)
"NCBI" "UCSC" "Ensembl"
seqlevelsStyle(mixed) <- "UCSC"
# Warning message:
# In .replace_seqlevels_style(x_seqlevels, value) :
# found more than one best sequence renaming map compatible with seqname style "UCSC" for # this object, using the first one
seqlevels(mixed)
# [1] "chrM" "chr1"
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] ORFik_1.7.6 reshape2_1.4.3 rtracklayer_1.46.0
[4] GenomicAlignments_1.22.1 Rsamtools_2.2.1 Biostrings_2.54.0
[7] XVector_0.26.0 SummarizedExperiment_1.16.0 DelayedArray_0.12.0
[10] BiocParallel_1.20.0 matrixStats_0.55.0 GenomicFeatures_1.38.0
[13] AnnotationDbi_1.48.0 Biobase_2.46.0 GenomicRanges_1.38.0
[16] GenomeInfoDb_1.22.0 IRanges_2.20.1 S4Vectors_0.24.0
[19] BiocGenerics_0.32.0 ggplot2_3.2.1 data.table_1.12.6
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2 progress_1.2.2
[5] httr_1.4.1 tools_3.6.0 backports_1.1.5 R6_2.4.1
[9] rpart_4.1-15 Hmisc_4.3-0 DBI_1.0.0 lazyeval_0.2.2
[13] colorspace_1.4-1 nnet_7.3-12 withr_2.1.2 tidyselect_0.2.5
[17] gridExtra_2.3 prettyunits_1.0.2 GGally_1.4.0 DESeq2_1.26.0
[21] bit_1.1-14 curl_4.2 compiler_3.6.0 htmlTable_1.13.2
[25] labeling_0.3 scales_1.1.0 checkmate_1.9.4 genefilter_1.68.0
[29] askpass_1.1 rappdirs_0.3.1 stringr_1.4.0 digest_0.6.23
[33] foreign_0.8-72 base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.4.0
[37] dbplyr_1.4.2 htmlwidgets_1.5.1 rlang_0.4.2 rstudioapi_0.10
[41] RSQLite_2.1.2 farver_2.0.1 acepack_1.4.1 dplyr_0.8.3
[45] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.2 Formula_1.2-3
[49] Matrix_1.2-18 Rcpp_1.0.3 munsell_0.5.0 lifecycle_0.1.0
[53] stringi_1.4.3 zlibbioc_1.32.0 plyr_1.8.4 BiocFileCache_1.10.2
[57] grid_3.6.0 blob_1.2.0 crayon_1.3.4 lattice_0.20-38
[61] splines_3.6.0 annotate_1.64.0 hms_0.5.2 locfit_1.5-9.1
[65] zeallot_0.1.0 knitr_1.26 pillar_1.4.2 geneplotter_1.64.0
[69] biomaRt_2.42.0 XML_3.98-1.20 glue_1.3.1 latticeExtra_0.6-28
[73] BiocManager_1.30.10 vctrs_0.2.0 gtable_0.3.0 openssl_1.4.1
[77] purrr_0.3.3 reshape_0.8.8 assertthat_0.2.1 xfun_0.11
[81] xtable_1.8-4 survival_3.1-7 tibble_2.1.3 memoise_1.1.0
[85] cluster_2.1.0
So to sum up, this is the case:
seqlevelsStyle support shifting of style when the cases are clear.
When there are a mix of styles with only 2 chromosomes, it switches correctly but gives you a warning.
When there are a mix of styles with > 2 chromosomes, it switches incorrectly. This is the case I would like, and the most logical is that it behaves as the case for 2 chromosomes, giving a warning, but switching correctly.
That is: if : "MT", "chr1" gives -> "chrM" "chr1" then: "MT", "chr1", "chr2" should give -> "chrM" "chr1", "chr2"
Which does not happen now.
I think this is quite a serious issue, I get it all the time warnings like this: Warning messages:
1: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
- in 'x': chrY, chrX
- in 'y': X, Y
This is after they should have had equal seqlevels naming.
So I think a lot of people are losing the sex chromosomes + sometimes mitochondrial, if they use different seqnames and trust that seqlevelsstyle will fix it.
I thanks for a nice function, I use seqlevelsStyle a lot.
One thing has always bothered me, is for the mitocondrial chromosome.
Two objects with same naming convention as these:
Can show seqlevels as this:
So even though they use the same seqlevelsStyle, the objects does not name the mitochondrial chromosome the same. Is there a reason for this ? :)