Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

New genbank accession prefix `MU` #81

Closed wresch closed 1 year ago

wresch commented 1 year ago

Good evening,

With the current version of GenomeInfoDb (1.34.7, R 4.2.2), the following code fails:

> x <- getChromInfoFromUCSC("hg19")  # no problem
> x <- getChromInfoFromUCSC("hg38") # fails
Error in .order_seqlevels(chrom_sizes[, "chrom"]) :
  !anyNA(m32) is not TRUE
> traceback()
8: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
7: stopifnot(!anyNA(m32)) at hg38.R#38
6: .order_seqlevels(chrom_sizes[, "chrom"]) at hg38.R#70
5: FETCH_ORDERED_CHROM_SIZES(goldenPath.url = goldenPath.url)
4: .fetch_raw_chrom_info_from_UCSC(GENOME, ASSEMBLED_MOLECULES,
       CIRC_SEQS, FETCH_ORDERED_CHROM_SIZES, goldenPath.url = goldenPath.url)
3: .get_raw_chrom_info_for_registered_UCSC_genome(GENOME, ASSEMBLED_MOLECULES,
       vars$CIRC_SEQS, FETCH_ORDERED_CHROM_SIZES = vars$FETCH_ORDERED_CHROM_SIZES,
       assembled.molecules.only = assembled.molecules.only, goldenPath.url = goldenPath.url,
       recache = recache)
2: .get_chrom_info_for_registered_UCSC_genome(script_path, assembled.molecules.only = assembled.molecules.only,
       map.NCBI = map.NCBI, add.ensembl.col = add.ensembl.col, goldenPath.url = goldenPath.url,
       recache = recache)
1: getChromInfoFromUCSC("hg38")

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomeInfoDb_1.34.7 IRanges_2.32.0      S4Vectors_0.36.0
[4] BiocGenerics_0.44.0

loaded via a namespace (and not attached):
[1] compiler_4.2.2         magrittr_2.0.3         tools_4.2.2
[4] GenomeInfoDbData_1.2.9 RCurl_1.98-1.9         stringi_1.7.8
[7] stringr_1.4.1          bitops_1.0-7

I traced this back to GenBankAccn_prefixes in hg38.R, which is missing the MU prefix that now shows up in some fix/patch records. Those records were probably added in the last couple of days.
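
To illustrate the failure mode, here is a minimal sketch in base R. It assumes the ordering code takes a two-letter accession prefix from each sequence name and looks it up in a fixed table with match(). The names known_prefixes, seqlevels, accn, prefix, m, and res are hypothetical, the prefix table and sequence names are illustrative only, and this is not the actual code in hg38.R.

```r
# Hypothetical stand-in for the prefix table in hg38.R (not the real list).
known_prefixes <- c("KI", "GL", "KN", "KV", "KZ", "KQ", "ML")

# Illustrative sequence names in the UCSC "chrN_<accession>_<type>" style.
# The MU-prefixed name is made up to demonstrate the problem.
seqlevels <- c("chr1_KI270706v1_random", "chr1_MU273333v1_fix")

# Take the second "_"-separated field (the accession), then its first
# two characters (the prefix).
accn <- vapply(strsplit(seqlevels, "_", fixed = TRUE), `[`, character(1), 2L)
prefix <- substr(accn, 1L, 2L)

# match() returns NA for any prefix that is absent from the table.
m <- match(prefix, known_prefixes)
print(m)

# An assertion of the same form as the one in the traceback then fails.
# tryCatch() captures the error message instead of aborting the script.
res <- tryCatch(stopifnot(!anyNA(m)), error = function(e) conditionMessage(e))
print(res)
```

If the lookup really works this way, adding "MU" to the prefix table would make the lookup succeed for these records, although where the new prefix should sort relative to the others is a decision for the maintainers.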

hpages commented 1 year ago

Thanks for the report. This is a duplicate of issue #82, so I'll close this one. Please see #82.

Best, H.