Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

getChromInfoFromUCSC: anyNA(m32) is not TRUE error for hg38 #82

Closed chrarnold closed 1 year ago

chrarnold commented 1 year ago

Today I get the following error with hg38 specifically; hg19 and mm10 work as before. Any idea what is causing this? It must be a change from the last few days: it always worked until Friday, when I last tested.

> GenomeInfoDb::getChromInfoFromUCSC("hg38")
Error in .order_seqlevels(chrom_sizes[, "chrom"]) :
!anyNA(m32) is not TRUE

GenomeInfoDb version: '1.34.3'

mxw010 commented 1 year ago

Based on the last fix, UCSC changed something on their end, which caused the error. I ran into the same error with hg38 today with another package (ChIPpeakAnno), so it's most likely an outside change that caused the error.

chrarnold commented 1 year ago

I see, ok! Is there a place where this can be reported? It doesn't seem to be associated with any single package. I currently can't build my package because of this, so I will think about a backup solution until it is fixed.

mxw010 commented 1 year ago

I googled a bit more: the last time this bug appeared, the ChIPpeakAnno author suggested updating GenomeInfoDb, which I think ChIPpeakAnno also uses. Attaching the output of sessionInfo() would probably help the authors of this package debug.

I think this is likely a bug from outside because I did not have any problem running it last week. This bug re-appeared today.

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.4 (Ootpa)

Matrix products: default
BLAS:   /gpfs0/export/apps/opt/R/4.1.0-foss-2020a/lib64/R/lib/libRblas.so
LAPACK: /gpfs0/export/apps/opt/R/4.1.0-foss-2020a/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] org.Hs.eg.db_3.14.0                     
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.14.0
 [3] EnsDb.Hsapiens.v86_2.99.0               
 [4] ensembldb_2.18.4                        
 [5] AnnotationFilter_1.18.0                 
 [6] GenomicFeatures_1.46.5                  
 [7] AnnotationDbi_1.56.2                    
 [8] Biobase_2.54.0                          
 [9] ChIPpeakAnno_3.28.1                     
[10] GenomicRanges_1.46.1                    
[11] GenomeInfoDb_1.30.1                     
[12] IRanges_2.28.0                          
[13] S4Vectors_0.32.4                        
[14] BiocGenerics_0.40.0                     

loaded via a namespace (and not attached):
  [1] BiocFileCache_2.2.1         plyr_1.8.6                 
  [3] lazyeval_0.2.2              splines_4.1.0              
  [5] BiocParallel_1.28.3         ggplot2_3.3.6              
  [7] amap_0.8-18                 digest_0.6.27              
  [9] invgamma_1.1                htmltools_0.5.2            
 [11] SQUAREM_2021.1              fansi_0.5.0                
 [13] magrittr_2.0.2              memoise_2.0.0              
 [15] BSgenome_1.62.0             InteractionSet_1.22.0      
 [17] limma_3.50.1                Biostrings_2.62.0          
 [19] matrixStats_0.61.0          systemPipeR_2.0.6          
 [21] bdsmatrix_1.3-4             prettyunits_1.1.1          
 [23] jpeg_0.1-9                  colorspace_2.0-2           
 [25] blob_1.2.2                  rappdirs_0.3.3             
 [27] apeglm_1.16.0               ggrepel_0.9.1              
 [29] dplyr_1.0.8                 crayon_1.4.1               
 [31] RCurl_1.98-1.6              graph_1.72.0               
 [33] survival_3.2-11             glue_1.6.2                 
 [35] gtable_0.3.0                zlibbioc_1.40.0            
 [37] XVector_0.34.0              DelayedArray_0.20.0        
 [39] scales_1.2.1                futile.options_1.0.1       
 [41] mvtnorm_1.1-3               DBI_1.1.2                  
 [43] Rcpp_1.0.7                  progress_1.2.2             
 [45] emdbook_1.3.12              bit_4.0.4                  
 [47] truncnorm_1.0-8             htmlwidgets_1.5.4          
 [49] httr_1.4.2                  gplots_3.1.1               
 [51] RColorBrewer_1.1-2          ellipsis_0.3.2             
 [53] pkgconfig_2.0.3             XML_3.99-0.9               
 [55] farver_2.1.0                dbplyr_2.1.1               
 [57] locfit_1.5-9.5              utf8_1.2.2                 
 [59] tidyselect_1.1.1            labeling_0.4.2             
 [61] rlang_1.0.6                 munsell_0.5.0              
 [63] tools_4.1.0                 cachem_1.0.5               
 [65] cli_3.6.0                   generics_0.1.0             
 [67] RSQLite_2.2.11              stringr_1.4.0              
 [69] fastmap_1.1.0               yaml_2.2.1                 
 [71] bit64_4.0.5                 caTools_1.18.2             
 [73] purrr_0.3.4                 KEGGREST_1.34.0            
 [75] RBGL_1.70.0                 formatR_1.11               
 [77] xml2_1.3.2                  biomaRt_2.50.3             
 [79] compiler_4.1.0              rstudioapi_0.13            
 [81] filelock_1.0.2              curl_4.3.2                 
 [83] png_0.1-7                   tibble_3.1.6               
 [85] stringi_1.7.3               futile.logger_1.4.3        
 [87] lattice_0.20-44             ProtGenerics_1.26.0        
 [89] Matrix_1.3-3                multtest_2.50.0            
 [91] vctrs_0.5.2                 pillar_1.6.2               
 [93] lifecycle_1.0.3             DiffBind_3.6.5             
 [95] BiocManager_1.30.16         irlba_2.3.5                
 [97] bitops_1.0-7                rtracklayer_1.54.0         
 [99] R6_2.5.0                    BiocIO_1.4.0               
[101] latticeExtra_0.6-29         hwriter_1.3.2              
[103] ShortRead_1.52.0            KernSmooth_2.23-20         
[105] lambda.r_1.2.4              MASS_7.3-54                
[107] gtools_3.9.2                assertthat_0.2.1           
[109] SummarizedExperiment_1.24.0 rjson_0.2.21               
[111] regioneR_1.26.1             GenomicAlignments_1.30.0   
[113] Rsamtools_2.10.0            GenomeInfoDbData_1.2.7     
[115] parallel_4.1.0              hms_1.1.1                  
[117] VennDiagram_1.7.3           grid_4.1.0                 
[119] coda_0.19-4                 GreyListChIP_1.26.0        
[121] MatrixGenerics_1.6.0        ashr_2.2-54                
[123] mixsqp_0.3-43               bbmle_1.0.24               
[125] numDeriv_2016.8-1.1         restfulr_0.0.13

chrarnold commented 1 year ago

Same here: it worked on Friday with no changes on my side, and it always worked before. Hopefully it's not difficult to fix.

Shts2123 commented 1 year ago

same issue here. It was working last week but now I'm getting the same error.

hpages commented 1 year ago

So it looks like UCSC has just sneakily changed the assembly that hg38 is based on, again! Used to be GRCh38.p13, now it's GRCh38.p14. I've not seen any announcement. Unfortunately this breaks GenomeInfoDb::getChromInfoFromUCSC("hg38").

Not the first time they've done this: see issue #30. They even changed hg19 about 3 years ago: see issue #9 :cry:

A fix is on the way. Will only be available in current BioC release (3.16) and devel (3.17) because past releases are frozen and no longer maintained. So, if you are using a past release (@mxw010 you seem to be in that situation), you'll need to update to the current release.
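As a rough illustration of the cutoff described above, the following sketch checks whether a given Bioconductor version is recent enough to receive the fix. `can_receive_fix` is a hypothetical helper written for this example and is not part of BiocManager; the 3.16 threshold is taken from the comment above.

```r
# Sketch only: `can_receive_fix` is a hypothetical helper, not a BiocManager function.
# Per the comment above, the fix lands only in BioC 3.16 (release) and 3.17 (devel).
can_receive_fix <- function(bioc_version) {
  # package_version() compares component-wise, so "3.9" < "3.16" as expected
  package_version(bioc_version) >= package_version("3.16")
}

can_receive_fix("3.14")  # a past, frozen release
can_receive_fix("3.16")  # the current release at the time of this thread

# With BiocManager installed, the running version could be checked with:
# can_receive_fix(BiocManager::version())
```

Comparing `package_version` objects rather than raw strings or numbers avoids the pitfall where 3.9 would sort after 3.16 numerically.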

priyatamapandey commented 1 year ago

I ran the annotatr package for hg38 and it returned the same error. I don't understand how I can fix it.

Thank you for any suggestion. Priya

hpages commented 1 year ago

Fixed in GenomeInfoDb 1.34.8 (release) and 1.35.14 (devel). See commit 091b5d2e8ef23ae5bbe915a33d62f66f4ad32585

Note that these new versions will take a couple of days to propagate to the Bioconductor repositories and become available via BiocManager::install(). Again, only users of BioC 3.16 or 3.17 will be able to update. If you cannot wait, you can install directly from GitHub (with BiocManager::install("Bioconductor/GenomeInfoDb")) but please be aware that you should never do this in normal times, so this should be a one-time exception.

Best, H.

chrarnold commented 1 year ago

> Fixed in GenomeInfoDb 1.34.8 (release) and 1.35.14 (devel). See commit 091b5d2
>
> Note that these new versions will take a couple of days to propagate to the Bioconductor repositories and become available via BiocManager::install(). Again, only users of BioC 3.16 or 3.17 will be able to update. If you cannot wait, you can install directly from GitHub (with BiocManager::install("Bioconductor/GenomeInfoDb")) but please be aware that you should never do this in normal times, so this should be a one-time exception.
>
> Best, H.

Thanks Hervé! My package requires this to work, which also means that with the (permanent, I guess) change at UCSC, everyone who does not use the current Bioc version (i.e., those who use 3.15 or below) won't be able to use the package anymore, as the error will appear for all versions below 1.34.8, correct? I may need to explicitly require GenomeInfoDb 1.34.8 or later in my DESCRIPTION from now on, for example.

hpages commented 1 year ago

@chrarnold

> all people who do not use the current Bioc version (i.e., those who use 3.15 or below) won't be able to use the package anymore

That's the case, unfortunately.

> I may need to include the GenomeInfoDb dependency to 1.34.8 at least explicitly in my DESCRIPTION from now onwards, for example.

Yep, that's a good idea. You can include the GenomeInfoDb (>= 1.34.8) dep in the RELEASE_3_16 branch of your package and the GenomeInfoDb (>= 1.35.14) in its devel branch.

H.
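In addition to declaring the dependency in DESCRIPTION, a package could check the version at run time and fail with a clearer message. The sketch below encodes the two thresholds mentioned above (1.34.8 on the release line, 1.35.14 on the devel line). `has_hg38_fix` is a hypothetical helper name, and the assumption that every version from 1.35.14 onward keeps the fix is mine.

```r
# Sketch only: `has_hg38_fix` is a hypothetical helper, not part of GenomeInfoDb.
# Thresholds come from the thread: >= 1.34.8 (BioC 3.16) or >= 1.35.14 (BioC 3.17).
has_hg38_fix <- function(v) {
  v <- package_version(v)
  if (v >= "1.35.0") {
    v >= "1.35.14"   # devel line (assumed to hold for later versions too)
  } else {
    v >= "1.34.8"    # release line; older series never received the fix
  }
}

has_hg38_fix("1.34.3")   # version from the original report
has_hg38_fix("1.34.8")   # first fixed release version

# In a package, the installed version could be tested like this:
# if (!has_hg38_fix(utils::packageVersion("GenomeInfoDb")))
#   stop("GenomeInfoDb is too old for hg38; please update Bioconductor.")
```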

priyatamapandey commented 1 year ago

Hi, thank you for explaining the versions, but I don't see 1.34.8 for GenomeInfoDb at https://www.bioconductor.org/packages/release/bioc/html/GenomeInfoDb.html. It shows 1.34.7. Also, with the latest BiocManager, I am not able to install the annotatr package.

Thank you for your help Priya

mjsteinbaugh commented 1 year ago

@priyatamapandey See @hpages' response above:

> Note that these new versions will take a couple of days to propagate to the Bioconductor repositories and become available

hpages commented 1 year ago

Finally announced today (7 min ago): https://groups.google.com/a/soe.ucsc.edu/g/genome-announce/c/SytF4qkgpMw

No word about the 2 extra sequences that they kept from GRCh38.p13 and that don't belong to GRCh38.p14. See my long commit comment for the details: 091b5d2e8ef23ae5bbe915a33d62f66f4ad32585

miachom commented 1 year ago

Hi, I installed directly from GitHub (with BiocManager::install("Bioconductor/GenomeInfoDb")), but now I am getting this error:

Error in stop_if(is.null(NCBI_assembly_info), "\"assembly_accession\" field in 'NCBI_LINKER' must ", :
Error in UCSC genome registration file 'hg38.R': "assembly_accession" field in 'NCBI_LINKER' must be associated with a registered NCBI assembly

Could you please help with this? @hpages

chrarnold commented 1 year ago

> Hi, I installed directly from GitHub (with BiocManager::install("Bioconductor/GenomeInfoDb")). But now I am getting error as: Error in stop_if(is.null(NCBI_assembly_info), "\"assembly_accession\" field in 'NCBI_LINKER' must ", : Error in UCSC genome registration file 'hg38.R': "assembly_accession" field in 'NCBI_LINKER' must be associated with a registered NCBI assembly
>
> Could you please help with this? @hpages

I remember seeing the same, but the message disappeared for me after restarting the session and re-loading the newest package version.

miachom commented 1 year ago

Hi, thanks a lot! It worked after restarting!
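One plausible reason a restart helps is that the R session still holds the old package namespace in memory after the new version is installed on disk. The sketch below compares the loaded version with the installed one. `namespace_is_stale` is a hypothetical helper, and whether this was the actual cause of the error above is an assumption, not something confirmed in the thread.

```r
# Sketch only: `namespace_is_stale` is a hypothetical helper for illustration.
# It returns TRUE when a package is loaded in this session but a different
# version is installed on disk, which suggests restarting R.
namespace_is_stale <- function(pkg) {
  if (!isNamespaceLoaded(pkg)) {
    return(FALSE)  # nothing loaded, so nothing can be stale
  }
  loaded    <- package_version(getNamespaceVersion(pkg))  # version held in memory
  installed <- utils::packageVersion(pkg)                 # version read from the library
  loaded != installed
}

namespace_is_stale("utils")  # a base package should not be stale

# After reinstalling, one could run:
# if (namespace_is_stale("GenomeInfoDb")) message("Restart R to load the new version.")
```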

zhiliqiao commented 1 year ago

I ran into the same issue, and it turns out I cannot update the GenomeInfoDb package to the required version (1.34.8). I'm using the BiocManager::install("Bioconductor/GenomeInfoDb") command as suggested. My Bioconductor is version 3.16, BiocManager 1.30.19. But what I got from this installation was GenomeInfoDb version 1.34.4. Removing and reinstalling didn't fix the problem. Can anyone tell me what I did wrong?

hpages commented 1 year ago

@zhiliqiao The latest GenomeInfoDb has finally propagated to the Bioconductor public repositories so there's no need to install it from GitHub. Just install it the normal way i.e. with BiocManager::install("GenomeInfoDb").

AntoniaChroni commented 1 year ago

I did what @hpages suggested but I still get this error:

Error in .order_seqlevels(chrom_sizes[, "chrom"]) :
!anyNA(m32) is not TRUE

rm(chrom.assay) # To clean up the generated chrom.assay object within the for loop.
Warning message:
In rm(chrom.assay) : object 'chrom.assay' not found

Any thoughts, or workarounds on this?

hpages commented 1 year ago

@AntoniaChroni So you are using the latest GenomeInfoDb (1.34.9 in BioC 3.16 and 1.35.15 in BioC 3.17) and you're still getting this error? What's your sessionInfo()? What version of Bioconductor (BiocManager::version()) do you use?

As explained above, the fix is only in GenomeInfoDb >= 1.34.8 (in BioC 3.16) and GenomeInfoDb >= 1.35.14 (in BioC 3.17).
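When answering a request like the one above, it can help to post just the relevant versions alongside the full sessionInfo(). The following is a minimal sketch using only base R. `report_versions` is a hypothetical helper, and it reports the versions installed in the library, which may differ from what is loaded in a long-running session.

```r
# Sketch only: `report_versions` is a hypothetical helper for illustration.
# It returns the installed version of each package, or NA if not installed.
report_versions <- function(pkgs = c("GenomeInfoDb", "BiocManager")) {
  vapply(pkgs, function(p) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(report_versions())

# The Bioconductor version itself can be reported with BiocManager::version().
```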