Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

I am having trouble retrieving annotation from hg38 using the GenomeInfoDb package. #84

Closed gnwwanne closed 1 year ago

gnwwanne commented 1 year ago

I am currently working on creating a Signac pipeline. I keep getting this error when I run the following commands:

# extract gene annotations from EnsDb
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)

# change to UCSC style since the data was mapped to hg38
seqlevelsStyle(annotations) <- 'UCSC'

Error in stop_if(is.null(NCBI_assembly_info), "\"assembly_accession\" field in 'NCBI_LINKER' must ", :
  Error in UCSC genome registration file 'hg38.R':
  "assembly_accession" field in 'NCBI_LINKER' must be associated with a registered NCBI assembly

I updated my GenomeInfoDb Bioconductor package to the newest version, but I am still receiving this error.

############## R code and session info ##############

library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v86)
library(ggplot2)
library(patchwork)
set.seed(1234)

# seurat_p function
seurat_p <- function(filtered_data){
  filtered_data <- gene.vs.molecule.cell.filter(filtered_data, min.cell.size=500)
  seurat_p <- CreateSeuratObject(counts = filtered_data, min.cells = 3, min.features = 400)
  seurat_p[["percent.mt"]] <- PercentageFeatureSet(seurat_p, pattern = "^MT-")
  seurat_p <- seurat_p[, seurat_p[["nFeature_RNA"]] > 400 & seurat_p[["nFeature_RNA"]] < 7500 & seurat_p[["percent.mt"]] < 20 ]
  return(seurat_p)
}

# implement Signac pipeline on sample
counts <- Read10X_h5(filename = "filtered_feature_bc_matrix.h5")

counts_pks <- counts$Peaks
counts_gene <- counts$`Gene Expression`

# create chromatin assay with this function
chrom_assay <- CreateChromatinAssay(
  counts = counts_pks,
  sep = c(":", "-"),
  fragments = 'atac_fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)

# create a Seurat object after implementing the Signac pipeline
K_602A <- CreateSeuratObject(
  counts = chrom_assay,
  assay = "peaks"
)

# extract gene annotations from EnsDb
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)

# change to UCSC style since the data was mapped to hg38
seqlevelsStyle(annotations) <- 'UCSC'

# add the gene information to the object

Annotation(K_602A) <- annotations

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] BiocManager_1.30.19 patchwork_1.1.2
[3] ggplot2_3.4.0 EnsDb.Hsapiens.v86_2.99.0
[5] ensembldb_2.22.0 AnnotationFilter_1.22.0
[7] GenomicFeatures_1.50.4 AnnotationDbi_1.60.0
[9] Biobase_2.58.0 GenomicRanges_1.50.2
[11] GenomeInfoDb_1.35.15 IRanges_2.32.0
[13] S4Vectors_0.36.1 BiocGenerics_0.44.0
[15] SeuratObject_4.1.3 Seurat_4.3.0
[17] Signac_1.9.0

loaded via a namespace (and not attached):
[1] utf8_1.2.3 spatstat.explore_3.0-6
[3] reticulate_1.28 tidyselect_1.2.0
[5] RSQLite_2.2.20 htmlwidgets_1.6.1
[7] grid_4.2.2 BiocParallel_1.32.5
[9] Rtsne_0.16 munsell_0.5.0
[11] codetools_0.2-19 ica_1.0-3
[13] interp_1.1-3 future_1.31.0
[15] miniUI_0.1.1.1 withr_2.5.0
[17] spatstat.random_3.1-3 colorspace_2.1-0
[19] progressr_0.13.0 filelock_1.0.2
[21] knitr_1.42 rstudioapi_0.14
[23] ROCR_1.0-11 tensor_1.5
[25] listenv_0.9.0 MatrixGenerics_1.10.0
[27] GenomeInfoDbData_1.2.9 polyclip_1.10-4
[29] bit64_4.0.5 rprojroot_2.0.3
[31] parallelly_1.34.0 vctrs_0.5.2
[33] generics_0.1.3 xfun_0.37
[35] biovizBase_1.46.0 BiocFileCache_2.6.0
[37] R6_2.5.1 hdf5r_1.3.8
[39] bitops_1.0-7 spatstat.utils_3.0-1
[41] cachem_1.0.6 DelayedArray_0.24.0
[43] assertthat_0.2.1 promises_1.2.0.1
[45] BiocIO_1.8.0 scales_1.2.1
[47] nnet_7.3-18 gtable_0.3.1
[49] globals_0.16.2 processx_3.8.0
[51] goftest_1.2-3 rlang_1.0.6
[53] RcppRoll_0.3.0 splines_4.2.2
[55] rtracklayer_1.58.0 lazyeval_0.2.2
[57] dichromat_2.0-0.1 checkmate_2.1.0
[59] spatstat.geom_3.0-6 yaml_2.3.7
[61] reshape2_1.4.4 abind_1.4-5
[63] backports_1.4.1 httpuv_1.6.8
[65] Hmisc_4.7-2 tools_4.2.2
[67] ellipsis_0.3.2 RColorBrewer_1.1-3
[69] ggridges_0.5.4 Rcpp_1.0.10
[71] plyr_1.8.8 base64enc_0.1-3
[73] progress_1.2.2 zlibbioc_1.44.0
[75] purrr_1.0.1 RCurl_1.98-1.10
[77] ps_1.7.2 prettyunits_1.1.1
[79] rpart_4.1.19 deldir_1.0-6
[81] pbapply_1.7-0 cowplot_1.1.1
[83] zoo_1.8-11 SummarizedExperiment_1.28.0
[85] ggrepel_0.9.2 cluster_2.1.4
[87] magrittr_2.0.3 data.table_1.14.6
[89] scattermore_0.8 lmtest_0.9-40
[91] RANN_2.6.1 ProtGenerics_1.30.0
[93] fitdistrplus_1.1-8 matrixStats_0.63.0
[95] hms_1.1.2 mime_0.12
[97] xtable_1.8-4 XML_3.99-0.13
[99] jpeg_0.1-10 gridExtra_2.3
[101] compiler_4.2.2 biomaRt_2.54.0
[103] tibble_3.1.8 KernSmooth_2.23-20
[105] crayon_1.5.2 htmltools_0.5.4
[107] later_1.3.0 Formula_1.2-4
[109] tidyr_1.3.0 DBI_1.1.3
[111] dbplyr_2.3.0 MASS_7.3-58.2
[113] rappdirs_0.3.3 Matrix_1.5-3
[115] cli_3.6.0 parallel_4.2.2
[117] igraph_1.3.5 pkgconfig_2.0.3
[119] GenomicAlignments_1.34.0 foreign_0.8-84
[121] sp_1.6-0 plotly_4.10.1
[123] spatstat.sparse_3.0-0 xml2_1.3.3
[125] XVector_0.38.0 VariantAnnotation_1.44.0
[127] stringr_1.5.0 callr_3.7.3
[129] digest_0.6.31 sctransform_0.3.5
[131] RcppAnnoy_0.0.20 spatstat.data_3.0-0
[133] Biostrings_2.66.0 leiden_0.4.3
[135] fastmatch_1.1-3 htmlTable_2.4.1
[137] uwot_0.1.14 restfulr_0.0.15
[139] curl_5.0.0 shiny_1.7.4
[141] Rsamtools_2.14.0 rjson_0.2.21
[143] lifecycle_1.0.3 nlme_3.1-162
[145] jsonlite_1.8.4 BSgenome_1.66.2
[147] desc_1.4.2 viridisLite_0.4.1
[149] fansi_1.0.4 pillar_1.8.1
[151] lattice_0.20-45 pkgbuild_1.4.0
[153] KEGGREST_1.38.0 fastmap_1.1.0
[155] httr_1.4.4 survival_3.5-0
[157] glue_1.6.2 remotes_2.4.2
[159] png_0.1-8 bit_4.0.5
[161] stringi_1.7.12 blob_1.2.3
[163] latticeExtra_0.6-30 memoise_2.0.1
[165] dplyr_1.1.0 irlba_2.3.5.1
[167] future.apply_1.10.0

hpages commented 1 year ago

Please take a look at https://github.com/Bioconductor/GenomeInfoDb/issues/82#issuecomment-1413895055
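Before following the link, it may help to confirm which versions are actually installed, since the session info above lists GenomeInfoDb 1.35.15 next to GenomicRanges 1.50.2 and BiocGenerics 0.44.0. A minimal sketch, where `installed_version` is a hypothetical helper and not part of any package:

```r
# Hypothetical helper (not part of GenomeInfoDb or BiocManager):
# returns the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages chosen for illustration; adjust as needed.
pkgs <- c("GenomeInfoDb", "GenomeInfoDbData", "GenomicRanges", "BiocGenerics")
versions <- vapply(pkgs, installed_version, character(1))
print(versions)

# If BiocManager is installed, BiocManager::valid() reports packages that are
# out of date or too new for the Bioconductor version in use (needs internet):
# BiocManager::valid()
```

Run this in a fresh R session so that the versions reported are the ones on disk and not ones loaded before an update.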

gnwwanne commented 1 year ago

OK, thanks, but now I am getting this error:

seqlevelsStyle(annotations) <- 'UCSC'
Error in function (type, msg, asError = TRUE) : Could not resolve host: ftp.ncbi.nlm.nih.gov

hpages commented 1 year ago

Either you have an internet problem or ftp.ncbi.nlm.nih.gov was temporarily unresponsive. This has little to do with the latest changes in GenomeInfoDb.

Does this link work for you? https://ftp.ncbi.nlm.nih.gov/

If it does, then try seqlevelsStyle(annotations) <- 'UCSC' again.
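The check-then-retry advice above can also be scripted. This is only a sketch in base R: `ncbi_reachable` and `retry` are hypothetical helpers written for this illustration, not functions provided by GenomeInfoDb, and the number of attempts and the wait time are arbitrary.

```r
# Hypothetical helper: TRUE if one line can be read from the URL,
# FALSE if opening or reading the connection raises an error.
ncbi_reachable <- function(host_url = "https://ftp.ncbi.nlm.nih.gov/") {
  con <- url(host_url)
  on.exit(close(con), add = TRUE)
  tryCatch({
    readLines(con, n = 1L, warn = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

# Hypothetical helper: call f() up to `attempts` times, sleeping `wait`
# seconds between failures; re-raise the last error if every attempt fails.
retry <- function(f, attempts = 3L, wait = 5) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait)
  }
  stop(result)
}

# Usage with the object from this thread (requires GenomeInfoDb and internet):
# if (ncbi_reachable()) {
#   annotations <- retry(function() {
#     seqlevelsStyle(annotations) <- "UCSC"
#     annotations
#   })
# }
```

Retrying only helps with transient outages. If `ncbi_reachable()` keeps returning FALSE, the problem is more likely a local network, proxy, or firewall setting.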