JHYSiu opened this issue 3 months ago.
Hey @JHYSiu,
Thanks for reporting this. It would be useful if you could paste the exact code that causes this issue. In particular, are you overriding the default configuration when calling open_soma?
Sorry, here it is.
census <- open_soma(census_version = "2024-07-01")
cell_metadata <- census$get("census_data")$get("homo_sapiens")$get("obs")
Thanks, I can reproduce the issue. I filed a ticket in the tiledbsoma repo: https://github.com/single-cell-data/TileDB-SOMA/issues/2899
Note that this issue seems to appear only when you inspect the SOMACollection object. You should still be able to query it and convert it to (e.g.) an R data frame without hitting the error, as shown in the tutorial.
As a workaround (maybe not acceptable, but it is an option), you can set these environment variables:
export AWS_ACCESS_KEY_ID="whatever"
export AWS_SECRET_ACCESS_KEY="whatever"
export AWS_DEFAULT_REGION="us-west-2"
@johnkerl @ebezzi do you know if this is related to the current CI failures for R? It looks like the same error, but I'm not sure whether a shared cause makes sense.
https://github.com/chanzuckerberg/cellxgene-census/actions/runs/10409767192/job/28830126754
@ivirshup looks the same to me. Would it be possible to use the workaround of setting the environment variable?
export AWS_DEFAULT_REGION="us-west-2"
Here are two R notebooks. The first shows the error; the second shows that it can be worked around with:
Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
Alternatively, setting
export AWS_DEFAULT_REGION=us-west-2
outside the notebook also works, and is likely what most users will want to do.
census-region-failure.ipynb also shows two us-west-2 configs that apparently are not propagating correctly:
1. vfs.s3.region = "us-west-2" in new_SOMATileDBContext_for_census
2. region = "us-west-2" in the default profile of ~/.aws/config (aws configure get region returns us-west-2)
It seems likely there are bugs somewhere in the stack that keep these from working (and make an explicit $AWS_DEFAULT_REGION necessary).
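When comparing setups, it can help to check which region-related environment variables a given R session actually sees. The sketch below is a hypothetical helper (region_env_report is not part of cellxgene.census or tiledbsoma). It only reads environment variables, so it cannot see ~/.aws/config or the vfs.s3.region value passed through new_SOMATileDBContext_for_census.

```r
# Hypothetical helper (not part of cellxgene.census or tiledbsoma):
# report which region-related environment variables this R session sees.
# It only reads environment variables; it does not parse ~/.aws/config.
region_env_report <- function(vars = c("AWS_DEFAULT_REGION",
                                       "TILEDB_VFS_S3_REGION",
                                       "TILEDB_VFS_S3_NO_SIGN_REQUEST")) {
  # unset = NA makes missing variables show up as NA instead of ""
  vals <- Sys.getenv(vars, unset = NA, names = FALSE)
  data.frame(variable = vars, value = vals, is_set = !is.na(vals),
             stringsAsFactors = FALSE)
}

print(region_env_report())
```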
@ryan-williams I still see this error after setting the environment variable and running:
library(cellxgene.census)
library(SingleCellExperiment)
census = open_soma()
sce_obj = get_single_cell_experiment(
census, "Homo sapiens",
obs_column_names = c("cell_type", "tissue_general", "disease", "sex"),
var_value_filter = "feature_id %in% c('ENSG00000161798', 'ENSG00000188229')",
obs_value_filter = "cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'"
)
Here is a CI run that fails: https://github.com/chanzuckerberg/cellxgene-census/actions/runs/10728627156/job/29753546340?pr=1273
@ivirshup in your log I see:
[HTTP Response Code: 403] [Exception: InvalidAccessKeyId]
That seems like a different issue from the one OP and I were seeing:
[HTTP Response Code: 301] [Exception: PermanentRedirect]
Can you verify that the credentials being used there can run e.g.:
aws s3 ls s3://cellxgene-census-public-us-west-2/cell-census/2024-09-02/soma/census_info/datasets/
Maybe try putting that earlier in the GHA (ideally before the 41min step…), and re-running?
Has there been any update on this issue? I am suffering from a similar issue, but immediately upon calling open_soma(). Setting the environment variable does not help.
Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
census <- open_soma(census_version = "2023-12-15")
Error: [TileDB::Task] Error: Caught std::exception: S3: Error while listing with prefix 's3://cellxgene-census-public-us-west-2/cell-census/2024-09-02/soma/census_info/datasets/' and delimiter '/'[Error Type: 100] [HTTP Response Code: 301] [Exception: PermanentRedirect] [Remote IP: 52.216.215.66] [Request ID: VY0G087PD45KK1G7] [Headers: 'content-type' = 'application/xml' 'date' = 'Wed, 09 Oct 2024 21:04:29 GMT' 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked' 'x-amz-bucket-region' = 'us-west-2' 'x-amz-id-2' = 'S29CB1wdkNKHzmbYNd44B9+jVJQnIFc4biQIy4Y/baXaqqZ/4gIyJwX0YE1w+MX0l8aR5lnxmqU=' 'x-amz-request-id' = 'VY0G087PD45KK1G7'] : Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: AlmaLinux 9.4 (Seafoam Ocelot)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.21.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=C.utf8 LC_NUMERIC=C LC_TIME=C.utf8 LC_COLLATE=C.utf8 LC_MONETARY=C.utf8 LC_MESSAGES=C.utf8
[7] LC_PAPER=C.utf8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.utf8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppSpdlog_0.0.18 cellxgene.census_1.16.1
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.4.0 later_1.3.2 aws.s3_0.3.21 urltools_1.7.3
[6] tibble_3.2.1 triebeard_0.4.1 polyclip_1.10-7 fastDummies_1.7.4 lifecycle_1.0.4
[11] globals_0.16.3 lattice_0.22-6 MASS_7.3-61 magrittr_2.0.3 limma_3.60.6
[16] plotly_4.10.4 rmarkdown_2.28 yaml_2.3.10 httpuv_1.6.15 Seurat_5.1.0
[21] tiledbsoma_1.14.3 sctransform_0.4.1 spam_2.11-0 sp_2.1-4 spatstat.sparse_3.1-0
[26] reticulate_1.39.0 cowplot_1.1.3 pbapply_1.7-2 DBI_1.2.3 RColorBrewer_1.1-3
[31] abind_1.4-8 zlibbioc_1.50.0 Rtsne_0.17 GenomicRanges_1.56.1 purrr_1.0.2
[36] BiocGenerics_0.50.0 msigdbr_7.5.1 GenomeInfoDbData_1.2.12 IRanges_2.38.1 S4Vectors_0.42.1
[41] ggrepel_0.9.6 irlba_2.3.5.1 listenv_0.9.1 spatstat.utils_3.1-0 goftest_1.2-3
[46] RSpectra_0.16-2 spatstat.random_3.3-2 fitdistrplus_1.2-1 parallelly_1.38.0 leiden_0.4.3.1
[51] codetools_0.2-20 DelayedArray_0.30.1 xml2_1.3.6 tidyselect_1.2.1 UCSC.utils_1.0.0
[56] farver_2.1.2 matrixStats_1.4.1 stats4_4.4.0 base64enc_0.1-3 spatstat.explore_3.3-2
[61] jsonlite_1.8.9 progressr_0.14.0 ggridges_0.5.6 survival_3.7-0 tools_4.4.0
[66] tiledb_0.30.2 ica_1.0-3 Rcpp_1.0.13 glue_1.8.0 spdl_0.0.5
[71] gridExtra_2.3 SparseArray_1.4.8 xfun_0.48 MatrixGenerics_1.16.0 GenomeInfoDb_1.40.1
[76] dplyr_1.1.4 BiocManager_1.30.25 fastmap_1.2.0 fansi_1.0.6 RcppML_0.5.6
[81] digest_0.6.37 R6_2.5.1 mime_0.12 colorspace_2.1-1 scattermore_1.2
[86] tensor_1.5 RcppCCTZ_0.2.12 spatstat.data_3.1-2 RSQLite_2.3.7 utf8_1.2.4
[91] tidyr_1.3.1 generics_0.1.3 data.table_1.16.0 httr_1.4.7 htmlwidgets_1.6.4
[96] S4Arrays_1.4.1 uwot_0.2.2 pkgconfig_2.0.3 gtable_0.3.5 blob_1.2.4
[101] lmtest_0.9-40 SingleCellExperiment_1.26.0 XVector_0.44.0 htmltools_0.5.8.1 dotCall64_1.2
[106] fgsea_1.31.3 SeuratObject_5.0.2 scales_1.3.0 Biobase_2.64.0 png_0.1-8
[111] spatstat.univar_3.0-1 knitr_1.48 rstudioapi_0.16.0 reshape2_1.4.4 nlme_3.1-166
[116] curl_5.2.3 cachem_1.1.0 zoo_1.8-12 stringr_1.5.1 KernSmooth_2.23-24
[121] parallel_4.4.0 miniUI_0.1.1.1 arrow_17.0.0.1 AnnotationDbi_1.66.0 nanotime_0.3.10
[126] pillar_1.9.0 grid_4.4.0 vctrs_0.6.5 nanoarrow_0.5.0.1 RANN_2.6.2
[131] promises_1.3.0 xtable_1.8-4 cluster_2.1.6 evaluate_1.0.0 cli_3.6.3
[136] singlet_0.99.6 compiler_4.4.0 rlang_1.1.4 crayon_1.5.3 future.apply_1.11.2
[141] aws.signature_0.6.0 fs_1.6.4 plyr_1.8.9 stringi_1.8.4 viridisLite_0.4.2
[146] deldir_2.0-4 BiocParallel_1.38.0 assertthat_0.2.1 babelgene_22.9 munsell_0.5.1
[151] Biostrings_2.72.1 lazyeval_0.2.2 spatstat.geom_3.3-3 Matrix_1.7-0 RcppHNSW_0.6.0
[156] patchwork_1.3.0 bit64_4.5.2 future_1.34.0 ggplot2_3.5.1 KEGGREST_1.44.1
[161] statmod_1.5.0 shiny_1.9.1 SummarizedExperiment_1.34.0 ROCR_1.0-11 igraph_2.0.3
[166] memoise_2.0.1 fastmatch_1.1-4 bit_4.5.0
I think this post was scary enough that everything started working properly now.
census <- open_soma(census_version = "2023-12-15")
I have this exact issue and none of the solutions here have worked. Has anyone figured out how to fix this?
@dtatarak thank you and our apologies.
The underlying issue is being actively worked on. It turns out to be more subtle than I had anticipated.
The workaround I've used successfully in the interim is:
Sys.setenv(TILEDB_VFS_S3_REGION = "us-west-2")
Sys.setenv(TILEDB_VFS_S3_NO_SIGN_REQUEST = "true")
@johnkerl
Thanks for the quick reply. Sadly that workaround does not fix the problem on my end.
Sys.setenv(TILEDB_VFS_S3_REGION = "us-west-2")
Sys.setenv(TILEDB_VFS_S3_NO_SIGN_REQUEST = "true")
census <- open_soma(census_version = "2023-12-15")
Still get this error:
Error: [TileDB::Task] Error: Caught std::exception: S3: Error while listing with prefix 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/' and delimiter '/'[Error Type: 100] [HTTP Response Code: 301] [Exception: PermanentRedirect] [Remote IP: 52.216.186.62] [Request ID: E5Q954P5JJWD68ZM] [Headers: 'content-type' = 'application/xml' 'date' = 'Wed, 23 Oct 2024 15:04:22 GMT' 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked' 'x-amz-bucket-region' = 'us-west-2' 'x-amz-id-2' = 'qFZu73Vv/Nd/P1Kswi9n4N+0O1//6W1W9D08PtHi3dOiO/HpxiB22o28i1K1YPeM/QfT+nbEC5c=' 'x-amz-request-id' = 'E5Q954P5JJWD68ZM'] : Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
Thanks for your patience @dtatarak. It puzzles me that this workaround suffices for me but not for you. :|
Can I trouble you to try one more thing?
Sys.setenv(TILEDB_VFS_S3_REGION = "us-west-2")
Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
Sys.setenv(TILEDB_VFS_S3_NO_SIGN_REQUEST = "true")
Also, can you try this after quitting and restarting R, and before running library(tiledbsoma) and/or library(cellxgene.census)?
@johnkerl
Ah! Success! I had tried both of those workarounds separately but never together. Thank you very much!
Fantastic, thank you @dtatarak ! We are working on the real fix as a priority but I'm happy to hear you have a workaround in the interim.
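For anyone else hitting the 301 PermanentRedirect error, the combination that worked above can be wrapped in a small helper run at the top of a fresh R session, before library(tiledbsoma) or library(cellxgene.census). The function name use_census_s3_workaround is hypothetical, and this is only a sketch of the interim workaround from this thread, not the real fix being worked on.

```r
# Hypothetical helper wrapping the interim workaround from this thread.
# Call it in a fresh R session BEFORE library(tiledbsoma) or
# library(cellxgene.census), as suggested above.
use_census_s3_workaround <- function(region = "us-west-2") {
  Sys.setenv(
    TILEDB_VFS_S3_REGION = region,         # TileDB config vfs.s3.region
    AWS_DEFAULT_REGION = region,           # region used by AWS tooling
    TILEDB_VFS_S3_NO_SIGN_REQUEST = "true" # TileDB config vfs.s3.no_sign_request
  )
  # Return the values now visible to the session, for a quick sanity check
  invisible(Sys.getenv(c("TILEDB_VFS_S3_REGION", "AWS_DEFAULT_REGION",
                         "TILEDB_VFS_S3_NO_SIGN_REQUEST")))
}

use_census_s3_workaround()
```

After this, open_soma(census_version = "2023-12-15") can be called as in the reports above. Setting all three variables together matters here: @dtatarak found that either pair alone was not enough.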
When trying to access cell-census, I can call open_soma() without problems, but when I try to explore or pull data following the tutorial, I get this error:
(I think it's similar to the issue described here: https://github.com/chanzuckerberg/cellxgene-census/issues/908)
Matrix products: default
BLAS/LAPACK: /gpfs3/apps/eb/2020b/skylake/software/FlexiBLAS/3.2.0-GCC-11.3.0/lib64/libflexiblas.so.3.2
locale:
[1] C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] RcppSpdlog_0.0.17 qs_0.26.3
[3] cellxgene.census_1.15.0 SingleCellExperiment_1.20.1
[5] SummarizedExperiment_1.28.0 Biobase_2.58.0
[7] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[9] IRanges_2.32.0 S4Vectors_0.36.2
[11] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
[13] matrixStats_1.3.0
loaded via a namespace (and not attached):
[1] zoo_1.8-12 tidyselect_1.2.1 purrr_1.0.2
[4] lattice_0.20-45 vctrs_0.6.5 generics_0.1.3
[7] base64enc_0.1-3 utf8_1.2.4 rlang_1.1.4
[10] pillar_1.9.0 glue_1.7.0 aws.s3_0.3.21
[13] bit64_4.0.5 GenomeInfoDbData_1.2.9 lifecycle_1.0.4
[16] zlibbioc_1.44.0 stringfish_0.16.0 RApiSerialize_0.1.3
[19] arrow_16.1.0 tiledbsoma_1.13.0 nanoarrow_0.5.0.1
[22] curl_5.2.1 fansi_1.0.6 urltools_1.7.3
[25] triebeard_0.4.1 Rcpp_1.0.13 DelayedArray_0.24.0
[28] tiledb_0.29.0 RcppCCTZ_0.2.12 RcppParallel_5.1.8
[31] jsonlite_1.8.8 XVector_0.38.0 fs_1.6.4
[34] bit_4.0.5 digest_0.6.36 dplyr_1.1.4
[37] grid_4.2.1 cli_3.6.3 tools_4.2.1
[40] bitops_1.0-8 magrittr_2.0.3 RCurl_1.98-1.16
[43] tibble_3.2.1 spdl_0.0.5 aws.signature_0.6.0
[46] pkgconfig_2.0.3 Matrix_1.6-4 data.table_1.15.4
[49] xml2_1.3.6 nanotime_0.3.9 assertthat_0.2.1
[52] httr_1.4.7 R6_2.5.1 compiler_4.2.1