chanzuckerberg / cellxgene-census

CZ CELLxGENE Discover Census
https://chanzuckerberg.github.io/cellxgene-census/
MIT License

Cannot run open_soma() from Europe server #1174

Open Alex2975 opened 4 weeks ago

Alex2975 commented 4 weeks ago

Dear Authors,

Thank you so much for developing this tool. I tried to call open_soma() from servers in Europe, but I keep getting the following error. When I run open_soma() from US servers, the error does not occur. Could you please share some insights? I need to run it on the European servers.

File "tiledb/libtiledb.pyx", line 3706, in tiledb.libtiledb.object_type
File "tiledb/libtiledb.pyx", line 348, in tiledb.libtiledb.check_error
File "tiledb/libtiledb.pyx", line 342, in tiledb.libtiledb._raise_ctx_err
File "tiledb/libtiledb.pyx", line 327, in tiledb.libtiledb._raise_tiledb_error
tiledb.cc.TileDBError: [TileDB::S3] Error: Error while listing with prefix 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/__schema/' and delimiter '/'[Error Type: 99] [HTTP Response Code: -1] : curlCode: 28, Timeout was reached

ivirshup commented 4 weeks ago

I just tried to replicate this on an AWS instance running on eu-north-1, but did not see this error. Here's what I did:

mamba create -yn cellxgene-census "python=3.11"
conda activate cellxgene-census
pip install ipython cellxgene-census
ipython
import cellxgene_census
census = cellxgene_census.open_soma()
census
The "stable" release is currently 2023-12-15. Specify 'census_version="2023-12-15"' in future calls to open_soma() to ensure data consistency.

<Collection 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/' (open for 'r') (2 items)
    'census_info': 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/census_info' (unopened)
    'census_data': 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/census_data' (unopened)>

@Alex2975, is this roughly similar to what you did? And is it intermittent? I was able to at least connect to this back when I was in Germany, but that was on an institutional connection.

It would also be great if you could report some library versions here. You can do this by running:

import cellxgene_census, session_info
session_info.show(html=False, dependencies=True)

And paste the output here like:

```
-----
IPython 8.25.0
cellxgene_census 1.14.0
session_info 1.0.0
-----
aiobotocore 2.13.0
aiohttp 3.9.5
aioitertools 0.11.0
aiosignal 1.3.1
anndata 0.10.7
asttokens NA
attr 23.2.0
attrs 23.2.0
botocore 1.34.106
certifi 2024.06.02
charset_normalizer 3.3.2
cython_runtime NA
dateutil 2.9.0.post0
decorator 5.1.1
executing 2.0.1
frozenlist 1.4.1
fsspec 2024.5.0
h5py 3.11.0
idna 3.7
jedi 0.19.1
jmespath 1.0.1
llvmlite 0.42.0
multidict 6.0.5
natsort 8.4.0
numba 0.59.1
numpy 1.26.4
packaging 24.0
pandas 2.2.2
parso 0.8.4
prompt_toolkit 3.0.45
pure_eval 0.2.2
pyarrow 16.1.0
pyarrow_hotfix NA
pygments 2.18.0
pytz 2024.1
requests 2.32.3
s3fs 2024.5.0
scipy 1.13.1
six 1.16.0
somacore 1.0.11
stack_data 0.6.3
tiledb 0.29.0
tiledbsoma 1.11.3
traitlets 5.14.3
typing_extensions NA
urllib3 2.2.1
wcwidth 0.2.13
wrapt 1.16.0
yarl 1.9.4
-----
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Linux-6.8.0-1008-aws-x86_64-with-glibc2.39
-----
Session information updated at 2024-06-03 17:34
```

Alex2975 commented 4 weeks ago

Thank you for getting back to me so quickly, @ivirshup . I followed your instructions, and still got the same timeout error.

File "tiledb/libtiledb.pyx", line 3706, in tiledb.libtiledb.object_type
File "tiledb/libtiledb.pyx", line 348, in tiledb.libtiledb.check_error
File "tiledb/libtiledb.pyx", line 342, in tiledb.libtiledb._raise_ctx_err
File "tiledb/libtiledb.pyx", line 327, in tiledb.libtiledb._raise_tiledb_error
tiledb.cc.TileDBError: [TileDB::S3] Error: Error while listing with prefix 's3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/__schema/' and delimiter '/'[Error Type: 99] [HTTP Response Code: -1] : curlCode: 28, Timeout was reached

Here is the session info:

session_info.show(html=False, dependencies=True)

cellxgene_census 1.14.0
session_info 1.0.0

aiobotocore 2.13.0
aiohttp 3.9.5
aioitertools 0.11.0
aiosignal 1.3.1
anndata 0.10.7
attr 23.2.0
attrs 23.2.0
botocore 1.34.106
certifi 2024.06.02
charset_normalizer 3.3.2
cython_runtime NA
dateutil 2.9.0.post0
frozenlist 1.4.1
fsspec 2024.5.0
h5py 3.11.0
idna 3.7
jmespath 1.0.1
llvmlite 0.42.0
multidict 6.0.5
natsort 8.4.0
numba 0.59.1
numpy 1.26.4
packaging 24.0
pandas 2.2.2
pyarrow 16.1.0
pyarrow_hotfix NA
pytz 2024.1
requests 2.32.3
s3fs 2024.5.0
scipy 1.13.1
six 1.16.0
somacore 1.0.11
tiledb 0.29.0
tiledbsoma 1.11.3
typing_extensions NA
urllib3 2.2.1
wrapt 1.16.0
yarl 1.9.4

Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Linux-3.10.0-1160.108.1.el7.x86_64-x86_64-with-glibc2.17

Alex2975 commented 4 weeks ago

I also tried:

census = cellxgene_census.open_soma(mirror='s3-eu-north-1')

But I got this error:

.../python3.11/site-packages/cellxgene_census/_open.py", line 224, in open_soma
    raise ValueError("Mirror not found.")
ValueError: Mirror not found.

ivirshup commented 4 weeks ago

Ah yeah, there aren't actually any mirrors up yet.

For the continued failures, is it possible there's a firewall on your end?
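One quick way to check the firewall hypothesis is to test whether the machine can open a plain TCP connection to S3 at all. The sketch below uses only the standard library. The host name `s3.us-west-2.amazonaws.com` is an assumption inferred from the bucket name in the error message, and `can_reach` is a hypothetical helper, not part of any library mentioned in this thread.

```python
import socket

# Assumption: the bucket name suggests the us-west-2 regional S3 endpoint.
S3_HOST = "s3.us-west-2.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Calling `can_reach(S3_HOST)` on the affected server and getting `False` would be consistent with outbound traffic being blocked. A `True` result only shows that the TCP handshake succeeds; a proxy or TLS inspection layer could still interfere with the actual requests, and on a network that requires an HTTP proxy a direct connection may fail even though proxied traffic works.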

Alex2975 commented 4 weeks ago

Yes, there is a firewall on the servers. Do you think that could cause the error? Would it be a timeout or an access error? If it is a timeout, how can I increase the waiting time?

ivirshup commented 4 weeks ago

That would definitely cause the error. It may just always block the connection, but it looks like the connection simply takes a while for you.

Could you try:

import s3fs

fs = s3fs.S3FileSystem()
fs.ls("s3://cellxgene-census-public-us-west-2")

If this also doesn't work, you would probably need to ask your IT team about this.

Could you also confirm by trying this on a different network without the firewall?
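On the earlier question of increasing the waiting time, a minimal sketch follows. It assumes the installed `cellxgene_census` version accepts a `tiledb_config` dictionary in `open_soma()` and that the TileDB S3 settings `vfs.s3.connect_timeout_ms`, `vfs.s3.request_timeout_ms`, and `vfs.s3.connect_max_tries` apply to this connection; both are worth checking against the documentation for your versions. The helper names and the chosen values are illustrative only.

```python
def s3_timeout_config(
    connect_timeout_ms: int = 30_000,
    request_timeout_ms: int = 60_000,
    max_tries: int = 10,
) -> dict:
    """Build TileDB settings that give slow S3 connections more time."""
    return {
        "vfs.s3.connect_timeout_ms": connect_timeout_ms,
        "vfs.s3.request_timeout_ms": request_timeout_ms,
        "vfs.s3.connect_max_tries": max_tries,
    }


def open_census_with_timeouts(census_version: str = "2023-12-15"):
    """Open the Census with longer S3 timeouts (requires network access)."""
    # Imported here so the config helper can be used without the package installed.
    import cellxgene_census

    return cellxgene_census.open_soma(
        census_version=census_version,
        tiledb_config=s3_timeout_config(),
    )
```

If the firewall drops the connection outright, longer timeouts will only delay the same error, so this is mainly useful when the connection is slow rather than blocked.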

Alex2975 commented 4 weeks ago

Thank you, @ivirshup. When I ran fs.ls as you described, without the firewall, I got the error: PermissionError: Access Denied.

Alex2975 commented 4 weeks ago

When I ran aws s3 ls with --no-sign-request, I did get results back (the same answer with or without the firewall):

aws s3 ls --no-sign-request s3://cellxgene-census-public-us-west-2/cell-census/

                       PRE 2023-05-15/
                       PRE 2023-07-25/
                       PRE 2023-10-30/
                       PRE 2023-12-04/
                       PRE 2023-12-06/
                       PRE 2023-12-15/
                       PRE 2024-04-29/
                       PRE 2024-05-06/
                       PRE 2024-05-13/
                       PRE 2024-05-20/
                       PRE 2024-05-27/

2023-12-13 10:28:59        190 mirrors.json
2024-05-28 07:11:43       3642 release.json
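The contrast between the two results is suggestive: the unsigned CLI request succeeded, while `s3fs.S3FileSystem()` with default settings returned Access Denied. One plausible explanation, not confirmed in this thread, is that s3fs signed the request with credentials found on the machine. A rough s3fs equivalent of `--no-sign-request` is `anon=True`. In the sketch below, `list_census_builds` and `anonymous_s3` are hypothetical helpers written for illustration.

```python
def list_census_builds(fs, prefix="cellxgene-census-public-us-west-2/cell-census/"):
    """Return the sorted build-date directory names under the public Census prefix."""
    names = []
    for entry in fs.ls(prefix, detail=False):
        # Keep only the last path component, e.g. "2023-12-15".
        name = entry.rstrip("/").rsplit("/", 1)[-1]
        # Skip the JSON index files that sit next to the build directories.
        if not name.endswith(".json"):
            names.append(name)
    return sorted(names)


def anonymous_s3():
    """Create an s3fs filesystem that sends unsigned (anonymous) requests."""
    import s3fs

    return s3fs.S3FileSystem(anon=True)
```

Running `list_census_builds(anonymous_s3())` on the affected server should return the same build dates as the CLI listing if unsigned access is the only difference.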

ivirshup commented 4 weeks ago

Hm. That's odd. And you're definitely not passing any other arguments here, and consistently get a timeout? I may ping a couple more people to see if there's something they recognize here.

And cellxgene_census.open_soma() still doesn't work without the firewall?

Could you also show the full traceback? It should have enough to see the line you called before getting this error.

pablo-gar commented 2 weeks ago

@Alex2975 seems like you are able to access Census now, is that correct?

see #1195

Alex2975 commented 2 weeks ago

@ivirshup and @pablo-gar, thank you so much for helping me. I still cannot call open_soma() from the European cluster that I use, but I can currently access it from the US cluster. We are internally investigating the network proxy connections to see if anything is being blocked on our side. Please close this issue if you would. I am all set for now calling the API from the US side. Thank you.