dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 27 forks source link

test_nwb2asset_remote_asset timed out in test (ubuntu-latest, 3.8, dandi-api) #1450

Open yarikoptic opened 5 months ago

yarikoptic commented 5 months ago

https://github.com/dandi/dandi-cli/actions/runs/9395898413/job/25876096821

_________________________ test_nwb2asset_remote_asset __________________________
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fscacher/cache.py:133: in fingerprinter
    path = op.realpath(path_orig)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/posixpath.py:390: in realpath
    filename = os.fspath(filename)
E   TypeError: expected str, bytes or os.PathLike object, not RemoteReadableAsset

During handling of the above exception, another exception occurred:
dandi/tests/test_metadata.py:889: in test_nwb2asset_remote_asset
    assert nwb2asset(r, digest=digest) == BareAsset.model_construct(
dandi/metadata/nwb.py:149: in nwb2asset
    metadata = get_metadata(nwb_path)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fscacher/cache.py:138: in fingerprinter
    return f(*args, **kwargs)
dandi/metadata/nwb.py:106: in get_metadata
    meta.update(_get_pynwb_metadata(r))
dandi/pynwb_utils.py:206: in _get_pynwb_metadata
    with open_readable(path) as fp, h5py.File(fp, "r") as h5, NWBHDF5IO(
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/h5py/_hl/files.py:562: in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/h5py/_hl/files.py:235: in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
h5py/h5f.pyx:102: in h5py.h5f.open
    ???
h5py/h5fd.pyx:163: in h5py.h5fd.H5FD_fileobj_read
    ???
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fsspec/spec.py:1915: in readinto
    data = self.read(out.nbytes)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fsspec/implementations/http.py:598: in read
    return super().read(length)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fsspec/spec.py:1897: in read
    out = self.cache._fetch(self.loc, self.loc + length)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fsspec/caching.py:481: in _fetch
    self.cache = self.fetcher(start, bend)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fsspec/asyn.py:118: in wrapper
    return sync(self.loop, func, *args, **kwargs)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/fsspec/asyn.py:91: in sync
    if event.wait(1):
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/threading.py:558: in wait
    signaled = self._cond.wait(timeout)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/threading.py:306: in wait
    gotit = waiter.acquire(True, timeout)
E   Failed: Timeout >300.0s
------------------------------ Captured log setup ------------------------------
DEBUG    dandi:dandiapi.py:197 POST http://localhost:8000/api/dandisets/
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/info/
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:dandiapi.py:493 Using api key from DANDI_API_KEY environment variable
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/auth/token
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:dandiset.py:128 Found identifier 000112 in top level 'identifier'
INFO     dandi:upload.py:214 Found 2 files to consider
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/dandisets/000112/versions/draft/assets/
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:bases.py:342 Calculated dandi-etag of 2dfea6d6ebb6018f3a5c59707bc69ed3-1 for /tmp/pytest-of-runner/pytest-0/dandiset107/sub-mouse001/sub-mouse001.nwb
DEBUG    dandi:bases.py:351 sub-mouse001/sub-mouse001.nwb: Beginning upload
DEBUG    dandi:dandiapi.py:197 POST http://localhost:8000/api/uploads/initialize/
DEBUG    dandi:dandiapi.py:261 Response: 409
DEBUG    dandi:dandiapi.py:268 Error 409 while sending POST request to http://localhost:8000/api/uploads/initialize/: "Blob already exists."
DEBUG    dandi:bases.py:367 sub-mouse001/sub-mouse001.nwb: Blob already exists on server
DEBUG    dandi:bases.py:446 sub-mouse001/sub-mouse001.nwb: Assigning asset blob to dandiset & version
DEBUG    dandi:dandiapi.py:197 POST http://localhost:8000/api/dandisets/000112/versions/draft/assets/
DEBUG    dandi:dandiapi.py:261 Response: 200
INFO     dandi:bases.py:460 sub-mouse001/sub-mouse001.nwb: Asset successfully uploaded
------------------------------ Captured log call -------------------------------
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/dandisets/000112/versions/draft/assets/
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/dandisets/000112/versions/draft/assets/b770f47f-b67f-4a58-8883-09a2466eb577/
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/dandisets/000112/versions/draft/assets/b770f47f-b67f-4a58-8883-09a2466eb577/
DEBUG    dandi:dandiapi.py:261 Response: 200
DEBUG    dandi:dandiapi.py:197 GET http://localhost:8000/api/dandisets/000112/versions/draft/assets/b770f47f-b67f-4a58-8883-09a2466eb577/
DEBUG    dandi:dandiapi.py:261 Response: 200

---------- coverage: platform linux, python 3.8.18-final-0 -----------

@jwodder please remind what those trailing requests are for -- waiting for validation or smth? I wonder if it is result of some recent change...

jwodder commented 5 months ago

@yarikoptic That's normal. The test performs three methods calls on the remote asset object that each ultimately end up having to request the asset's metadata from the server.

yarikoptic commented 5 months ago

ah, if that is not a sign of repeated rerequests -- can you tell from above what is timing out?

jwodder commented 5 months ago

@yarikoptic fsspec is timing out while trying to read the asset over S3.

yarikoptic commented 5 months ago

ok, let's shelf it for now but if we keep coming to it, might need smaller asset I guess.

yarikoptic commented 4 months ago

it hit us in daily tests again https://github.com/dandi/dandi-cli/actions/runs/10026093264/job/27710002745 and apparently it is quite common

dandi@drogon:/mnt/backup/dandi/tinuous-logs/dandi-cli/2024/07$ git grep 'dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout' | grep '/test '
05/github/cron/20240705T060344/e2d963d/Tests/5704/test (ubuntu-latest, 3.8, dandi-api)/10_Run Dandi API tests only.txt:2024-07-05T06:19:22.3952144Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
07/github/cron/20240707T060412/e2d963d/Tests/5706/test (ubuntu-latest, 3.10, normal)/8_Run all tests.txt:2024-07-07T06:22:14.4078182Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
15/github/cron/20240715T060342/547d7b8/Tests/5716/test (ubuntu-latest, 3.10, normal)/8_Run all tests.txt:2024-07-15T06:22:43.5684157Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
15/github/cron/20240715T060342/547d7b8/Tests/5716/test (ubuntu-latest, 3.8, dandi-api)/10_Run Dandi API tests only.txt:2024-07-15T06:23:20.7772926Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
16/github/cron/20240716T060409/e975d9a/Tests/5719/test (ubuntu-latest, 3.9, normal)/8_Run all tests.txt:2024-07-16T06:27:03.5490446Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
19/github/cron/20240719T060339/e975d9a/Tests/5722/test (ubuntu-latest, 3.9, normal)/8_Run all tests.txt:2024-07-19T06:23:03.5577525Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
21/github/cron/20240721T060314/e975d9a/Tests/5724/test (ubuntu-latest, 3.12, normal)/8_Run all tests.txt:2024-07-21T06:25:01.5039466Z FAILED dandi/tests/test_metadata.py::test_nwb2asset_remote_asset - Failed: Timeout >300.0s
jwodder commented 4 months ago

@yarikoptic I don't believe there's anything that can be done about this. It's almost certainly a network and/or S3 problem (or conceivably some inefficiency in fsspec, but I'm not about to debug that).

yarikoptic commented 3 months ago

I would expect some layer to time out more promptly and so that we have a record that some attempts are repeated. If it just stalls -- it might be something more severe/worth fixing. Let's see how often it would come about (there was a fresh cases in dandi-schema)