Open CodyCBakerPhD opened 1 year ago
Some work may also be needed on the representation of NWB assets for the Zarr back-end: no 'i' info button appears on the asset, and the API fails to recognize the file as an asset; instead, every individual item blob is its own asset. (I had initially expected this given the underlying structure of the Zarr store, but on Slack Roni indicated that each Zarr chunk was not supposed to be a separate AssetBlob, which is what we are seeing below.)
from dandi.dandiapi import DandiAPIClient
client = DandiAPIClient(api_url="https://api-staging.dandiarchive.org/api")
dandiset = client.get_dandiset(dandiset_id="204919")
dandiset.get_asset_by_path(path="test_read_nwbfile/test_hdf5.nwb")
works as expected, but
dandiset.get_asset_by_path(path="test_read_nwbfile/test_zarr.nwb")
gives
ValueError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/dandi/dandiapi.py:1155, in RemoteDandiset.get_asset_by_path(self, path)
1152 try:
1153 # Weed out any assets that happen to have the given path as a
1154 # proper prefix:
-> 1155 (asset,) = (
1156 a for a in self.get_assets_with_path_prefix(path) if a.path == path
1157 )
1158 except ValueError:
ValueError: not enough values to unpack (expected 1, got 0)
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
Cell In[21], line 1
----> 1 dandiset.get_asset_by_path(path="test_read_nwbfile/test_zarr.nwb")
File /opt/conda/lib/python3.10/site-packages/dandi/dandiapi.py:1159, in RemoteDandiset.get_asset_by_path(self, path)
1155 (asset,) = (
1156 a for a in self.get_assets_with_path_prefix(path) if a.path == path
1157 )
1158 except ValueError:
-> 1159 raise NotFoundError(f"No asset at path {path!r}")
1160 else:
1161 return asset
NotFoundError: No asset at path 'test_read_nwbfile/test_zarr.nwb'
and if I do
list(dandiset.get_assets())
I see
[RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='fd8e3782-b0c7-4bd5-89fe-e2acc0263744', path='test_read_nwbfile/test_hdf5.nwb', size=197512, created=datetime.datetime(2023, 7, 17, 15, 31, 55, 641893, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 778333, tzinfo=datetime.timezone.utc), blob='6a61bab5-0662-49e5-be46-0b9ee9a27297', dandiset_id='204919', version_id='0.230717.1558'),
RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='a78dfc02-9cd5-402a-83c8-5006fb18d5e8', path='test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/data/0.0', size=46, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 173503, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 787050, tzinfo=datetime.timezone.utc), blob='1419744b-36f6-4c28-a850-71d381fc90e5', dandiset_id='204919', version_id='0.230717.1558'),
RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='cd9faf76-cb4e-4849-b9eb-c838958676d1', path='test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/electrodes/0', size=56, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 215932, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 795464, tzinfo=datetime.timezone.utc), blob='e8131c7e-095d-4242-ab4c-1658c8c3f5c5', dandiset_id='204919', version_id='0.230717.1558'),
RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='383ece04-8db0-4207-843a-86109259a5cd', path='test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/starting_time/0', size=24, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 222857, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 909428, tzinfo=datetime.timezone.utc), blob='a1f46f4a-d8ec-4183-bd8c-8ed530e963e4', dandiset_id='204919', version_id='0.230717.1558'),
RemoteBlobAsset(client=<dandi.dandiapi.DandiAPIClient object at 0x7fc5162c4400>, identifier='871186e8-ac63-4c5e-b914-8b9246f7326a', path='test_read_nwbfile/test_zarr.nwb/file_create_date/0', size=56, created=datetime.datetime(2023, 7, 17, 15, 57, 45, 253174, tzinfo=datetime.timezone.utc), modified=datetime.datetime(2023, 7, 17, 15, 58, 44, 806273, tzinfo=datetime.timezone.utc), blob='9d7115fb-3133-437d-9168-7058e8fd84b6', dandiset_id='204919', version_id='0.230717.1558'),
....
and so on (the entire NWB file content listed out as separate blobs)
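The failed lookup is consistent with this listing: the traceback shows get_asset_by_path keeping only assets whose path equals the requested one, and here only blobs underneath that path exist. A minimal sketch of that check on plain strings follows. The helpers `find_exact` and `store_roots` are hypothetical and not part of the dandi client; the paths are copied from the output above.

```python
from collections import Counter

# Asset paths copied from the listing above.
paths = [
    "test_read_nwbfile/test_hdf5.nwb",
    "test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/data/0.0",
    "test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/electrodes/0",
    "test_read_nwbfile/test_zarr.nwb/acquisition/ElectricalSeries/starting_time/0",
    "test_read_nwbfile/test_zarr.nwb/file_create_date/0",
]


def find_exact(paths, target):
    """Mimic the filter in the traceback: prefix match first, then equality."""
    return [p for p in paths if p.startswith(target) and p == target]


def store_roots(paths, suffix=".nwb"):
    """Count the assets that live inside a directory whose name ends in `suffix`."""
    counts = Counter()
    for p in paths:
        parts = p.split("/")
        # Skip the final component: we only care about parent directories.
        for i, part in enumerate(parts[:-1]):
            if part.endswith(suffix):
                counts["/".join(parts[: i + 1])] += 1
                break
    return counts


print(find_exact(paths, "test_read_nwbfile/test_hdf5.nwb"))  # one match
print(find_exact(paths, "test_read_nwbfile/test_zarr.nwb"))  # no match
print(store_roots(paths))
```

The second lookup returns an empty list, which corresponds to the "expected 1, got 0" unpacking error in the traceback.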
The context for the asset ID part is that I want to be able to stream the content using fsspec, just like with HDF5 files.
PyNWB can easily do this given the S3 asset of the HDF5, so I had thought that it would be just as easy if I had the asset ID of the Zarr folder (the 'test_zarr.nwb' file)
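For reference, the HDF5 streaming route mentioned here typically looks something like the sketch below. It assumes that fsspec, h5py, pynwb, and dandi are installed and that the asset is a single HDF5 blob; the function name `open_streamed_hdf5_nwb` is hypothetical. It would not work for the Zarr upload above, because no asset exists at that path.

```python
def open_streamed_hdf5_nwb(
    dandiset_id,
    path,
    api_url="https://api-staging.dandiarchive.org/api",
):
    """Open an HDF5-backed NWB asset over HTTP without downloading it first.

    Returns (nwbfile, io); the caller should call io.close() when done.
    """
    # Imported here so that defining the function does not require the
    # optional dependencies to be installed.
    import fsspec
    import h5py
    import pynwb
    from dandi.dandiapi import DandiAPIClient

    with DandiAPIClient(api_url=api_url) as client:
        dandiset = client.get_dandiset(dandiset_id=dandiset_id)
        asset = dandiset.get_asset_by_path(path=path)
        # Resolve the API download link to the underlying S3 URL.
        s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)

    # Read byte ranges on demand through fsspec's HTTP filesystem.
    remote_file = fsspec.filesystem("http").open(s3_url, "rb")
    h5_file = h5py.File(remote_file, "r")
    io = pynwb.NWBHDF5IO(file=h5_file, mode="r")
    return io.read(), io
```

A call such as `open_streamed_hdf5_nwb("204919", "test_read_nwbfile/test_hdf5.nwb")` needs network access to the staging server.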
@CodyCBakerPhD - I'm pretty positive that what's happening here is that the CLI does not recognize the file as Zarr and so takes the non-Zarr route, which the server then interprets as individual blobs. A fix on the CLI side that treats it as Zarr would fix it. Can you simply try adding the .zarr extension to test?
Well, that is interesting...
Making a copy of the file with the name test_zarr.nwb.zarr (also confirmed same behavior with test_zarr.zarr) allows dandi upload to proceed as expected.
However, nothing new appears on the dandiset view (https://gui-staging.dandiarchive.org/dandiset/204919/0.230717.1558/files?location=test_read_nwbfile%2F) or in the API requests.
I also confirmed the asset made it to the bucket by attempting a re-upload, which responds that the file already exists and so does not upload it again.
@CodyCBakerPhD - you have stumped me. perhaps @AlmightyYakob has an answer to why that asset doesn't show up.
The file is present; the link you provided points to a previously published version, and so won't show any files uploaded to the draft version. You can see the file here: https://gui-staging.dandiarchive.org/dandiset/204919/draft/files?location=test_read_nwbfile
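The two links in this thread differ only in the version segment of the URL. A small sketch that builds the file-browser URL for either version, assuming the URL pattern seen in these two links holds in general (the `files_url` helper is hypothetical):

```python
from urllib.parse import quote

GUI_BASE = "https://gui-staging.dandiarchive.org"


def files_url(dandiset_id, location, version="draft"):
    """Build a file-browser URL; `version` is 'draft' or a published version ID."""
    # Percent-encode the whole location, including any '/' separators.
    encoded = quote(location, safe="")
    return f"{GUI_BASE}/dandiset/{dandiset_id}/{version}/files?location={encoded}"


print(files_url("204919", "test_read_nwbfile"))
print(files_url("204919", "test_read_nwbfile/", version="0.230717.1558"))
```

With the Python client, the analogous step should be passing version_id="draft" to get_dandiset, since the asset listing earlier in the thread reports version_id='0.230717.1558', meaning the client had selected the published version.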
@AlmightyYakob Aha, yes that was it! Thank you for the sanity check
Would this workflow perhaps 'simply work' if I just naively add ".nwb" to the list of accepted Zarr entities? I'll try that out locally and see
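To make the idea concrete, here is a minimal sketch of extension-based routing and of what adding ".nwb" to the accepted suffixes would change. This is an illustration only, not the actual dandi-cli code: `ZARR_SUFFIXES`, `classify`, and the returned labels are all hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical set of directory suffixes treated as Zarr stores.
ZARR_SUFFIXES = frozenset({".zarr"})


def classify(path, zarr_suffixes=ZARR_SUFFIXES):
    """Route a local path to 'zarr', 'folder' (walked file by file), or 'blob'."""
    path = Path(path)
    if path.is_dir():
        return "zarr" if path.suffix in zarr_suffixes else "folder"
    return "blob"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "test_hdf5.nwb").write_bytes(b"")  # HDF5-backed NWB: a regular file
    (root / "test_zarr.nwb").mkdir()           # Zarr-backed NWB: a directory
    (root / "test_zarr.nwb.zarr").mkdir()      # the renamed copy

    results = {
        "hdf5_file": classify(root / "test_hdf5.nwb"),
        "zarr_dir_nwb_suffix": classify(root / "test_zarr.nwb"),
        "zarr_dir_zarr_suffix": classify(root / "test_zarr.nwb.zarr"),
        # Naively extend the accepted suffixes with ".nwb".
        "zarr_dir_nwb_suffix_extended": classify(
            root / "test_zarr.nwb", ZARR_SUFFIXES | {".nwb"}
        ),
        "hdf5_file_extended": classify(
            root / "test_hdf5.nwb", ZARR_SUFFIXES | {".nwb"}
        ),
    }

print(results)
```

Under this model, the ".nwb" directory is walked like an ordinary folder until the suffix is added, while HDF5 files with the same suffix stay on the blob route because they are not directories. Whether the real CLI separates files from directories in the same way is an assumption that would need checking.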
Possibly related to #1307, but specific to NWB format files using the Zarr backend.
I'd like to be able to upload a .nwb file written using PyNWB+HDMF-Zarr to the DANDI archive, but the dandi upload command was unable to recognize the file at all, and didn't even warn that it had been found and skipped for some reason.
An example file for testing purposes may be found here, which was forced through using devel options, specifically --allow-any-path