NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License

LindiReferenceFileSystemStore with DANDI support #25

Closed magland closed 8 months ago

magland commented 8 months ago

Here's the docstring of the new LindiReferenceFileSystemStore class, which explains the purpose of this PR:

LindiReferenceFileSystemStore is a Zarr store that reads data from a reference file system with special handling of some LINDI/DANDI-specific needs.

The reference file system is based on the ReferenceFileSystem of fsspec, but
this is a custom implementation that serves some LINDI-specific needs. In
particular, it handles reading data from DANDI URLs, even when the file is
part of an embargoed dataset. This requires special handling because the
DANDI API URL must be exchanged for a presigned S3 bucket URL by
authenticating with a DANDI API token. The presigned URL expires after a
period of time, so this Zarr store handles its renewal. The store also
performs the exchange on first access and caches the redirected URL for a
period, so that the redirect doesn't need to be repeated every time a
segment of a file is read.
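
The caching and renewal behavior described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the `PresignedUrlCache` class, the injected `resolve` callable (standing in for the authenticated DANDI redirect), and the 120-second default lifetime are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class PresignedUrlCache:
    """Caches resolved URLs and re-resolves them once they are considered stale."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        max_age_sec: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        # Hypothetical callable that exchanges an API URL for a presigned URL.
        self._resolve = resolve
        # Assumed lifetime; the real expiry is decided by the server.
        self._max_age_sec = max_age_sec
        self._clock = clock
        # Maps API URL -> (presigned URL, time at which it was resolved).
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, api_url: str) -> str:
        now = self._clock()
        entry = self._entries.get(api_url)
        if entry is not None and now - entry[1] < self._max_age_sec:
            # Still fresh: reuse the cached redirect instead of resolving again.
            return entry[0]
        # First access, or the cached URL is too old: do the exchange again.
        presigned = self._resolve(api_url)
        self._entries[api_url] = (presigned, now)
        return presigned
```

Injecting the clock keeps the expiry logic testable without sleeping; every byte-range read for the same file would go through `get()` and only trigger the exchange when the cached entry is missing or stale.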

To read from a file in an embargoed DANDI dataset, you will need to set the
DANDI_API_KEY environment variable to your DANDI API token. If the file
belongs to a Dandiset on the staging site, set the DANDI_STAGING_API_KEY
environment variable instead.
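
As a rough sketch of how the token might be chosen from these two environment variables: the helper name `dandi_token_for_url` and the two host prefixes below are assumptions made for illustration, not taken from this PR.

```python
import os
from typing import Mapping, Optional

# Assumed API hosts for the main and staging DANDI sites.
DANDI_API_PREFIX = "https://api.dandiarchive.org/"
DANDI_STAGING_API_PREFIX = "https://api-staging.dandiarchive.org/"


def dandi_token_for_url(
    url: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the API token to use for `url`, or None if none applies or is set."""
    if url.startswith(DANDI_STAGING_API_PREFIX):
        return env.get("DANDI_STAGING_API_KEY")
    if url.startswith(DANDI_API_PREFIX):
        return env.get("DANDI_API_KEY")
    # Not a DANDI API URL: no token is needed.
    return None
```

A return value of `None` for a DANDI URL would mean the variable is unset, in which case only files from public Dandisets could be read.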

Following the fsspec convention, the reference file system is specified as a
dictionary with a "refs" key. The value of "refs" is a dictionary where the
keys are the names of the files and the values are either strings or lists.
If the value is a string, it is assumed to be the data of the file, which
may be base64 encoded (see below). If the value is a list, it is assumed to
have three elements: the URL of the file (or path of a local file), the byte
offset of the data within the file, and the byte length of the data.
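
For illustration, a reference file system of this shape and a minimal lookup might look like the sketch below. The file names, the path, and the `read_ref` helper are hypothetical, and the helper only handles local files (no URLs, no "base64:" prefix).

```python
from typing import Any, Dict

# Example layout: one inline entry (string) and one byte-range entry (list).
example_rfs: Dict[str, Any] = {
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        # [path or URL, byte offset, byte length]; the path is hypothetical.
        "data/0": ["/tmp/example.bin", 4, 3],
    }
}


def read_ref(rfs: Dict[str, Any], key: str) -> bytes:
    """Return the bytes for `key` from a reference file system dictionary."""
    value = rfs["refs"][key]
    if isinstance(value, str):
        # Inline data; handling of the "base64:" prefix is left out here.
        return value.encode("utf-8")
    path, offset, length = value
    # Byte-range reference: read `length` bytes starting at `offset`.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

For a remote file, the same offset and length would typically translate into an HTTP range request rather than a seek and read.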

If the value for a file is a string, it may be prefixed with "base64:". If
it is, the remainder of the string is assumed to be base64 encoded and is
decoded before being returned. Otherwise, the string is encoded as UTF-8 and
returned unchanged. Note that a file whose content actually begins with
"base64:" should be represented by a base64 encoded string, to avoid
ambiguity.
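
The string convention above can be sketched as a pair of helpers. The names `decode_inline` and `encode_inline` are hypothetical and are not taken from this PR; the encoder shows one way to honor the rule about content that itself starts with "base64:".

```python
import base64

BASE64_PREFIX = "base64:"


def decode_inline(value: str) -> bytes:
    """Turn an inline "refs" string into the bytes of the file."""
    if value.startswith(BASE64_PREFIX):
        return base64.b64decode(value[len(BASE64_PREFIX):])
    return value.encode("utf-8")


def encode_inline(data: bytes) -> str:
    """Turn file bytes into an inline string that decode_inline can reverse."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    # Use base64 for non-UTF-8 data, and for text that would otherwise be
    # mistaken for a base64 entry because it starts with the prefix itself.
    if text is None or text.startswith(BASE64_PREFIX):
        return BASE64_PREFIX + base64.b64encode(data).decode("ascii")
    return text
```

With this pairing, `decode_inline(encode_inline(data))` gives back `data` for any byte string, including one that literally begins with "base64:".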

rly commented 8 months ago

Note that this is set to merge into dev1 instead of main.

rly commented 8 months ago

This presigned URL expires after a period of time, so this Zarr store handles the renewal of the presigned URL. It also does the exchange once the first time and caches the redirected URL for a period so that the redirect doesn't need to be done every time a segment of a file is read.

Great idea!