emmazhou / napari-dvid

DVID loader for napari
MIT License

Reader plugin loading multiscale data #1

Open jni opened 3 years ago

jni commented 3 years ago

The widget is cool but even cooler would be:

napari https://emdata.janelia.org/api/repo/ee789

and the plugin would determine that it is a DVID API endpoint, and load the complete volume and segmentation data as multiscale.

This means:

https://github.com/emmazhou/napari-dvid/blob/f943b9f44d8912b5e2cb14cb50b91050d64c8a13/napari_dvid/_reader.py#L21-L22

https://github.com/emmazhou/napari-dvid/blob/f943b9f44d8912b5e2cb14cb50b91050d64c8a13/napari_dvid/_reader.py#L52-L57

You can see a model for making multiscale dask arrays in the ome-zarr repo, e.g. here:

https://github.com/ome/ome-zarr-py/blob/715f44d2519b6d71f1b79aabfef3fc8287bc87e2/ome_zarr/reader.py#L276-L287

Though there is a bit of extra complexity there... Essentially, instead of returning a single array dataset, you return a list of arrays of decreasing size.
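To make the "list of arrays of decreasing size" idea concrete, here is a minimal sketch of a multiscale pyramid builder. The `load_multiscale` function and its in-memory zero-filled levels are illustrative stand-ins; a real reader would wrap a lazy per-level fetch (e.g. a dask array backed by DVID requests) instead of allocating numpy arrays up front.

```python
import numpy as np

def load_multiscale(shape=(512, 512, 512), n_levels=4):
    """Build a multiscale pyramid as a list of arrays of decreasing size.

    Each level halves the previous one along every axis. The zero-filled
    numpy arrays here are placeholders for lazily fetched volume data.
    """
    levels = []
    for level in range(n_levels):
        scaled = tuple(max(1, s // (2 ** level)) for s in shape)
        levels.append(np.zeros(scaled, dtype=np.uint8))
    return levels

pyramid = load_multiscale()
# napari accepts a multiscale image as a layer-data tuple whose data
# element is the list of arrays, largest first.
layer_data = (pyramid, {"multiscale": True}, "image")
```

The key point is only the return shape: one list, ordered from full resolution down, handed to napari with `multiscale=True`.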

DocSavage commented 3 years ago

Yes, a URL like https://emdata.janelia.org/api/repo/{some uuid}/info will return JSON metadata for that repo if the UUID is valid (i.e., the given string is a prefix matching an existing UUID). If the UUID prefix is not valid, it returns status code 400 and the error message `could not find UUID with partial match to...`.

A working example is the AL-7 Grayscale repository: https://emdata.janelia.org/api/repo/ab6e61/info
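A plugin could resolve that endpoint along these lines. This is a sketch, not DVID client code: the URL builder and the 200-vs-400 handling follow the behavior described above, and the response here is a hand-made stub rather than a live request.

```python
import json

DVID_SERVER = "https://emdata.janelia.org"  # example server from this thread

def repo_info_url(uuid_prefix):
    """Build the /api/repo/{uuid}/info URL for a (possibly partial) UUID."""
    return f"{DVID_SERVER}/api/repo/{uuid_prefix}/info"

def parse_repo_info(status_code, body):
    """Interpret a DVID repo-info response.

    A matching UUID prefix yields 200 with JSON metadata; an unmatched
    prefix yields 400 with a plain-text error message.
    """
    if status_code == 400:
        raise ValueError(f"could not resolve UUID prefix: {body}")
    return json.loads(body)

# Stub response shaped like the real metadata, in place of an HTTP call.
stub = json.dumps({"DataInstances": {"grayscale": {"Extended": {}}}})
info = parse_repo_info(200, stub)
```

In the actual plugin the body would come from an HTTP GET of `repo_info_url(...)`, with the 400 branch turned into a "not a DVID URL" signal so napari can fall back to other readers.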

In the JSON metadata for that repo, there is a "DataInstances" object holding the metadata for each data instance: "grayscale" for the highest-resolution image volume, "grayscale_1" for the first downscaled level, "grayscale_2", ... "grayscale_6" for the lowest resolution. The high-res "grayscale" is 15168 x 14144 x 8896 voxels while the low-res "grayscale_6" is 256 x 256 x 192. You can get those dimensions from the "Extended" object under each data instance, which contains "MinPoint" and "MaxPoint" 3d arrays. So DataInstances > grayscale > Extended > MaxPoint is the path to the 3d array specifying the maximum voxel coordinate, where (0, 0, 0) is the origin.
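Following that path, the extent of each scale level can be read straight out of the metadata. A small sketch, using a stub dict shaped like the repo info and the extents quoted above (the helper name is made up):

```python
def instance_shape(repo_info, name):
    """Voxel dimensions of a data instance, from its Extended metadata."""
    ext = repo_info["DataInstances"][name]["Extended"]
    lo, hi = ext["MinPoint"], ext["MaxPoint"]
    # MaxPoint is the maximum voxel coordinate with (0, 0, 0) as origin,
    # so the extent along each axis is max - min + 1.
    return tuple(h - l + 1 for l, h in zip(lo, hi))

# Stub shaped like the real /info response, using the numbers above.
repo_info = {
    "DataInstances": {
        "grayscale": {"Extended": {"MinPoint": [0, 0, 0],
                                   "MaxPoint": [15167, 14143, 8895]}},
        "grayscale_6": {"Extended": {"MinPoint": [0, 0, 0],
                                     "MaxPoint": [255, 255, 191]}},
    }
}
```

Iterating `grayscale`, `grayscale_1`, ... in order and computing each shape this way gives the decreasing-size list the multiscale reader needs.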

Segmentation is frequently in another repository, although we probably should have exported it into the same repo for public use. Internally, we tend to put the massive grayscale data out on the cloud while the segmentation gets served from fast NVMe drives within Janelia. For the purposes of seeing how napari scales, I'd just stick with grayscale for now to see how streaming and memory hold up.

DocSavage commented 3 years ago

The other reason to start with grayscale before doing segmentation is the amount of memory each will require. @jni Is there any segmentation compression scheme within napari? In our other systems, since the segmentation is stored as small 3d chunks/blocks, it can achieve roughly 50x compression compared to uncompressed uint64 per voxel, which would otherwise be 8x the memory footprint of the grayscale.
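The back-of-envelope arithmetic behind that warning can be made explicit. A sketch, assuming uint8 grayscale (1 byte/voxel), uint64 labels (8 bytes/voxel), and the high-res extent quoted earlier; the ~50x figure is the block-compression ratio mentioned above, not something napari provides:

```python
def memory_gib(shape, bytes_per_voxel, compression=1.0):
    """Estimate in-memory size in GiB for a volume of the given extent."""
    voxels = 1
    for s in shape:
        voxels *= s
    return voxels * bytes_per_voxel / compression / 2**30

shape = (15168, 14144, 8896)               # high-res grayscale extent
gray = memory_gib(shape, 1)                # uint8 grayscale, raw
seg_uncompressed = memory_gib(shape, 8)    # uint64 labels: 8x the grayscale
seg_compressed = memory_gib(shape, 8, 50)  # with ~50x block compression
```

Raw uint64 segmentation at this extent runs to multiple tebibytes, which is why starting with grayscale is the safer first test.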