bioio-devs / bioio-base

Typing, base classes, and more for BioIO projects.
https://bioio-devs.github.io/bioio-base/
BSD 3-Clause "New" or "Revised" License

fast path to dims or shape #21

Open toloudis opened 1 week ago

toloudis commented 1 week ago

Feature Description

Getting dims or shape is too slow in many cases with large files.

Use Case

This has come up a lot in the past. Quite often, just asking a large file for some basic info can take a very long time, even though we know the relevant info is stashed in a basic text format that's easy to grab. I think the reason for the slow path is that we have been bitten before: the metadata cannot be trusted if a large image acquisition was truncated or aborted for whatever reason, because the array data we load then does not match the dims in the metadata. Maybe we could flag or warn about that mismatch at data loading time instead. Currently, the way we get dims is essentially to create the entire image as a delayed dask array and then read its shape.
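To make the "warn at data loading time" idea concrete, here is a minimal sketch of a check that compares the shape reported by metadata against the shape of the data actually loaded. The function name check_shape and its signature are hypothetical and not part of bioio-base; it assumes both shapes are available as plain tuples.

```python
import warnings
from typing import Tuple


def check_shape(metadata_shape: Tuple[int, ...], loaded_shape: Tuple[int, ...]) -> bool:
    """Hypothetical helper: return True if the two shapes agree, else warn and return False."""
    if tuple(metadata_shape) == tuple(loaded_shape):
        return True
    # A mismatch may indicate a truncated or aborted acquisition.
    warnings.warn(
        f"Metadata reports shape {tuple(metadata_shape)} but loaded data has "
        f"shape {tuple(loaded_shape)}; the acquisition may have been truncated.",
        UserWarning,
        stacklevel=2,
    )
    return False
```

A reader could call something like this once, at the point where the array is first materialized, so the fast metadata path stays cheap and the mismatch is still surfaced to the user.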

Solution

Create a fast way to get dims by parsing metadata when it is available. I am not proposing an API here, but it could be dims_quick or just an argument to a dims() function.

This then turns into a per-reader override, I suppose.
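One possible shape for the per-reader override, as a rough sketch only. The classes BaseReader and XmlMetadataReader, the hook names, and the constructor arguments are all hypothetical and do not reflect the actual bioio-base API. The example assumes metadata that carries a simplified, un-namespaced OME-style Pixels element with SizeT/SizeC/SizeZ/SizeY/SizeX attributes; real OME-XML is namespaced and would need namespace handling.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


class BaseReader:
    """Hypothetical stand-in for a reader base class (not the bioio-base API)."""

    def _dims_from_data(self) -> Dict[str, int]:
        # Slow path: a real reader would build the delayed array and read its shape.
        raise NotImplementedError

    def _dims_from_metadata(self) -> Optional[Dict[str, int]]:
        # Fast-path hook for readers to override; None means "no fast path available".
        return None

    def dims_quick(self) -> Dict[str, int]:
        quick = self._dims_from_metadata()
        return quick if quick is not None else self._dims_from_data()


class XmlMetadataReader(BaseReader):
    """Hypothetical reader whose metadata is a simplified OME-like XML string."""

    def __init__(self, xml_text: str, fallback_dims: Dict[str, int]) -> None:
        self.xml_text = xml_text
        self.fallback_dims = fallback_dims  # stands in for the expensive array-based path
        self.slow_path_calls = 0

    def _dims_from_data(self) -> Dict[str, int]:
        self.slow_path_calls += 1
        return dict(self.fallback_dims)

    def _dims_from_metadata(self) -> Optional[Dict[str, int]]:
        try:
            root = ET.fromstring(self.xml_text)
        except ET.ParseError:
            return None
        pixels = root if root.tag == "Pixels" else root.find(".//Pixels")
        if pixels is None:
            return None
        try:
            return {axis: int(pixels.attrib[f"Size{axis}"]) for axis in "TCZYX"}
        except (KeyError, ValueError):
            # Missing or non-integer attributes: fall back to the slow path.
            return None
```

With this layout, readers that have no cheap metadata simply inherit the default and keep the current behavior, while readers that do can opt in by overriding a single method.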

Alternatives

toloudis commented 1 week ago

Side note/question: is dims equivalent to shape combined with physical_pixel_size? dims and shape are slightly different, but both could probably have a fast path.