NeurodataWithoutBorders / nwb_benchmarks

Benchmarking for NWB-related operations.
https://nwb-benchmarks.readthedocs.io/en/latest/

Add test cases for LINDI #47

Open oruebel opened 6 months ago

oruebel commented 6 months ago

Similar to #43, we should also add LINDI to the test suite.

rly commented 5 months ago

We currently have a benchmark that creates a LINDI file from a remote NWB file. In my experience, for some large files with many chunks (such as the ophys file, which has ~1 million chunks), that is slower than downloading the entire NWB file and creating the LINDI file from the local copy, using references to the remote asset. This is probably due to the many requests made to the remote NWB file, possibly one per chunk(?). So I think we should add a benchmark that downloads the file locally and creates the LINDI file from the downloaded copy.

https://github.com/NeurodataWithoutBorders/nwb_benchmarks/blob/66b8d92c15c104809474ae9bcdd05db8d235f0d0/src/nwb_benchmarks/benchmarks/network_tracking_remote_file_reading.py#L218-L223

rly commented 5 months ago

Separately, I think it would be good to have a benchmark that simply downloads the entire file, as a point of comparison for some of these streaming methods. For some of the C. elegans files with thousands of groups, it is actually faster to download and read than to stream with fsspec/ros3, but that also depends on the read pattern.
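
A rough sketch of what the download-only baseline could measure, using only the Python standard library. The function name `download_entire_file` and its parameters are hypothetical, not part of the existing benchmark suite, and a real benchmark would presumably be wired into the project's own harness rather than timed by hand like this:

```python
import shutil
import time
import urllib.request
from pathlib import Path
from typing import Tuple


def download_entire_file(
    url: str, destination: Path, chunk_size: int = 1024 * 1024
) -> Tuple[int, float]:
    """Download `url` to `destination` and time it.

    Hypothetical helper for illustration only.
    Returns (size of the downloaded file in bytes, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    # Copy the response to disk in fixed-size chunks so the whole file
    # never has to be held in memory at once.
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out_file:
        shutil.copyfileobj(response, out_file, length=chunk_size)
    elapsed = time.perf_counter() - start
    return destination.stat().st_size, elapsed
```

The elapsed time from this step could then be added to the time for reading the local copy (or for creating the LINDI file from it) and compared against the streaming results for the same asset.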