NeurodataWithoutBorders / lindi

Linked Data Interface (LINDI) - cloud-friendly access to NWB data
BSD 3-Clause "New" or "Revised" License

support dicts in reference file system for json files #40

Closed magland closed 7 months ago

magland commented 8 months ago

With this, the LindiReferenceFileSystemStore supports dicts as proposed in https://github.com/fsspec/filesystem_spec/pull/1562

Also, the to_reference_file_system() methods automatically convert all of the .zattrs, .zgroup, and .zarray files into dicts.
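
For readers unfamiliar with the format, here is a minimal sketch of what that conversion might look like. The helper `metadata_strings_to_dicts` and the example keys and URL are hypothetical and are not taken from the LINDI code base; the sketch only assumes that metadata entries arrive as JSON strings and that chunk references are `[url, offset, length]` lists.

```python
import json

# Zarr metadata file names that this sketch treats as inline JSON.
ZARR_META_NAMES = {".zattrs", ".zgroup", ".zarray"}


def metadata_strings_to_dicts(refs):
    """Return a copy of `refs` with Zarr metadata entries parsed into dicts.

    Hypothetical helper for illustration; not part of LINDI or fsspec.
    """
    converted = {}
    for key, value in refs.items():
        # Match on the final path component, e.g. "data/.zattrs" -> ".zattrs".
        name = key.rsplit("/", 1)[-1]
        if name in ZARR_META_NAMES and isinstance(value, str):
            converted[key] = json.loads(value)
        else:
            # Chunk references and other entries pass through unchanged.
            converted[key] = value
    return converted


# Made-up example content; the URL and byte range are placeholders.
example_refs = {
    ".zgroup": '{"zarr_format": 2}',
    "data/.zattrs": '{"units": "volts"}',
    "data/0": ["https://example.org/file.nwb", 1024, 4096],
}
converted_refs = metadata_strings_to_dicts(example_refs)
```

With dict values, the resulting JSON file holds the metadata as nested objects rather than escaped strings, which makes it easier to read and to query.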

The Neurosift UI already supports this update.

@rly @bendichter @oruebel

codecov-commenter commented 8 months ago

Codecov Report

Attention: Patch coverage is 85.00%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.55%. Comparing base (96a896b) to head (b7f2c4e).

| Files | Patch % | Lines |
|---|---|---|
| ...ndi/LindiH5pyFile/LindiReferenceFileSystemStore.py | 88.23% | 2 Missing :warning: |
| lindi/LindiH5pyFile/LindiH5pyFile.py | 0.00% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   82.44%   82.55%   +0.10%
==========================================
  Files          25       25
  Lines        1692     1708      +16
==========================================
+ Hits         1395     1410      +15
- Misses        297      298       +1
```


bendichter commented 8 months ago

Awesome!

rly commented 7 months ago

This looks good to me. It seems like the JSONs are converted to string for the store representation and then converted back to JSON at the end of to_reference_file_system. Just curious - what do you think about making the internal store representation a dict/JSON and doing the translation to a string only if requested? That's probably a significant refactoring and maybe we would want to wait for fsspec to update before we make such a change. These JSONs are all small so I think it would not be a big deal either way.

magland commented 7 months ago

> This looks good to me. It seems like the JSONs are converted to string for the store representation and then converted back to JSON at the end of to_reference_file_system. Just curious - what do you think about making the internal store representation a dict/JSON and doing the translation to a string only if requested? That's probably a significant refactoring and maybe we would want to wait for fsspec to update before we make such a change. These JSONs are all small so I think it would not be a big deal either way.

I wanted to preserve the idea that all the files in the reference file system can actually be thought of as files with real byte array content. If we find there is a performance advantage to skipping the round trip, we can revisit this.
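
To illustrate the "every entry is a file with bytes" idea, here is a rough sketch of how a read path could serve dict-valued entries as bytes. The function `read_inline_bytes` is hypothetical and is not LINDI's implementation; it assumes inline entries are either dicts, plain strings, or strings with the `base64:` prefix used by the reference file system format.

```python
import base64
import json


def read_inline_bytes(refs, key):
    """Return the byte content of an inline entry in a reference file system.

    Hypothetical sketch, not LINDI's actual store code.
    """
    value = refs[key]
    if isinstance(value, dict):
        # A dict entry is treated as a JSON file and serialized on read.
        return json.dumps(value, separators=(",", ":")).encode("utf-8")
    if isinstance(value, str):
        if value.startswith("base64:"):
            return base64.b64decode(value[len("base64:"):])
        return value.encode("utf-8")
    # [url, offset, length] references need a remote read, omitted here.
    raise NotImplementedError("remote chunk references are not handled in this sketch")


# Made-up entries for illustration.
inline_refs = {
    ".zgroup": {"zarr_format": 2},
    "notes.txt": "hello",
    "blob": "base64:AAEC",
}
zgroup_bytes = read_inline_bytes(inline_refs, ".zgroup")
```

Under this scheme the caller always receives bytes, so the dict form stays an on-disk convenience rather than a change to the store's contract.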