biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

Orange Table-specific HDF5Reader #6791

Open stuart-cls opened 5 months ago

stuart-cls commented 5 months ago
Issue

Enable saving/loading the Orange Table data structure from the binary HDF5 container. Based on the implementation used in the dask branch, but with the dask parts removed.

Related: #6356

Description of changes
Includes
stuart-cls commented 5 months ago

I couldn't find a satisfactory solution to the .attributes problem (for the use case I care about) so I re-used the .metadata sidecar files for now, which is better than nothing.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 93.67089%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.21%. Comparing base (5ada6c4) to head (63c71b3). Report is 21 commits behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #6791   +/-   ##
=======================================
  Coverage   88.20%   88.21%
=======================================
  Files         327      327
  Lines       71223    71301   +78
=======================================
+ Hits        62825    62900   +75
- Misses       8398     8401    +3
```

markotoplak commented 4 months ago

Comments from @stuart-cls (from his email, just so that they do not get lost):

stuart-cls commented 4 days ago

Regarding the previous comments:

Table.attributes -> I could try again to properly store this in the HDF5, instead of the .metadata sidecar. I got stuck trying to get the round-trip on a visible image to work :)

In general this format isn't doing anything clever with nested dictionaries (see domain_args for example). It would be a lot of work to map this to HDF5, and this is the same problem with Table.attributes.

Compatibility with dask branch: both opening files saved with that branch, and updating the branch to be compatible.

I've tested both ways, and it works fine. The new reader checks for "Orange" in the "creator" attribute, but falls back to checking that the "domain" group is there.
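
For readers following along, the detection rule described above could be sketched roughly as below. This is not the code from the PR. The function name `looks_like_orange_table` and the `FakeFile` stand-in are hypothetical, and the sketch assumes the fallback also applies when a "creator" attribute exists but does not mention Orange. It only checks that a member named "domain" exists, not that it is a group.

```python
from dataclasses import dataclass, field


def _as_text(value):
    """Decode bytes attribute values; pass everything else through str()."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


def looks_like_orange_table(h5):
    """Return True if `h5` appears to hold an Orange Table.

    `h5` is any object with a mapping-like `.attrs` and support for `in`,
    which is the interface an open h5py.File exposes.
    """
    creator = h5.attrs.get("creator")
    if creator is not None and "Orange" in _as_text(creator):
        return True
    # Fallback (assumed behaviour): accept files that lack a matching
    # "creator" attribute but still contain a "domain" member.
    return "domain" in h5


@dataclass
class FakeFile:
    """Hypothetical in-memory stand-in for an open HDF5 file."""
    attrs: dict = field(default_factory=dict)
    groups: set = field(default_factory=set)

    def __contains__(self, name):
        return name in self.groups
```

With h5py installed, the same function should accept an open file handle (for example, one obtained via `h5py.File(path, "r")`), since it only relies on `.attrs.get` and membership tests.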