JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License
383 stars 139 forks source link

Clean up Dataspaces #1104

Closed simonbyrne closed 8 months ago

simonbyrne commented 11 months ago
mkitti commented 11 months ago

There is a lot of reorganization here.

Should this go into 0.17 or could it wait for 0.18?

My sense is that 0.18 is months off rather than a year or two.

simonbyrne commented 11 months ago

If 0.17 is otherwise ready to go, then this can wait.

One question is what should HDF5.UNLIMITED actually be? For compatibility, I made it simply -1, but we could make it its own type?

mkitti commented 11 months ago

I'm not sure if this actually breaking or not, so perhaps it could be 0.17.1?

simonbyrne commented 11 months ago

The deprecations are breaking, so shouldn't go in a patch release.

This is part of my effort to clean up some interfaces, so happy to wait for 0.18.

mkitti commented 11 months ago

Regarding the documentation, I would like to convert as many of the examples to jldoctest as possible so that these are actually executed and tested with changes that we make.

mkitti commented 11 months ago

I would like to have parity with HDF5.Datatype. The issue with this is the similar with Core.DataType. Do you have anythoughts on how to resolve that?

simonbyrne commented 11 months ago

I would like to have parity with HDF5.Datatype. The issue with this is the similar with Core.DataType. Do you have anythoughts on how to resolve that?

The capitalization is different :smile:?

I guess we don't have to export it: by making max_dims a keyword argument to create_dataset, I don't think users will really need to manually use Dataspace objects all that much?

simonbyrne commented 11 months ago

Can you expand on what you mean by "I would like to have parity with HDF5.Datatype"?

mkitti commented 11 months ago

This package currently also exports datatype for obtaining instances of HDF5.Datatype which seems analogous to the relationship between dataspace and HDF5.Dataspace.

If we are going to increase use of HDF5.Dataspace constructors, should we also do the same HDF5.Datatype constructors?

simonbyrne commented 11 months ago

If we are going to increase use of HDF5.Dataspace constructors, should we also do the same HDF5.Datatype constructors?

Oh, I understand.

Yes, I think that would make sense: basically we would deprecate datatype(::Type) methods, and make them Datatype(::Type). We would keep datatype(obj) methods.

mkitti commented 11 months ago

I think Datatype might be confusing though since it could easily be confused with Julia's DataType.

Perhaps the better strategy would be to focus on not requiring a HDF5.Datatype but rather just a Julia type to most user facing interfaces. Meanwhile we make the HDF5.Datatype constructor do most of what datatype currently does.

How about this:

  1. Mainly use Julia data types instead of HDF5 data types in the primary interface.
  2. Make the HDF5.Datatype constructor do what datatype does now.
  3. Do not export HDF5.Datatype due to confusion. Make it only necessary to use in unusual circumstances.
simonbyrne commented 11 months ago

I'm happy with 1 & 3.

For 2., I think it would be consistent to keep datatype(obj) for "the Datatype of elements of obj", in the same way that this PR makes dataspace(obj) be the "Dataspace of the dimensions of obj" vs Dataspace(dims) being the "Dataspace with dimensions dims".

simonbyrne commented 11 months ago

something really odd is going on with the CI / julia 1 - windows-latest - x64.

We're getting an error in h5a_iterate: https://github.com/JuliaIO/HDF5.jl/actions/runs/6124333998/job/16624132842#step:5:200 From what I can tell, it looks like h5a_iterate isn't throwing an error correctly. I can make it go away by removing the lock or the try/catch for the lock, or be removing the check of the error stack (which may be a good idea in any case?)

mkitti commented 11 months ago

I saw the same error on Ubuntu. Do we have a race condition?

mkitti commented 11 months ago

See https://github.com/JuliaIO/HDF5.jl/actions/runs/6125740497/job/16628314924#step:5:1

simonbyrne commented 11 months ago

Possibly, but I can't figure out exactly how.

simonbyrne commented 11 months ago

it could be something in HDF5 itself that isn't setting the error stack correctly?

simonbyrne commented 11 months ago

@mkitti any more thoughts on this? I'd like to make some more changes which depend on this.

mkitti commented 11 months ago

I just got back home. Could you give me until Monday noon PT to look it over?

simonbyrne commented 11 months ago

No problem!

simonbyrne commented 11 months ago
1. What is the relationship between the UNLIMITED constant in HDF5 and API modules?

API.H5S_UNLIMITED = typemax(UInt64), whereas UNLIMITED = -1

2. Should we support a symbol for unlimited?

We could do, but the question is what should size return?

3. Should we preserve some of the legacy tests?

I believe most of the tests should still be there, I just rearranged them a bit.

mkitti commented 8 months ago

I'm thinking about just merging this and moving on to 0.18 development. It's been a while.