JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License

Views for Dataset and Attribute; VirtualSource #946

Status: open. Opened by mkitti 2 years ago.

mkitti commented 2 years ago

I previously proposed DatasetView and AttributeView in #937, but removed them in https://github.com/JuliaIO/HDF5.jl/pull/937/commits/bc8fe33d14ceeab6c98b0d92b78552262daae018.

The design was as follows:

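using HDF5: Dataset, Attribute

# A lazy view: stores the parent dataset and the requested indices; nothing is read yet
struct DatasetView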
    parent::Dataset
    indices
end

function Base.view(obj::Dataset, I...)
    return DatasetView(obj, I)
end

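# Allocate an output buffer sized by the indices
# (assumes every index supports `length`, e.g. ranges or integers; `:` does not)
Base.similar(view::DatasetView) = similar(view.parent, length.(view.indices)...)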

struct AttributeView
    parent::Attribute
    indices
end

function Base.view(obj::Attribute, I...)
    return AttributeView(obj, I)
end

Base.similar(view::AttributeView) = similar(view.parent, length.(view.indices)...)

const DatasetOrAttributeView = Union{DatasetView, AttributeView}

# Materializing the view is what triggers the read of the selected indices into the buffer
function Base.copyto!(output_buffer::AbstractArray{T}, view::DatasetOrAttributeView) where T
    return Base.read!(view.parent, output_buffer, view.indices...)
end
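The design above can be tried without an HDF5 file. The following is a minimal, hedged mock rather than HDF5.jl code: `MockDataset`, `MockDatasetView`, `_resolve` and `resolved` are hypothetical names standing in for `Dataset` and `DatasetView`, and the ranged `read!` is emulated with an in-memory `view`. It also shows one way to size the output buffer when an index is `:`, which `length` cannot handle.

```julia
# Hypothetical in-memory stand-in for HDF5.Dataset; it counts ranged reads.
mutable struct MockDataset{T,N}
    data::Array{T,N}
    nreads::Int
end
MockDataset(data::Array) = MockDataset(data, 0)
Base.size(d::MockDataset) = size(d.data)

# Emulates a ranged read!: copies only the selected elements into buf.
function Base.read!(d::MockDataset, buf::AbstractArray, I...)
    d.nreads += 1
    copyto!(buf, view(d.data, I...))
    return buf
end

# Lazy view: records the parent and the indices; nothing is read here.
struct MockDatasetView{P,I<:Tuple}
    parent::P
    indices::I
end
Base.view(d::MockDataset, I...) = MockDatasetView(d, I)

# Replace `:` by the full axis so that every index has a length.
# (An integer index keeps a singleton dimension in this sketch.)
_resolve(::Colon, n::Integer) = Base.OneTo(n)
_resolve(i, n::Integer) = i
resolved(v::MockDatasetView) = map(_resolve, v.indices, size(v.parent))

Base.similar(v::MockDatasetView) =
    Array{eltype(v.parent.data)}(undef, length.(resolved(v))...)

# The read happens only when the view is materialized into a buffer.
Base.copyto!(buf::AbstractArray, v::MockDatasetView) =
    read!(v.parent, buf, v.indices...)
```

With `d = MockDataset(reshape(collect(1.0:20.0), 10, 2))`, creating `v = view(d, 1:2:9, :)` reads nothing; `similar(v)` allocates a 5×2 buffer and `copyto!(buf, v)` performs exactly one read.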

@simonbyrne suggested that we store the hyperslab in DatasetView.
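Storing the hyperslab rather than the raw indices means converting Julia's 1-based indices into the zero-based `start`, `stride`, `count` and `block` arrays that the C function `H5Sselect_hyperslab` takes. A minimal sketch, assuming integers, positive-step integer ranges and `:` are the only index types; `Hyperslab`, `_slab` and `hyperslab_params` are hypothetical names, not HDF5.jl API. Because HDF5.jl presents dimensions in the reverse of the C library's order, the per-dimension entries are reversed.

```julia
# Hypothetical container for H5Sselect_hyperslab arguments
# (hsize_t is a 64-bit unsigned integer).
struct Hyperslab
    start::Vector{UInt64}
    stride::Vector{UInt64}
    count::Vector{UInt64}
    block::Vector{UInt64}
end

# One (start, stride, count, block) tuple per Julia dimension; start is 0-based.
_slab(i::Integer, n::Integer) = (i - 1, 1, 1, 1)
_slab(::Colon, n::Integer) = (0, 1, n, 1)
function _slab(r::AbstractRange{<:Integer}, n::Integer)
    step(r) > 0 || throw(ArgumentError("hyperslab strides must be positive"))
    return (first(r) - 1, step(r), length(r), 1)
end

function hyperslab_params(dims::Dims{N}, I::Vararg{Any,N}) where {N}
    # Reject selections that fall outside the dataset (1-based bounds check).
    checkbounds(Bool, CartesianIndices(dims), I...) || throw(BoundsError(dims, I))
    slabs = reverse(map(_slab, I, dims))   # reverse: Julia order -> C order
    column(k) = UInt64[s[k] for s in slabs]
    return Hyperslab(column(1), column(2), column(3), column(4))
end
```

For a 10×4 dataset, `hyperslab_params((10, 4), 1:2:9, :)` yields `start = [0, 0]`, `stride = [1, 2]`, `count = [4, 5]` and `block = [1, 1]`, which a view could keep instead of recomputing the selection on every read.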

If we move forward with #930, do we still need DatasetView, or might SubArray be sufficient? Alternatively, do we still need a specialized DatasetView that combines a Dataset and a Dataspace?
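`SubArray` already records the two things `DatasetView` stores, the parent and the indices, and exposes them through `parent` and `parentindices`; this only applies when the parent is an `AbstractArray`. A minimal sketch, using a hypothetical in-memory `DiskLike` array in place of a dataset and a hypothetical `bulkread` function, showing that a specialized method can recover the selection from a `SubArray` and fetch it in one call instead of element by element:

```julia
# Hypothetical AbstractArray standing in for an on-disk dataset; it counts
# element-by-element reads so a bulk path can be distinguished from them.
mutable struct DiskLike{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    scalar_reads::Int
end
DiskLike(a::Array) = DiskLike(a, 0)
Base.size(d::DiskLike) = size(d.data)
function Base.getindex(d::DiskLike{T,N}, I::Vararg{Int,N}) where {T,N}
    d.scalar_reads += 1
    return d.data[I...]
end

# Hypothetical bulk read for views of DiskLike: parent() and parentindices()
# give back the source and the (already normalized) selection, which a real
# implementation could turn into a single hyperslab read.
bulkread(v::SubArray{<:Any,<:Any,<:DiskLike}) = parent(v).data[parentindices(v)...]
```

Here `view(A, 1:2:9, :)` reads nothing, `bulkread` returns the 5×2 selection without any scalar reads, while `collect` on the same view falls back to per-element `getindex` calls.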

A DatasetView combining a Dataset and a Dataspace could also be reused as a VirtualSource, replacing the implementation in HDF5Utils.jl, especially in combination with h5p_set_virtual.

This would be more capable than h5py's VirtualSource design, which does not allow an arbitrary hyperslab selection on a Dataspace. For example, one may want a virtual source with a non-unit stride in one dimension.
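For a virtual dataset, `H5Pset_virtual` receives a source file name, a source dataset name and a source dataspace carrying a selection, alongside the virtual dataspace and its own selection; for fixed-size selections the two must contain the same number of elements. A minimal sketch of a Julia-side description, assuming the hypothetical names `VirtualSourceSketch`, `nselected` and `counts_match` (not part of HDF5.jl or HDF5Utils.jl), showing that a strided selection fits naturally and how its element count could be checked against the virtual side:

```julia
# Hypothetical description of one virtual-dataset source (not an HDF5.jl type):
# the file and dataset that H5Pset_virtual would receive, plus a Julia-style
# selection such as (1:2:9, :) for a stride of 2 in the first dimension.
struct VirtualSourceSketch{I<:Tuple}
    filename::String
    dataset::String
    indices::I
end

# Number of elements selected by one index along a dimension of length n.
nselected(::Integer, n::Integer) = 1
nselected(::Colon, n::Integer) = n
nselected(r::AbstractRange{<:Integer}, n::Integer) = length(r)

# Total elements selected in a source dataset of size `dims`.
function nselected(src::VirtualSourceSketch, dims::Dims)
    length(src.indices) == length(dims) ||
        throw(DimensionMismatch("need one index per dimension"))
    return prod(map(nselected, src.indices, dims))
end

# A virtual-side block of size `virtual_dims` must cover the same element count.
counts_match(src::VirtualSourceSketch, dims::Dims, virtual_dims::Dims) =
    nselected(src, dims) == prod(virtual_dims)
```

For example, taking every other row of a 10×3 source selects 15 elements, so it could map onto a 5×3 block of the virtual dataset but not onto a 10×3 one.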

simonbyrne commented 2 years ago

It's not possible to pass a Dataspace to h5a_read when reading an attribute (H5Aread always reads the entire attribute), so it probably doesn't make sense to define views for attributes.