HDF5 interface

certik commented 4 years ago

fortran-utils has a very minimal HDF5 wrapper interface that is just a little bit higher level and easier to use: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/h5_utils.f90.

But I would honestly vote not to include this in stdlib, at least not initially. It feels to me that this would be better off in a separate package. What do you think?

jvdp1 commented 4 years ago

There is also @scivision's: https://github.com/scivision/h5fortran. Anyway, since it depends on an external library, I would not include it in stdlib (at least for now).

certik commented 4 years ago

I think we all agree that it is better to contribute this to @scivision's h5fortran. So I am going to close this issue as out of scope for stdlib, at least for the foreseeable future.

scivision commented 4 years ago

I have tried to make the h5fortran (HDF5) and nc4fortran (NetCDF4) user-facing APIs as identical as possible, so that a user program can easily swap between HDF5 and NetCDF file IO by a configure-time flag.

I used an object-oriented interface in h5fortran and nc4fortran because there are multiple internal variables to manipulate when doing non-trivial operations. The basic user-facing operations look like:

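type(hdf_file) :: h

call h%initialize('foo.h5', 'rw')  ! open the file
call h%write('x', x)               ! write variable x to dataset 'x'
call h%read('y', y)                ! read dataset 'y' into variable y
call h%finalize()                  ! close the file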

write(), read(), and the other methods are rank-agnostic (scalar through 7-D) and kind-agnostic (real32, real64, int32, int64, character) within the limits of HDF5 and NetCDF. Yes, opaque data can be used for really arbitrary stuff, but that wasn't my need for HPC simulation and data assimilation.
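To illustrate how a single method name can cover several kinds and ranks, here is a minimal sketch using a generic type-bound procedure. This is a toy, not h5fortran's source: the module `toy_writer`, the type `toy_file`, and the three specific procedures are all hypothetical names, and the routines only print which specific was selected instead of touching a file.

```fortran
module toy_writer
  ! Toy illustration only (hypothetical names); not h5fortran's implementation.
  use, intrinsic :: iso_fortran_env, only: real32, real64, int32
  implicit none
  private
  public :: toy_file

  type :: toy_file
    character(:), allocatable :: filename
  contains
    procedure, private :: write_r32_scalar, write_r64_1d, write_i32_2d
    ! One generic name; the compiler selects the specific by kind and rank.
    generic :: write => write_r32_scalar, write_r64_1d, write_i32_2d
  end type toy_file

contains

  subroutine write_r32_scalar(self, name, value)
    class(toy_file), intent(in) :: self
    character(*), intent(in) :: name
    real(real32), intent(in) :: value
    print '(4a)', self%filename, ': ', name, ' -> real32 scalar'
  end subroutine write_r32_scalar

  subroutine write_r64_1d(self, name, value)
    class(toy_file), intent(in) :: self
    character(*), intent(in) :: name
    real(real64), intent(in) :: value(:)
    print '(4a,i0)', self%filename, ': ', name, ' -> real64 rank-1, size ', size(value)
  end subroutine write_r64_1d

  subroutine write_i32_2d(self, name, value)
    class(toy_file), intent(in) :: self
    character(*), intent(in) :: name
    integer(int32), intent(in) :: value(:,:)
    print '(4a,i0)', self%filename, ': ', name, ' -> int32 rank-2, size ', size(value)
  end subroutine write_i32_2d

end module toy_writer

program demo_generic
  use, intrinsic :: iso_fortran_env, only: real32, real64, int32
  use toy_writer, only: toy_file
  implicit none
  type(toy_file) :: f
  integer(int32) :: k(2,3)

  f%filename = 'foo.h5'
  k = 0
  ! The same call syntax resolves to three different specific procedures.
  call f%write('a', 1.0_real32)
  call f%write('b', [1.0_real64, 2.0_real64])
  call f%write('c', k)
end program demo_generic
```

A real library would need one specific per supported kind and rank combination, which is why such interfaces are often generated by a preprocessor or templating step.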

scivision commented 4 years ago

With regard to binary file I/O: in my opinion, raw binary I/O should be discouraged in most cases, in any programming language.

There are a lot of other scientific formats like CDF, FITS and so on, but for out-of-core and cloud storage/processing and broadest data science library support, it is best in my opinion to focus efforts on HDF5. I only made a NetCDF4 interface because it's a subset of HDF5 and used by the large simulation packages I interface my models with.

scivision commented 4 years ago

I think HDF5 and the like could be handled with a stdlib shim that presents a user API like {loadtxt,savetxt}. So instead of the h5fortran/nc4fortran initialize, write, finalize sequence, you would just have in stdlib:

call savefile('foo.h5', x)
call loadfile('foo.h5', y)

As with other external libraries (libpng, etc.), make it an option.
It is straightforward to implement in the near term.
Shims for FITS or other file formats that contributors feel are important can likewise be added.
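A minimal sketch of what such a shim might look like, assuming h5fortran is installed and provides `h5write` and `h5read` with the filename, dataset name, value argument order used elsewhere in this thread. Everything else here is an assumption made for illustration: the module name `file_shim`, the helper `has_suffix`, the default dataset name `/data` (needed because `savefile` takes no dataset name), and the choice to dispatch on the `.h5` file extension. Only the rank-1 real64 case is shown.

```fortran
module file_shim
  ! Hypothetical stdlib-style shim; not part of stdlib or h5fortran.
  use, intrinsic :: iso_fortran_env, only: real64
  use h5fortran, only: h5write, h5read
  implicit none
  private
  public :: savefile, loadfile

  ! Assumed default: savefile/loadfile take no dataset name, so pick one.
  character(*), parameter :: default_dataset = '/data'

  interface savefile
    module procedure savefile_r64_1d
  end interface savefile

  interface loadfile
    module procedure loadfile_r64_1d
  end interface loadfile

contains

  logical function has_suffix(filename, suffix)
    ! True if filename (ignoring trailing blanks) ends with suffix.
    character(*), intent(in) :: filename, suffix
    integer :: n, m
    n = len_trim(filename)
    m = len(suffix)
    has_suffix = .false.
    if (n >= m) has_suffix = filename(n-m+1:n) == suffix
  end function has_suffix

  subroutine savefile_r64_1d(filename, x)
    character(*), intent(in) :: filename
    real(real64), intent(in) :: x(:)
    if (has_suffix(filename, '.h5')) then
      call h5write(filename, default_dataset, x)
    else
      ! Other formats (FITS, NetCDF, ...) would get their own branch here.
      error stop 'savefile: unsupported file format'
    end if
  end subroutine savefile_r64_1d

  subroutine loadfile_r64_1d(filename, x)
    character(*), intent(in) :: filename
    real(real64), intent(out) :: x(:)
    if (has_suffix(filename, '.h5')) then
      call h5read(filename, default_dataset, x)
    else
      error stop 'loadfile: unsupported file format'
    end if
  end subroutine loadfile_r64_1d

end module file_shim
```

With this module, a caller would write `call savefile('foo.h5', x)` and `call loadfile('foo.h5', y)` for rank-1 real64 arrays of matching size. Making the HDF5 backend optional at build time, as suggested above, would additionally require guarding the `use h5fortran` line and the HDF5 branches, which is not shown here.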

certik commented 4 years ago

An interface like savefile('foo.h5', x) would make sense for stdlib. So I reopened this issue. Thanks for the idea @scivision.

scivision commented 4 years ago

I made a new release v2.5.0 of h5fortran, which now works as simply as:

use h5fortran

call h5write('foo.h5', '/x', x)

call h5read('bar.h5', '/y', y)

That's polymorphic over rank (scalar through 7-D) and type (int32, int64, real32, real64).
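For context, a complete round-trip program built from those two calls might look like the sketch below. It assumes h5fortran is installed and linked, and that `h5write` creates the file if it does not already exist; the program name, the array sizes, and the reuse of one file for both calls are choices made for this example.

```fortran
program h5_roundtrip
  ! Sketch only: requires the h5fortran library and HDF5 at build time.
  use, intrinsic :: iso_fortran_env, only: real64
  use h5fortran, only: h5write, h5read
  implicit none
  real(real64) :: x(3), y(3)

  x = [1.0_real64, 2.0_real64, 3.0_real64]

  ! Write x to dataset '/x', then read the same dataset back into y.
  call h5write('foo.h5', '/x', x)
  call h5read('foo.h5', '/x', y)

  if (any(y /= x)) error stop 'round trip mismatch'
  print *, 'round trip OK'
end program h5_roundtrip
```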

nncarlson commented 4 years ago

When it comes to providing modern Fortran interfaces to libraries like HDF5, NetCDF, MPI, etc., if there is already a package out there that reasonably meets the "design principles" of stdlib, like I presume h5fortran does, I see no reason for stdlib to throw a layer over the top of it and assimilate the package into stdlib. Stdlib doesn't need to be the Borg of Fortran libraries. Let people use the package directly. I think there ought to be a compelling reason and value for stdlib to provide the interface, like perhaps it does for lapack.

certik commented 4 years ago

@nncarlson It's not black and white where to draw the line between what goes into stdlib and what does not, but I think we are all in agreement here, as indicated above, that h5fortran should stay a separate package. (h5fortran is on my todo list to get working with fpm.)

fortran-lang / stdlib

HDF5 interface #101