fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License

HDF5 interface #101

Open certik opened 4 years ago

certik commented 4 years ago

fortran-utils has a very minimal HDF5 wrapper interface that is just a little bit higher level and easier to use: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/h5_utils.f90.

But I would honestly vote not to include this in stdlib, or at least not initially. It feels to me that this would be better off in a separate package. What do you think?

jvdp1 commented 4 years ago

There is also @scivision's wrapper: https://github.com/scivision/h5fortran. Anyway, since it depends on an external library, I would not include it in stdlib (at least for now).

certik commented 4 years ago

I think we all agree that it would be better to contribute this to @scivision's h5fortran. So I am going to close this issue as out of scope for stdlib, at least for the foreseeable future.

scivision commented 4 years ago

I have tried to make the h5fortran (HDF5) and nc4fortran (NetCDF4) user-facing APIs as identical as possible, so that a user program can easily swap between HDF5 and NetCDF file IO by a configure-time flag.

I used an object-oriented interface for h5fortran and nc4fortran because there are multiple internal variables to manipulate when doing non-trivial operations. The basic user-facing operations look like:

type(hdf_file) :: h

call h%initialize('foo.h5', 'rw')
call h%write('x', x)
call h%read('y', y)
call h%finalize()

write(), read(), and the other methods are rank-agnostic (scalar through 7-D) and kind-agnostic {real32, real64, int32, int64, character} within the limits of HDF5 and NetCDF. Yes, they can use opaque data for really arbitrary stuff, but that wasn't my need for HPC simulation and data assimilation.
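
For readers unfamiliar with how a single name can accept several ranks and kinds in Fortran, here is a minimal sketch of the standard mechanism, a generic interface. It is not taken from h5fortran's source; the module name `generic_demo` and the procedure `describe` are hypothetical and exist only for illustration.

```fortran
module generic_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64, int32
  implicit none
  private
  public :: describe

  ! One public name, several specific procedures. The compiler selects
  ! the match from the argument's type, kind, and rank at compile time.
  interface describe
    module procedure describe_r32_0d, describe_r64_1d, describe_i32_2d
  end interface describe

contains

  subroutine describe_r32_0d(x)
    real(real32), intent(in) :: x
    print '(a)', 'real32 scalar'
  end subroutine describe_r32_0d

  subroutine describe_r64_1d(x)
    real(real64), intent(in) :: x(:)
    print '(a,i0)', 'real64 rank-1 array, size ', size(x)
  end subroutine describe_r64_1d

  subroutine describe_i32_2d(x)
    integer(int32), intent(in) :: x(:,:)
    print '(a,i0)', 'int32 rank-2 array, size ', size(x)
  end subroutine describe_i32_2d

end module generic_demo

program main
  use, intrinsic :: iso_fortran_env, only: real32, real64, int32
  use generic_demo, only: describe
  implicit none
  real(real32)   :: s
  real(real64)   :: v(3)
  integer(int32) :: m(2,2)

  s = 1.0_real32
  v = 0.0_real64
  m = 0_int32

  ! The same call syntax resolves to three different specific procedures.
  call describe(s)
  call describe(v)
  call describe(m)
end program main
```

Covering every combination of rank and kind this way means one specific procedure per combination, which is why wrappers of this sort tend to contain a lot of near-identical code.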

scivision commented 4 years ago

With regard to binary file I/O: in my opinion, raw binary I/O should be discouraged for most cases in any programming language.

There are a lot of other scientific formats like CDF, FITS and so on, but for out-of-core and cloud storage/processing and broadest data science library support, it is best in my opinion to focus efforts on HDF5. I only made a NetCDF4 interface because it's a subset of HDF5 and used by the large simulation packages I interface my models with.

scivision commented 4 years ago

I think HDF5 and the like could be handled with a stdlib shim that presents a user API like {loadtxt,savetxt}. So instead of the h5fortran/nc4fortran initialize, write, finalize sequence, you would just have in stdlib:

savefile('foo.h5', x)
loadfile('foo.h5', y)
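
One way such a shim might look is sketched below, assuming it chooses a backend from the file extension. That dispatch rule, the module name `savefile_sketch`, and the helper `file_extension` are assumptions made for illustration, not an agreed stdlib design. To keep the sketch free of external libraries, the HDF5 branch only stops with a message, and every other extension falls back to plain text.

```fortran
module savefile_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: savefile, file_extension

contains

  ! Return the text after the last '.' in filename, or '' if there is none.
  pure function file_extension(filename) result(ext)
    character(*), intent(in) :: filename
    character(:), allocatable :: ext
    integer :: i

    i = index(filename, '.', back=.true.)
    if (i == 0) then
      ext = ''
    else
      ext = filename(i+1:)
    end if
  end function file_extension

  ! Hypothetical one-call writer for a rank-1 real64 array.
  subroutine savefile(filename, x)
    character(*), intent(in) :: filename
    real(real64), intent(in) :: x(:)
    integer :: u

    select case (file_extension(filename))
    case ('h5')
      ! A real shim would forward to an HDF5 wrapper at this point.
      ! This sketch does not link one, so it stops instead.
      error stop 'savefile: HDF5 backend not linked in this sketch'
    case default
      ! Plain-text fallback: one value per line.
      open(newunit=u, file=filename, status='replace', action='write')
      write(u, '(es24.16)') x
      close(u)
    end select
  end subroutine savefile

end module savefile_sketch

program demo
  use, intrinsic :: iso_fortran_env, only: real64
  use savefile_sketch, only: savefile, file_extension
  implicit none
  real(real64) :: x(3)

  x = [1.0_real64, 2.0_real64, 3.0_real64]

  print '(a)', file_extension('foo.h5')
  call savefile('foo.txt', x)   ! writes three lines to foo.txt
end program demo
```

A full version would also need a generic interface over rank and kind, plus a matching `loadfile`.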

certik commented 4 years ago

The interface like savefile('foo.h5', x) would make sense for stdlib. So I reopened this issue. Thanks for the idea @scivision.

scivision commented 4 years ago

I made a new release v2.5.0 of h5fortran, which now works as simply as:

use h5fortran

call h5write('foo.h5', '/x', x)

call h5read('bar.h5', '/y', y)

These calls are polymorphic over rank (scalar through 7-D) and over type: int32, int64, real32, real64.

nncarlson commented 4 years ago

When it comes to providing modern Fortran interfaces to libraries like HDF5, NetCDF, MPI, etc., if there is already a package out there that reasonably meets the "design principles" of stdlib, as I presume h5fortran does, I see no reason for stdlib to throw a layer over the top of it and assimilate the package into stdlib. Stdlib doesn't need to be the Borg of Fortran libraries. Let people use the package directly. I think there ought to be a compelling reason and value for stdlib to provide the interface, as there perhaps is for LAPACK.

certik commented 4 years ago

@nncarlson It's not black and white where to draw the line between what goes into stdlib and what does not, but I think we are all in agreement here, as indicated above, that h5fortran should stay a separate package. (h5fortran is on my todo list to get working with fpm.)