Open certik opened 4 years ago
There is also the one by @scivision: https://github.com/scivision/h5fortran
Anyway, since it depends on an external library, I would not include it in stdlib (at least for now).
I think we will all be in agreement to rather contribute this into @scivision's h5fortran. So I am going to close this issue as out of scope for stdlib, at least for the foreseeable future.
I have tried to make the h5fortran (HDF5) and nc4fortran (NetCDF4) user-facing APIs as identical as possible, so that a user program can easily swap between HDF5 and NetCDF file IO by a configure-time flag.
I used an object-oriented interface in h5fortran and nc4fortran because there are multiple internal variables to manipulate when doing non-trivial operations. The basic user-facing operations look like:
type(hdf_file) :: h
! open the file, write one variable, read another, then close
call h%initialize('foo.h5', 'rw')
call h%write('x', x)
call h%read('y', y)
call h%finalize()
write(), read() and the other methods are rank-agnostic (scalar..7D) and kind-agnostic {real32,real64,int32,int64,character} within the limits of HDF5 and NetCDF. Yes, they can use opaque data for really arbitrary stuff, but that wasn't my need for HPC simulation and data assimilation.
With regard to binary file I/O: in my opinion, raw binary I/O should be discouraged in most cases, in any programming language.
There are a lot of other scientific formats like CDF, FITS and so on, but for out-of-core and cloud storage/processing and broadest data science library support, it is best in my opinion to focus efforts on HDF5. I only made a NetCDF4 interface because it's a subset of HDF5 and used by the large simulation packages I interface my models with.
I think HDF5 and the like could be handled with a stdlib shim that presents a user API like {loadtxt,savetxt}. So instead of the h5fortran/nc4fortran initialize, write, finalize sequence, you would just have in stdlib:
call savefile('foo.h5', x)
call loadfile('foo.h5', y)
An interface like savefile('foo.h5', x) would make sense for stdlib. So I reopened this issue. Thanks for the idea @scivision.
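For what it's worth, here is a minimal sketch of how such a shim might pick a backend from the file extension. Everything in it is an assumption for illustration: the module name savefile_sketch, the helper file_extension, and the plain-text backend are hypothetical, not stdlib or h5fortran API. The HDF5/NetCDF branch is left as a placeholder where a library such as h5fortran would be called, and only rank-1 real64 arrays are handled.

```fortran
module savefile_sketch
  ! Hypothetical shim: none of these names exist in stdlib or h5fortran.
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: savefile, loadfile, file_extension

contains

  ! Return the text after the last '.' in filename, or '' if there is none.
  pure function file_extension(filename) result(ext)
    character(len=*), intent(in) :: filename
    character(len=:), allocatable :: ext
    integer :: idot
    idot = index(filename, '.', back=.true.)
    if (idot > 0) then
      ext = trim(filename(idot+1:))
    else
      ext = ''
    end if
  end function file_extension

  ! Write a rank-1 real64 array, choosing the backend by extension.
  subroutine savefile(filename, x)
    character(len=*), intent(in) :: filename
    real(real64), intent(in) :: x(:)
    integer :: u
    select case (file_extension(filename))
    case ('txt')
      open(newunit=u, file=filename, status='replace', action='write')
      write(u, *) size(x)   ! first record: element count
      write(u, *) x         ! then the data, list-directed
      close(u)
    case ('h5', 'nc')
      ! Placeholder: an HDF5/NetCDF library would be called here.
      error stop 'savefile: HDF5/NetCDF backend not wired up in this sketch'
    case default
      error stop 'savefile: unrecognized file extension'
    end select
  end subroutine savefile

  ! Read back an array written by savefile, allocating x to fit.
  subroutine loadfile(filename, x)
    character(len=*), intent(in) :: filename
    real(real64), allocatable, intent(out) :: x(:)
    integer :: u, n
    select case (file_extension(filename))
    case ('txt')
      open(newunit=u, file=filename, status='old', action='read')
      read(u, *) n
      allocate(x(n))
      read(u, *) x
      close(u)
    case ('h5', 'nc')
      error stop 'loadfile: HDF5/NetCDF backend not wired up in this sketch'
    case default
      error stop 'loadfile: unrecognized file extension'
    end select
  end subroutine loadfile

end module savefile_sketch
```

With this in place, a caller would only write call savefile('foo.txt', x) and call loadfile('foo.txt', y). A real shim would presumably use generic interfaces to cover other ranks and kinds, and would need a way to name the variable inside an HDF5 file.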
I made a new release v2.5.0 of h5fortran, which now works as simply as:
use h5fortran
call h5write('foo.h5', '/x', x)
call h5read('bar.h5', '/y', y)
These are polymorphic over rank (scalar..7D) and kind (int32, int64, real32, real64).
When it comes to providing modern Fortran interfaces to libraries like HDF5, NetCDF, MPI, etc., if there is already a package out there that reasonably meets the "design principles" of stdlib, like I presume h5fortran does, I see no reason for stdlib to throw a layer over the top of it and assimilate the package into stdlib. Stdlib doesn't need to be the Borg of Fortran libraries. Let people use the package directly. I think there ought to be a compelling reason and value for stdlib to provide the interface, like perhaps it does for lapack.
fortran-utils has a very minimal HDF5 wrapper interface that is just a little bit higher level and easier to use: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/h5_utils.f90.
But I would honestly vote not to include this in stdlib, or maybe not initially. It feels to me that this would be better off in a separate package. What do you think?