Vindaar / nimhdf5

Wrapper and some simple high-level bindings for the HDF5 library for the Nim language
MIT License
28 stars 2 forks source link

This repository contains thin high-level bindings for the [[https://www.hdfgroup.org/HDF5/][HDF5 data format]] for the Nim programming language. It also provides a wrapper of the full HDF5 C library, importable using:

+BEGIN_SRC nim

import nimhdf5/hdf5_wrapper

+END_SRC

The raw wrapper dynamically links the libhdf5.so (the main library) and libhdf5_hl.so (a library containing high-level convenience functions) libraries at runtime. All public functions of the two libraries are callable by their corresponding C names and C arguments. That means the Nim datatypes need to be manually cast to the corresponding compatible types, if only the wrapper is used.

The high-level bindings, while covering most general HDF5 features, are still in a rough state, due to limited testing by actual users. Most features should work fine (at least using linux), but some known bugs are still there. See [[file:examples/h5_high_level_example.nim]] as an overview (soon to be cleaned up, split and put into a tutorial form) on the available features and their usage. Also take a look at the tests for simple examples of specific features. For a more indepth example of usage of this library, take a look at:

The wrapper was built making heavy use of [[https://www.github.com/nim-lang/c2nim][c2nim]] and the main goal was to have a usable interface for the HDF5 data format. More advanced features (e.g. single writer / multiple reader) were of lower priority. Via the wrapper all features should in principle work, but many have not been tested.

** Compatibility

The wrapper is currently tested using Nim version =0.18.0= and the current devel branch (=0.18.1=).

The wrapper is built from HDF5 version =1.10.1=.

Linking against the HDF5 =1.8= library is reasonably supported as well, but requires to use an additional compiler flag for now:

+BEGIN_SRC sh

-d:H5_LEGACY

+END_SRC

With the recent release of version =1.10.3= / =1.10.4=, support for this version currently is available under the

+BEGIN_SRC

-d:H5_FUTURE

+END_SRC

flag. Soon this will become the default version! The =H5Oget/visit_...2= procedures are wrapped under a name without a =2= suffix. These however add a =fields= argument. Therefore an overload is available, which maps the =fields= to =H5O_INFO_ALL=.

*** Troubleshooting If you compile a nim program using =nimhdf5= without any of those flags and try to run it on a system with a HDF5 shared library of version =1.10.3= or newer, you will be greeted by:

+BEGIN_SRC sh

could not import: H5Oget_info

+END_SRC

In that case, add the =-d:H5_FUTURE= flag to the compilation command (or probably add it to your =nim.cfg= or =config.nims= of the project).

On the other hand if you try to run such a compiled binary on a system with a HDF5 library of version =1.8=, you will probably see:

+BEGIN_SRC sh

could not import: H5P_LST_FILE_CREATE_g

+END_SRC

add the =-d:H5_LEGACY= flag.

In case neither of these work, please open an issue!

*** Version checks in the wrapper Currently no checks are done, which compare the library this wrapper is built upon with the library linking against using some of the provided HDF5 macros (e.g. H5check, H5get_libversion etc.). The main reason is explained below. However, as far as the high-level functionality is concerned at the moment, the only differences arise in a few constant definitions, whose names slightly changed from =1.8= to =1.10=. This is what the compiler flags sets accordingly.

The HDF5 headers contain macros for many variables, such as

+BEGIN_SRC C

define H5F_ACC_RDONLY (H5CHECK H5OPEN 0x0000u)

+END_SRC

where

+BEGIN_SRC C

define H5CHECK H5check(),

+END_SRC

and

+BEGIN_SRC C

define H5OPEN H5open(),

+END_SRC

i.e. it makes use of C's comma operator. However, c2nim currently [[https://nim-lang.org/docs/c2nim.html#limitations][has no support for it]]. Instead of porting them in some reasonable way, these macros were converted to simple replacements with the values, dropping the calls to H5check() and H5open().

The call to H5check() is currently not used at all. Compiling a Nim program with this wrapper (based on version =1.10.1=) would normally fail to check against the linked library, if that version is different.

As H5open() is important, the calls are replaced by a single call to initialize the library at the beginning upon the first call of the library via [[file:src/nimhdf5/wrapper/H5niminitialize.nim]].

As HDF5 is a very macro heavy library, other important macros may not have been correctly wrapped to Nim, e.g. determination of correct sizes of data types. This may cause some weird side-effects (to be fair, I haven't noticed any!).

Additionally, Windows support is unknown at this time. The library name is correctly set for Windows, however an additional header file =H5FDwindows.h= might have to be wraped.

** Installation

Installation can either be done via nimble:

+BEGIN_SRC sh

nimble install nimhdf5

+END_SRC

or manually by cloning this git repository:

+BEGIN_SRC sh

git clone https://github.com/vindaar/nimhdf5

+END_SRC

in a folder of your choice and call nimble install afterwards:

+BEGIN_SRC sh

cd nimhdf5 nimble install

+END_SRC

Or simply make use of nimble's Github interfacing capabilities:

+BEGIN_SRC sh

nimble install https://github.com/vindaar/nimhdf5

+END_SRC

** Files

The folder [[file:c_headers/][c_headers]] contains the modified HDF5 headers in the state they were in for a successful c2nim conversion. In some cases the C header file had to be modified, in others modification to the resulting .nim file was still necessary.

The folder [[file:examples/][examples]] contains the basic HDF5 C examples (see here: [[https://support.hdfgroup.org/HDF5/examples/intro.html#c]]) converted to Nim utilizing the wrapper.

[[file:examples/h5_high_level_example.nim][h5_high_level_example.nim]] serves as a replacement for a tutorial for now (tutorial will be added soon!), showcasing (almost) all available features and their usage.

** Known bugs and quirks

The high level bindings come with several quirks which are good to know.

** Implemented HDF5 features

*** Single Writer Multiple Reader

This wrapper fully supports the Single Writer Multiple Reader feature of the HDF5 library, but it is still in an experimental state, as I've never really needed it.

It allows to access a single HDF5 file from multiple threads or processes, where one of these is a writer process and all others are readers. When using this feature the user does not have to worry about locks etc. between the different processes.

**** Usage Open an HDF5 file in write mode and hand the =swmr= flag:

+begin_src nim

writer.nim

import nimhdf5 var h5f = H5open("/tmp/test.h5", "rw", swmr = true)

do writing stuff

+end_src

and in all reader threads / processes, simply do the same, but do not hand a write:

+begin_src nim

reader.nim

import nimhdf5 var h5f = H5open("/tmp/test.h5", "r", swmr = true)

+end_src

This should be all that is required.

I'm not sure if the writer process should make sure to flush the file regularly or not. Feel free to tell me if you know. :)

***** Alternative writer

An alternative to the above for the writer process is to first open the file in write mode without ~swmr = true~ and then later put it into =swmr= mode via:

+begin_src nim

import nimhdf5 var h5f = H5open("/tmp/test.h5", "rw")

do some regular stuff

and then activate SWMR later

h5f.activateSWMR()

+end_src

*** Threadsafe HDF5 library & file locks

This wrapper can also be used with a HDF5 library that was compiled with the =--enable-threadsafe= compilation flag.

Once the library has been compiled with it, in principle the user can try to open a single file in write mode from multiple processes or threads. For safe handling in these contexts, it may be up to the user to lock access to the file / writing to individual datasets via some locking mechanism. In principle the threadsafe option of the library adds its own mutex logic, so in theory it should work without them.

See these notes about the threadsafe library: https://support.hdfgroup.org/HDF5/doc/TechNotes/ThreadSafeLibrary.html

One issue a user might encounter is that the second opening of a file yields an error saying that the resource is temporarily unavailable. In HDF5 version starting from =1.10=, file locking was added as a feature.

This behavior is controlled via an environment variable:

+begin_src sh

export HDF5_USE_FILE_LOCKING=FALSE

+end_src

If set to false, file locking is disabled. With it multiple processes may open the same file.

When doing this, keep in mind that each thread / process will receive their own =FileID=. Some HDF5 functions may either give the user information based on the specific file ID and others based on the actual file. In the cases where one can choose, it is supported via the =okLocal= (=ObjectKind= enum) or =fkLocal= (=FlushKind= enum).

Relevant part of the documentation: https://docs.hdfgroup.org/hdf5/develop/_h5public_8h.html#title31

** Blosc support

To use blosc as a filter you need to import:

+begin_src nim

import nimhdf5/blosc

+end_src

Before =v0.3.12= this was done automatically if the =nblosc= library is installed.