Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License

How to include netcdf-c plugin capabilities in binary wheels #1164

Open jswhit2 opened 2 years ago

jswhit2 commented 2 years ago

netcdf-c 4.9.0 will have extra compression options based on plugins. The Python interface now supports these via a compression kwarg to createVariable. To use the extra compression options (beyond zlib), the netcdf-c plugins need to be installed in the directory that HDF5_PLUGIN_PATH points to. How do we provide this capability in the binary wheels on PyPI? These wheels include bundled versions of the C libraries. Some options:

1) Figure out how to include the plugin shared objects in the wheels, and set HDF5_PLUGIN_PATH to point to the directory inside the installation.
2) Assume the user installs the plugins separately and sets HDF5_PLUGIN_PATH, and just raise an exception if the plugins are not found.
3) Create a separate Python package that installs the plugins (similar to what h5py does with hdf5plugin).
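For illustration, option 2 could look roughly like the sketch below. It is not code from netcdf4-python: the function names are hypothetical, and the `lib__nc*` file-name pattern is an assumption about how the plugin libraries are named, so it is left as a parameter. The sketch only looks for files in the directories listed in HDF5_PLUGIN_PATH; it does not check that the C library can actually load them.

```python
import glob
import os


def find_filter_plugins(environ=None, pattern="lib__nc*"):
    """Return plugin files found in the directories listed in HDF5_PLUGIN_PATH.

    `pattern` is an assumed naming convention for the plugin libraries.
    """
    environ = os.environ if environ is None else environ
    plugin_path = environ.get("HDF5_PLUGIN_PATH", "")
    found = []
    # The variable may hold several directories separated by os.pathsep.
    for directory in plugin_path.split(os.pathsep):
        if directory:
            found.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return found


def require_filter_plugins(environ=None, pattern="lib__nc*"):
    """Return the plugin files, or raise RuntimeError if none can be located."""
    plugins = find_filter_plugins(environ, pattern)
    if not plugins:
        raise RuntimeError(
            "no compression plugins found; install the netcdf-c plugins "
            "and set HDF5_PLUGIN_PATH to their directory"
        )
    return plugins
```

A caller would run `require_filter_plugins()` before requesting a non-zlib filter, so that a missing plugin surfaces as a clear Python exception.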

jswhit2 commented 2 years ago

Relevant netcdf-c issue: https://github.com/Unidata/netcdf-c/issues/2294

jswhit2 commented 2 years ago

I tried installing the hdf5plugin module and setting HDF5_PLUGIN_PATH to point to the installation directory. This works for zstd, but not for bzip2 and blosc (nc_inq_var_XXX does not recognize them).

jswhit2 commented 2 years ago

setup.py has been modified to install the plugins (location specified by the environment variable NETCDF_PLUGIN_DIR) in the package (using data_files). __init__.py then sets HDF5_PLUGIN_PATH to netCDF4.__path__. With this, the new compression options should 'just work' with the binary wheels, without the need to point to an external directory.
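A minimal sketch of the kind of logic described for __init__.py, not the actual code from the pull request. The helper name, its arguments, and the marker file names are hypothetical. Leaving an existing HDF5_PLUGIN_PATH untouched is an assumed design choice, made so that a user's own setting is respected.

```python
import os


def default_plugin_path(package_dir, environ, marker_names):
    """Point HDF5_PLUGIN_PATH at package_dir if it is unset and plugins are bundled.

    package_dir  -- directory of the installed package (hypothetical argument)
    environ      -- mapping to update, normally os.environ
    marker_names -- file names whose presence indicates bundled plugins (assumed)
    """
    if "HDF5_PLUGIN_PATH" in environ:
        # Assumption: never override a path the user has already chosen.
        return environ["HDF5_PLUGIN_PATH"]
    if any(os.path.exists(os.path.join(package_dir, name)) for name in marker_names):
        environ["HDF5_PLUGIN_PATH"] = package_dir
    return environ.get("HDF5_PLUGIN_PATH")
```

Inside a package's __init__.py this might be called as `default_plugin_path(__path__[0], os.environ, [...])`, with the list filled in with whatever plugin file names the build actually installs.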

https://github.com/Unidata/netcdf4-python/pull/1159

jswhit commented 2 years ago

auditwheel doesn't deal with the plugins correctly, so the wheels for 1.6.0 do not include the plugins on Linux.

zklaus commented 2 years ago

It is notoriously difficult to deal with plugin systems that include binary dependencies in wheels. This is because wheels have no way to ensure a consistent environment with regard to the surrounding shared libraries coming from the OS and possibly other sources. In the end, this leads to a lot of static linking. This problem is not unique to netcdf4; another big project facing this problem is GDAL, which supports many different formats via a plugin system. Their solution is to provide a pretty bare-bones wheel, and to offer more comprehensive installations in binary-aware environments, such as conda environments or operating system package managers.

Could that be a model for netcdf4-python as well? I.e. have basic compression support in the wheel, perhaps only zlib, and include a more comprehensive set of compression options in the conda-forge package?

jswhit commented 2 years ago

Right now the wheels have support for extra compression filters, but the compression plugins themselves are not included. If there is a conda-forge netcdf plugin package (or a separate plugin wheel) the plugins should work as long as the plugin path env var is set.