Unidata / netcdf4-python

netcdf4-python: python/numpy interface to the netCDF C library
http://unidata.github.io/netcdf4-python
MIT License
735 stars 259 forks

HDF Error when reading a NetCDF file as part of tests (only NetCDF4==1.7.1, using tox on Linux) #1343

Open matteobachetti opened 2 weeks ago

matteobachetti commented 2 weeks ago

My code saves and analyzes data in NetCDF4 format. I have no problem whatsoever with the analysis. However, when I run unit tests with tox on Linux I get a ton of HDF and OS errors, e.g.: https://github.com/StingraySoftware/HENDRICS/actions/runs/9580442835/job/26417155244?pr=164

I could reproduce this when running tox -e py311-test-alldeps, but only on Linux. On Mac OS (M1) the same tox command works with no issue, and if I run the tests with pytest in a fresh conda environment with the same software versions (in particular, the same netcdf4, h5py, and numpy versions) as the tox environment, it works on all architectures. Apparently, I can only reproduce the issue while running with tox on Linux. This makes debugging a lot more difficult.

On Stackoverflow, another user found that the error only occurred with NetCDF4 1.7.1, and indeed, fixing netcdf4 to !=1.7.1 made our test pass as well: https://github.com/StingraySoftware/HENDRICS/actions/runs/9650470922/job/26616163437?pr=165
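For reference, the pin described above would look something like the following in a requirements file (a sketch; the exact requirement spec used in the linked PR may differ):

```text
# Exclude the affected release until the wheel issue is resolved
netCDF4!=1.7.1
```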

larsevj commented 2 weeks ago

We are also seeing this issue with version 1.7.1; tests are failing on Linux: https://github.com/equinor/ert/actions/runs/9660713887/job/26646962384?pr=8189 Cloning the repo and building from source in the workflow fixes the issue, so it seems like something is off with the PyPI wheels?

ocefpaf commented 2 weeks ago

Can you create a simple, small, and reproducible example of code that we can use to debug?

matteobachetti commented 2 weeks ago

@ocefpaf I'm trying, but it's tricky: it only fails under tox for me! If I install the same versions and run the tests directly with pytest, everything works.

larsevj commented 2 weeks ago

I am not sure if this is the exact same issue as the tests failing above, but this code works on netCDF4<1.7.1, but fails on 1.7.1:

import h5py  # importing h5py before netCDF4 is what triggers the failure on 1.7.1
import numpy as np
import netCDF4 as nc

rootgrp = nc.Dataset("test.nc", "w", format="NETCDF4")
x = rootgrp.createDimension("x", 1)
y = rootgrp.createVariable("y", "f4", ("x",))
y[:] = 0  # fails here on 1.7.1 with an HDF error
rootgrp.close()
It seems like importing h5py is what breaks things with 1.7.1. Note: this example only fails if h5py is imported before netCDF4.

ocefpaf commented 2 weeks ago

Note, this example only fails if h5py is imported before netcdf4.

This is probably because of a mismatch between the HDF5 libraries used. Sadly, unless h5py and netCDF4 coordinate on what to use, that is a limitation of wheels as far as I know, and you'll have to separate the workflows that import both. Sorry, but I don't have a better solution for wheels. You could try other package managers, like conda, where both h5py and netCDF4 use the exact same HDF5 library, ensuring things like this don't happen.

ocefpaf commented 2 weeks ago

Similar issues: https://github.com/Unidata/netcdf4-python/issues/653, https://github.com/Unidata/netcdf4-python/issues/1214, https://github.com/Unidata/netcdf4-python/issues/694, https://github.com/Unidata/netcdf4-python/issues/213.

@isuruf, sorry for the ping, but do you believe there is something I'm missing in the cibuildwheel configuration here that would solve this? I recall that delvewheel fixed this on Windows, but I thought auditwheel ran by default on Linux and should fix it there too, no?

NikosAlexandris commented 2 weeks ago

Same issue here too. Using 1.7.1 gives an Errno 101 when reading a netCDF file the usual way, i.e. Dataset('somefile.nc').

ZedThree commented 1 week ago

@ocefpaf Would it help to change ghcr.io/ocefpaf/manylinux2014_x86_64-netcdf to use FROM ghcr.io/h5py/manylinux2014_x86_64-hdf5 and explicitly build on top of the h5py manylinux image? I don't think that would guarantee compatibility, as that would require keeping releases in sync, but it should at least ensure there is one compatible version of h5py.

Another potential solution could be to add h5py as a build dependency, but I think this is likely much harder to make work.


I can't actually see how the built HDF5 libraries could really differ: both images start from a plain manylinux image and build HDF5 1.14.2 with the default options. Yet in the wheels, we end up with libhdf5-7b49ac63.so.310.2.0 (netCDF4) and libhdf5-7f639dcd.so.310.2.0 (h5py).
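To confirm whether two differently-mangled HDF5 copies really end up mapped into the same process, one can inspect /proc/self/maps after importing both packages. A Linux-only diagnostic sketch (the helper name is ours, and the mangled filenames in the comment are taken from the observation above):

```python
"""Diagnostic sketch: list every distinct libhdf5 shared object currently
mapped into this process (Linux only, via /proc/self/maps)."""
import re


def loaded_hdf5_libs(maps_path="/proc/self/maps"):
    """Return the sorted, de-duplicated paths of mapped libhdf5*.so* files."""
    libs = set()
    with open(maps_path) as f:
        for line in f:
            # The last column of a maps line is the backing file path, if any.
            m = re.search(r"(/\S*libhdf5[^/\s]*\.so\S*)$", line)
            if m:
                libs.add(m.group(1))
    return sorted(libs)


# After `import h5py; import netCDF4`, the affected wheels would typically
# show two vendored copies here, e.g. libhdf5-7b49ac63.so.310.2.0 and
# libhdf5-7f639dcd.so.310.2.0. Without those imports, the list is empty.
print(loaded_hdf5_libs())
```

Seeing two distinct entries would confirm that both auditwheel-mangled copies coexist in one address space, which is the usual precondition for this class of HDF5 corruption.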

ocefpaf commented 1 week ago

@ocefpaf Would it help to change ghcr.io/ocefpaf/manylinux2014_x86_64-netcdf to use FROM ghcr.io/h5py/manylinux2014_x86_64-hdf5 and explicitly build on top of the h5py manylinux image? I don't think that would guarantee compatibility, as that would require keeping releases in sync, but it should at least ensure there is one compatible version of h5py.

I tried that at first; it would mean even more collaboration to get things under the same image. But I confess I don't have the energy necessary to coordinate the non-Python dependencies of wheels alone, especially for packages that consume 3-4 C libraries.

Maybe there is a wheel trick, advanced option, or something obvious that I'm missing but, unless the community works together, I don't see how this issue can be solved in the long run, just small bursts of luck for a release here and there.

ZedThree commented 1 week ago

Ah, that's a shame they weren't interested. This is a really hard community problem.

I've not actually been able to find any real differences between the built .so files, and in fact copying one version over the top of the other still gives me the same errors, so I don't think it's due to incompatible binaries.