Closed Alexander-Barth closed 2 years ago
Smells of upstream bug to me (and given the track record of troubles with this library I'm not even surprised)
I tried again to build a NetCDF 4.8.1 binary (with HDF5 1.12.1) and we got the same EXCEPTION_ACCESS_VIOLATION
on Windows. However, if we downgrade HDF5 to 1.12.0 (without Apple M1 support), NetCDF 4.8.1 and 4.7.4 both work on Windows. For Linux, all tested combinations seem to work. Is it possible to release different versions for different platforms? Can we yank only the Windows version of NetCDF_jll v400.702.402+0?
Can we yank only the Windows version of NetCDF_jll v400.702.402+0?
There is no concept of platform-specificity in the registry nor the package manager, so that's unrealistic
Should this release then be yanked from all platforms? Unfortunately, this would remove the Apple M1 binary, but Windows is pretty common... (the real solution, of course, is an updated working binary for all platforms).
That may be necessary, yes. But it'd also be great to understand why the Windows build is broken again. We haven't touched the mingw toolchain for quite some time, the hdf5 source always comes from the same place (just a newer version maybe?), and the netcdf source is also the same version as the last working build, no?
However, please make sure no dependents of NetCDF_jll require the version you want to yank, otherwise you'll leave those packages completely broken.
But it'd also be great to understand why the Windows build is broken again.
It might be related to the HDF5_jll update. HDF5.jl runs fine, but maybe there is an issue with the header files in HDF5_jll? The error does not occur in NetCDF when the HDF5 backend format is not used.
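A minimal reproducer for this distinction might look as follows (a sketch assuming NCDatasets.jl's documented `NCDataset`/`defDim`/`defVar` API; the file names are arbitrary):

```julia
using NCDatasets

# Creating a NetCDF-4 file (HDF5-backed, the default) is the path that
# crashed with EXCEPTION_ACCESS_VIOLATION on Windows:
ds = NCDataset("crash_test.nc", "c")
defDim(ds, "x", 3)
defVar(ds, "v", Float64, ("x",))
close(ds)

# The classic format bypasses the HDF5 code path entirely and did not crash:
ds = NCDataset("ok_test.nc", "c", format = :netcdf3_classic)
defDim(ds, "x", 3)
defVar(ds, "v", Float64, ("x",))
close(ds)
```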
version of netcdf source is also the same as the last working version, no?
yes, that is exactly the same version of NetCDF sources.
If we yank NetCDF_jll 400.702.402+0 we will fall back to 400.702.400+0; I guess we will then have a problem with the latest version of TempestRemap_jll:
https://github.com/JuliaPackaging/Yggdrasil/commit/59d281664e13dd8f69750e8b268c3396677df791
I did not see any other package that would become incompatible:
grep -r NetCDF_jll .
./T/TempestModel_jll/Compat.toml:NetCDF_jll = "400.701.400-400.799"
./T/TempestModel_jll/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./T/TempestRemap_jll/Compat.toml:NetCDF_jll = "400.701.400-400.799"
./T/TempestRemap_jll/Compat.toml:NetCDF_jll = "400.702.402-400.799"
./T/TempestRemap_jll/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./I/IOAPI_jll/Compat.toml:NetCDF_jll = "400.701.400-400.799"
./I/IOAPI_jll/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./N/NetcdfIO/Compat.toml:NetCDF_jll = "400.701.400-400.702.400"
./N/NetcdfIO/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./N/NetCDF/Compat.toml:NetCDF_jll = "4.7.4-4"
./N/NetCDF/Compat.toml:NetCDF_jll = "400.701.400-400"
./N/NetCDF/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./N/NCDatasets/Compat.toml:NetCDF_jll = "4.7.4-4"
./N/NCDatasets/Compat.toml:NetCDF_jll = "400.701.400-400"
./N/NCDatasets/Compat.toml:NetCDF_jll = ["400.701.400", "400.702.400"]
./N/NCDatasets/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./N/NetCDFF_jll/Compat.toml:NetCDF_jll = "400.701.400-400.799"
./N/NetCDFF_jll/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./N/NetCDF_jll/Package.toml:name = "NetCDF_jll"
./N/NetCDF_jll/Package.toml:repo = "https://github.com/JuliaBinaryWrappers/NetCDF_jll.jl.git"
./M/MDAL_jll/Compat.toml:NetCDF_jll = "400.701.400-400.799"
./M/MDAL_jll/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./V/VMEC_jll/Compat.toml:NetCDF_jll = "400.702.400-400.799"
./V/VMEC_jll/Deps.toml:NetCDF_jll = "7243133f-43d8-5620-bbf4-c2c921802cf3"
./Registry.toml:7243133f-43d8-5620-bbf4-c2c921802cf3 = { name = "NetCDF_jll", path = "N/NetCDF_jll" }
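As a rough way to machine-check this, one can compare the version against each compat range. The sketch below only approximates the registry's real semantics (Pkg's VersionSpec treats an upper bound like "400.799" as admitting any 400.799.* patch, which this simple comparison does not):

```julia
# Hypothetical helper: does a version fall inside a Compat.toml range string?
# Approximation only; the registry uses Pkg's VersionSpec machinery.
function in_compat_range(v::VersionNumber, range::AbstractString)
    lo, hi = split(range, "-")
    VersionNumber(lo) <= v <= VersionNumber(hi)
end

in_compat_range(v"400.702.402", "400.702.402-400.799")      # TempestRemap_jll: true
in_compat_range(v"400.702.402", "400.701.400-400.702.400")  # NetcdfIO: false
```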
It might be related to the HDF5_jll update.
The funny thing is that Windows is the only platform for which we always used the same source: the msys2 libraries.
Yes, indeed. I was wondering if there is a reason to use msys2 over conda-forge on Windows for HDF5...
Msys2 is probably more consistent with the toolchain we use here (mingw), and we still need to provide the runtime dependencies. You'd need to investigate whether the conda-forge build for Windows is compatible with our other libraries.
You are right, conda-forge uses the Windows Visual C/C++ compiler (https://conda-forge.org/docs/maintainer/knowledge_base.html)
Given that TempestRemap_jll depends on NetCDF_jll 400.702.402+0, does this mean we are out of luck with yanking this NetCDF_jll version?
Unfortunately, the diffs in the header files are quite massive (for a patch release!):
diff HDF5.v1.12.0.x86_64-w64-mingw32/include/ HDF5.v1.12.1.x86_64-w64-mingw32/include/ | wc -l
# 44843
As an alternative to yanking this version, could we also release a NetCDF 4.8.1 binary (which I got to build after applying several patches) with the "old" HDF5 1.12.0?
https://github.com/Alexander-Barth/NCDatasets.jl/issues/165#issuecomment-1057875784
With HDF5 1.12.2 from https://packages.msys2.org/package/mingw-w64-x86_64-hdf5, I don't see this error anymore.
I guess we would be able to upgrade HDF5 to 1.12.2 once this is merged:
That's great news! Only 10 pending reviewers I see, haha.
So I helped a bit to land https://github.com/conda-forge/hdf5-feedstock/pull/175, updating HDF5_jll to 1.12.2 in #5248. However, this needed to be yanked from the registry again after finding out in #5249 that conda-forge now builds against a libcurl version that is too new for us.
Does anyone have ideas how to get our hands on HDF5 1.12.2 builds for these platforms:
See also https://github.com/JuliaPackaging/Yggdrasil/pull/5249#issuecomment-1198008293, in case we can use conda infrastructure.
@visr thanks a lot for your work on updating HDF5 to 1.12.2. Too bad that we now have this libcurl issue (apparently only on macOS). (I had a similar issue recently: https://github.com/JuliaPackaging/Yggdrasil/issues/5031, but I think it is unrelated.)
I am not sure how to proceed. What about these possibilities:
Would option 1 or 2 have a chance to get accepted?
For me personally (1) sounds like a good quick fix that I had not thought of. Since the difference is only a patch release it might be acceptable, though I'm curious to see what @giordano thinks. Here are the patch release notes: https://www.hdfgroup.org/2022/04/release-of-hdf5-1-12-2-newsletter-183/.
(2) sounds like a good medium term solution to get more platforms supported. Will probably be some work to organize though. Using julia's buildkite for this might make it easier to get more platforms in one setup. I wonder how that effort compares to getting HDF5 to cross compile at this point (different skills though).
I'm not a huge fan of either solution (especially mixing and lying about version numbers), but hey, don't we lie all the time? (At least we don't usually mix different versions...) My problem with (2) is: who's going to maintain that? Certainly not me; I can barely keep up with one project, and another one using tools I'm completely unfamiliar with is out of reach for me at the moment (a moment which will last fairly long). If someone else ensures everything works fine and here we only need to click merge, then it's OK.
apparently only on MacOS
If you want to understand what the error is about: https://github.com/giordano/macos-compatibility-version
Couldn't we edit the registry to change the compat of NetCDF.jl? We could add a compat entry for HDF5_jll v1.12.0.
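If that route were taken, the entry would presumably look something like this (an illustrative sketch of the General registry's Compat.toml format; the `"0.12"` section key, which denotes a range of the package's own versions, is hypothetical, and a matching Deps.toml entry would also be needed):

```toml
# Illustrative sketch only: pin HDF5_jll to the last known-good patch release.
["0.12"]
HDF5_jll = "1.12.0"
```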
Also note The HDF Group's release schedule: https://github.com/HDFGroup/hdf5/blob/develop/doc/img/release-schedule.png
HDF5 v1.12.3 should be the last patch release of the v1.12 minor version series. After that they will move to v1.14 as the current stable release. Only the v1.10 release will continue to receive patches.
I submitted option 1 from @Alexander-Barth in #5251. Nobody loves this solution I think, but it should give us a build that we can work with for now, buying us time to get to better solutions.
For the record: I tried to compile a Linux binary for HDF5 within BinaryBuilder (the first part of option 2). Unfortunately, it turned out to be more complicated than I thought (as usual). The build system uses x86_64-pc-linux-musl while the target is x86_64-linux-gnu, so configure considers this to be cross-compilation and fails with:
checking maximum decimal precision for C... configure: error: in `/workspace/srcdir/hdf5-1.12.2':
configure: error: cannot run test program while cross compiling
(even when setting export PAC_C_MAX_REAL_PRECISION=33, the value from a native compilation).
I have seen that conda-forge was able to cross-compile HDF5 for macOS on aarch64. I guess this patch is what allows them to bypass the configure test:
https://github.com/conda-forge/hdf5-feedstock/blob/main/recipe/patches/osx_cross_configure.patch
Unfortunately, the patch is quite complicated and is made against an automatically generated file (configure, not configure.ac).
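An alternative to patching configure, assuming the answers are known from a native build on the target, is autoconf's cache-variable mechanism: configure reads pre-seeded values from a config.site file, which lets it skip "cannot run test program" checks. A sketch of such a fragment (the variable name is illustrative; the real names would have to be read off HDF5's configure script):

```sh
# config.site -- pre-seeded answers for configure's run-time checks.
# Use as: CONFIG_SITE=$PWD/config.site ./configure --host=x86_64-linux-gnu ...
# The variable below is illustrative only.
hdf5_cv_printf_ll=${hdf5_cv_printf_ll=ll}
```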
Yes, cross-compilation is challenging with HDF5. The main issue is that the build obtains configuration details from the target platform by executing test programs on that platform. Last time I looked into this, I think it could be done via the CMakeCache. The build system should probably be changed so that these details can be determined by merely compiling a program, since the cross-compiler already knows most of them.
See https://forum.hdfgroup.org/t/cross-compiling-for-windows/6735/6 https://github.com/stevengj/hdf5
I tried to compile a sample HDF5 program from https://github.com/HDFGroup/hdf5-examples within BinaryBuilder, but I got a segmentation fault. Here are my steps:
From a BinaryBuilder session (with mingw64 target):
sandbox:${WORKSPACE}/srcdir/netcdf-c-4.9.0 # wget https://raw.githubusercontent.com/HDFGroup/hdf5-examples/master/C/H5T/h5ex_t_array.c
sandbox:${WORKSPACE}/srcdir/netcdf-c-4.9.0 # gcc -o h5ex_t_array.exe -I/workspace/destdir/include/ h5ex_t_array.c -L/workspace/destdir/bin/ -lhdf5-0
sandbox:${WORKSPACE}/srcdir/netcdf-c-4.9.0 # sha256sum /workspace/destdir/bin/zlib1.dll /workspace/destdir/bin/libsz.dll /workspace/destdir/bin/libhdf5-0.dll
a26b41bb482967b170453c93edf8f108052ab00f0c7d1134761f625c085f175e /workspace/destdir/bin/zlib1.dll
07e014276e614e91ff1ff55e6e3b465e1d03f736aa38b408f07c3159416060e8 /workspace/destdir/bin/libsz.dll
c2f5d5c789396d7b8f68eeff683433ade42e8feb1832aefe04d921ebe8b85470 /workspace/destdir/bin/libhdf5-0.dll
I transferred the binary h5ex_t_array.exe and the 3 DLLs (zlib1.dll, libsz.dll, libhdf5-0.dll) to a Windows system (msys shell):
$ ldd ./h5ex_t_array.exe
ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ff939f10000)
KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ff938ad0000)
KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ff9379d0000)
msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ff939900000)
libhdf5-0.dll => /home/Alexander Barth/libhdf5-0.dll (0x7ff90e180000) # <- from BinaryBuilder
ADVAPI32.dll => /c/WINDOWS/System32/ADVAPI32.dll (0x7ff939e20000)
sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7ff938c40000)
RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ff9380c0000)
zlib1.dll => /home/Alexander Barth/zlib1.dll (0x7ff9328a0000) # <- from BinaryBuilder
libwinpthread-1.dll => /mingw64/bin/libwinpthread-1.dll (0x7ff931ba0000)
libsz.dll => /home/Alexander Barth/libsz.dll (0x7ff931a90000) # <- from BinaryBuilder
$ ./h5ex_t_array.exe
Segmentation fault
$ sha256sum zlib1.dll libsz.dll libhdf5-0.dll
a26b41bb482967b170453c93edf8f108052ab00f0c7d1134761f625c085f175e *zlib1.dll
07e014276e614e91ff1ff55e6e3b465e1d03f736aa38b408f07c3159416060e8 *libsz.dll
c2f5d5c789396d7b8f68eeff683433ade42e8feb1832aefe04d921ebe8b85470 *libhdf5-0.dll
The example does work on my Linux system, and on Windows when compiled natively with MSYS2. Shouldn't this test also succeed on Windows when cross-compiled with BinaryBuilder?
If I also extract libwinpthread-1.dll from BinaryBuilder, the program ./h5ex_t_array.exe no longer returns an error (but there is no screen output, unlike with native compilation). An output file h5ex_t_array.h5 is created, but it is too small and not readable:
$ h5dump h5ex_t_array.h5
h5dump error: unable to open file "h5ex_t_array.h5"
The example program seems to abort at this line: https://github.com/HDFGroup/hdf5-examples/blob/master/C/H5T/h5ex_t_array.c#L51
I'm getting confused. The HDF5 libraries are coming from msys2:
Shouldn't you be able to grab that package from msys2 and compile within msys2?
It is this package exactly: https://packages.msys2.org/package/mingw-w64-x86_64-hdf5
Yes, I am confused too; HDF5/NetCDF binaries have been a never-ending stream of moments of confusion :-)
The checksums of the HDF5 library are identical for the native MSYS compilation and for BinaryBuilder. I included all the steps because maybe there is a problem with how I tested it (I also ran the exe from a cmd shell to avoid any interaction with my local MSYS installation). Maybe somebody has the time to reproduce it.
I mentioned a similar problem here: https://github.com/Alexander-Barth/NCDatasets.jl/issues/164#issuecomment-1202094798 Native compilation in MSYS of NetCDF with HDF5 worked but it fails with cross-compilation in BinaryBuilder.
The version of GCC is different (12.1 in MSYS, 4.8.5 in BinaryBuilder)...
The error is also reproducible with this smaller example https://github.com/HDFGroup/hdf5-examples/blob/master/C/H5T/h5ex_t_int.c
This example stops at H5Dcreate when running a cross-compiled binary: https://github.com/HDFGroup/hdf5-examples/blob/master/C/H5T/h5ex_t_int.c#L55
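The same call sequence can also be exercised from Julia through HDF5.jl, which may help narrow down whether the crash lives in the library itself (a sketch using HDF5.jl's high-level API; internally the assignment goes through H5Dcreate/H5Dwrite):

```julia
using HDF5

# Mirrors the failing C sequence: create a file, then create and write a dataset.
h5open("t.h5", "w") do f
    f["DS1"] = Int32[1 2 3; 4 5 6]
end
```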
Shouldn't you be able to grab that package from msys2 and compile within msys2?
Yes, I installed this library using the package manager of MSYS (pacman).
We can specify a GCC version in BinaryBuilder.
What I would like to know is if you can compile and run the example completely within msys2. If so, then what is the difference between running it within msys2 and outside msys2?
We can specify a GCC version in BinaryBuilder.
I see gcc-6 and gcc-7 in the build image. Can we use a more recent version than gcc 7.5.0?
How can we do that?
What I would like to know is if you can compile and run the example completely within msys2.
Yes, this is what I referred to as native compilation before.
If so, then what is the difference between running it within msys2 and outside msys2?
The difference, from what I have seen, is that when you run the binary within MSYS, any missing DLL (like libwinpthread-1.dll) is resolved from the standard MSYS location, while running it outside of MSYS2 produces an error message. So within MSYS you might silently get an incompatible (or at least different) DLL. I ran only the cross-compiled version outside of MSYS.
As a test, I tried to cross-compile with the mingw cross-compiler from Ubuntu 20.04:
$ x86_64-w64-mingw32-gcc --version
x86_64-w64-mingw32-gcc (GCC) 9.3-win32 20200320
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I got the HDF5 DLL from https://mirror.msys2.org/mingw/mingw64/mingw-w64-x86_64-hdf5-1.12.2-1-any.pkg.tar.zst, extracted into mingw64:
$ x86_64-w64-mingw32-gcc -o h5ex_t_int -Imingw64/include/ h5ex_t_int.c -Lmingw64/lib -lhdf5
Surprisingly, the binary cross-compiled with the Ubuntu toolchain works on Windows!
So maybe it is a (cross-)compiler bug in GCC, triggered by a change in HDF5?
Is it possible to update the compiler version in BinaryBuilder? We currently use GCC 4.8.5, from 2015.
Oh nice find! Are you looking for preferred_gcc_version?
Perfect! Yes, indeed, with preferred_gcc_version=v"5" I get a working binary for NetCDF 4.9.0 on Windows 🥳!
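For context, the change boils down to one keyword argument in the recipe's build_tarballs call; a sketch (the placeholder variables stand in for the real definitions in the NetCDF recipe):

```julia
using BinaryBuilder

# `sources`, `script`, `platforms`, `products`, and `dependencies` are
# placeholders for the real definitions in the recipe. The key change is
# `preferred_gcc_version`, which selects a newer GCC and avoids the
# miscompilation seen with the default GCC 4.8.5.
build_tarballs(ARGS, "NetCDF", v"4.9.0",
               sources, script, platforms, products, dependencies;
               preferred_gcc_version = v"5")
```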
NetCDF_jll.libnetcdf = "C:\\Users\\runneradmin\\.julia\\artifacts\\a6675d43b84627dcab6111bc4186b53bd3497992\\bin\\libnetcdf-19.dll"
NetCDF library: C:\Users\runneradmin\.julia\artifacts\a6675d43b84627dcab6111bc4186b53bd3497992\bin\libnetcdf-19.dll
NetCDF version: 4.9.0 of Aug 10 2022 13:19:08
Test Summary: | Pass Total Time
NCDatasets | 829 829 1m32.5s
Test Summary: | Pass Total Time
NetCDF4 groups | 9 9 1.1s
Test Summary: | Pass Total Time
Variable-length arrays | 22 22 1.5s
Test Summary: | Pass Total Time
Compound types | 16 16 2.0s
Test Summary: | Pass Total Time
Time and calendars | 25 25 1.1s
Test Summary: | Pass Total Time
Multi-file datasets | 70 70 28.9s
Test Summary: | Pass Total Time
Deferred datasets | 13 13 0.8s
Test Summary: | Pass Total Time
@select macro | 33 33 16.1s
x86_64 Linux and macOS work too.
I will make a PR soon!
Oh that's awesome. So in the end it came down to a compiler bug. I guess we should've tried a newer GCC sooner, haha. It never occurred to me. Thanks for the great effort!
I did not realize that it was so easy to change the compiler version. I somehow assumed that we use the same version across all packages. It is great to know that we have this flexibility.
Yeah, I believe the default GCC is so old for maximum compatibility with old systems and clusters. If that doesn't work, though, it can be bumped.
@Alexander-Barth right now there are three separate build scripts, for julia 1.3+, 1.6+ and 1.8+. HDF5_jll and many others have stopped building new versions for 1.3+, shall we drop it as well?
Do you know if having separate build scripts for 1.6+ and 1.8+ is required? This is the only difference:
and I don't see any other packages requiring libcurl 7.84.
Using Dependency("LibCURL_jll"; compat="7.73") and julia_compat="1.6" normally seems to be enough for builds to pass on everything from 1.6 to nightly. See for instance https://github.com/JuliaPackaging/Yggdrasil/pull/5301.
@visr You are right, we can consolidate all this (and drop julia 1.6). I just made a test on Linux/Windows/macOS (x86_64) with julia 1.6 and 1.8, and all tests pass:
https://github.com/Alexander-Barth/NCDatasets.jl/actions/runs/2860834779
I made a PR here: https://github.com/JuliaPackaging/Yggdrasil/pull/5319 with a single build script for current julia versions.
Oh that's great news, that it works and that all tests pass! Should hopefully make it a bit easier to maintain as well.
I think that this can be closed with the release of NetCDF_jll v400.902.5+1.
Unfortunately, the new NetCDF_jll v400.702.402+0 does not work on Windows (as far as I know Linux x86_64 and apple M1 are fine).
The new version of NetCDF_jll was created in this pull request: https://github.com/JuliaPackaging/Yggdrasil/pull/4481
The errors seem to be related to the upgrade of HDF5_jll to v1.12.1+0.
Related bug reports: https://github.com/Alexander-Barth/NCDatasets.jl/issues/164 https://github.com/JuliaGeo/NetCDF.jl/issues/151
Below is the full error message, as reported by @visr, when a Windows user (on julia 1.7.2) creates a NetCDF file (with the HDF5 backend format).
Is there a way to test the libraries natively via CI before releasing them?