Alexander-Barth / NCDatasets.jl

Load and create NetCDF files in Julia
MIT License

NetCDF_jll v400.702.402+0 and later broken on Windows #164

Closed visr closed 2 years ago

visr commented 2 years ago

Describe the bug

A new NetCDF_jll release was made a few days ago. It appears that this build does not work correctly with its HDF5 dependency on Windows.

This was introduced in https://github.com/JuliaPackaging/Yggdrasil/pull/4481, cc @felixcremer. I don't see anything wrong in the build file, but the main difference is that it is linked to HDF5_jll v1.12.1+0 instead of the older v1.12.0+1.

Shall we yank the build? We would lose a functioning Apple M1 build however.

Side note: this is not directly related to NCDatasets. @Alexander-Barth do you prefer that I create these issues in Yggdrasil? I thought here might be better for visibility for users.

To Reproduce

Here is an example using only NetCDF_jll, to keep it as simple as possible.

julia> using NetCDF_jll
julia> unsafe_string(ccall((:nc_inq_libvers, libnetcdf), Cstring, ()))
"4.7.4 of Feb 22 2022 14:00:01 \$"
julia> NC_CLASSIC_MODEL = 0x0100
julia> NC_NETCDF4 = 0x1000
julia> # it can create a classic netcdf (no HDF5 needed)
julia> ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc3", NC_CLASSIC_MODEL, Ref(Cint(0)))
julia> # but gives a segfault on netcdf4
julia> ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc4", NC_NETCDF4, Ref(Cint(0)))
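
Because the second call crashes the whole session, it can be convenient to run the reproduction in a child process instead. The following is a minimal sketch, not part of the original report: `runs_cleanly` and `repro` are hypothetical names, and it assumes NetCDF_jll is installed in the environment that the child Julia process sees.

```julia
# Run a snippet in a separate Julia process, so that a crash in the snippet
# does not take down the current session. Returns true only if the child
# process exits with status 0.
function runs_cleanly(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return success(pipeline(cmd; stdout = devnull, stderr = devnull))
end

# The NetCDF4 case from the report, as a script for the child process.
# 0x1000 is NC_NETCDF4, as defined in the REPL session above.
repro = """
using NetCDF_jll
ncid = Ref(Cint(0))
status = ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc4", 0x1000, ncid)
status == 0 && ccall((:nc_close, libnetcdf), Cint, (Cint,), ncid[])
exit(status == 0 ? 0 : 1)
"""

# Uncomment to run; this writes test.nc4 into the current directory on success.
# println(runs_cleanly(repro) ? "nc_create succeeded" : "nc_create failed or crashed")
```

Note that the helper cannot distinguish a segfault from any other non-zero exit, such as NetCDF_jll not being installed.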

Expected behavior

Create an empty "test.nc4" file.

Environment

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 3

Full output

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x2374cb3 -- .text at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
in expression starting at REPL[13]:1
.text at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
NC4_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
NC_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
nc__create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
nc_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
top-level scope at .\REPL[13]:1
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:876
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:894 [inlined]
jl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:944
eval at .\boot.jl:373 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:150
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:246
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:231
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:364
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:351
#930 at .\client.jl:394
jfptr_YY.930_36349.clone_1 at C:\Users\visser_mn\.julia\juliaup\julia-1.7.2+0~x64\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
#invokelatest#2 at .\essentials.jl:716 [inlined]
invokelatest at .\essentials.jl:714 [inlined]
run_main_repl at .\client.jl:379
exec_options at .\client.jl:309
_start at .\client.jl:495
jfptr__start_21275.clone_1 at C:\Users\visser_mn\.julia\juliaup\julia-1.7.2+0~x64\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:559
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:701
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:42
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 9681000 (Pool: 9675485; Big: 5515); GC: 13

Workaround

add NetCDF_jll@400.702.400

or

[compat]
NetCDF_jll = "=400.702.400"

Alexander-Barth commented 2 years ago

Thanks a lot for letting me know! Can somebody on Windows test if HDF5 with HDF5_jll v1.12.1+0 can run its test suite without failure?

visr commented 2 years ago

Yes, I just checked: HDF5.jl doesn't have an issue with that HDF5_jll, so it seems both libraries work individually, but not together, on Windows.

Alexander-Barth commented 2 years ago

Thanks @visr for the update! I also filed a bug report here: https://github.com/JuliaPackaging/Yggdrasil/issues/4511

Alexander-Barth commented 2 years ago

Shall we yank the build? We would lose a functioning Apple M1 build however.

As there are many more Windows users than Apple M1 users, I would indeed be in favour of yanking the build of NetCDF_jll.

Alexander-Barth commented 2 years ago

This issue is not resolved but mitigated by declaring NCDatasets incompatible with v400.702.402+0 in NCDatasets 0.12.

visr commented 2 years ago

Thanks! Just curious, why did you go this route versus yanking the build from the registry? Though I suppose this way Apple M1 users still have a way to use the build if they want.

Alexander-Barth commented 2 years ago

Though I suppose this way Apple M1 users still have a way to use the build if they want.

Yes, this is one reason, and I had a classroom full of students last Monday (mostly Windows, some Apple x86_64, some Linux) and needed a quick solution :-) But I consider this just a temporary fix. Having an installable but dysfunctional NetCDF_jll (for Windows) is still bad in my opinion.

Alexander-Barth commented 2 years ago

For Julia 1.8, all jll packages that share a dependency with Julia need to be rebuilt. This is also the case for NetCDF_jll, as both NetCDF_jll and Julia depend on libcurl. Unfortunately, the workaround of setting the compat entry does not work with Julia 1.8 on Windows, so there is currently no installable NetCDF_jll on Windows with Julia 1.8 (Linux and macOS seem to be fine on Julia 1.8).

If a Windows user wants to contribute, here is an overview of the steps involved:

  1. Install BinaryBuilder
  2. Clone https://github.com/JuliaPackaging/Yggdrasil
  3. Adapt the files N/NetCDF/common.jl and N/NetCDF/NetCDF@julia-1.8/build_tarballs.jl, and possibly also the corresponding files for HDF5 (a dependency of NetCDF)
  4. Get a GITHUB_TOKEN with write permission
  5. Build the tarball with julia --color=yes ./build_tarballs.jl x86_64-w64-mingw32 --deploy="<your-github-username>/NetCDF_jll.jl" --verbose
  6. You can install your jll with Pkg.add(url="https://github.com/<your-github-username>/NetCDF_jll.jl")
  7. Share your solution with a PR to Yggdrasil
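
As a rough illustration of step 5, the build command can be assembled from Julia. This is only a sketch: `build_command` is a hypothetical helper, the default `yggdrasil_dir` assumes the clone from step 2 sits in the current directory, and the recipe directory is inferred from the file path named in step 3.

```julia
# Assemble the BinaryBuilder invocation from step 5 as a Cmd that runs
# inside the recipe directory of a local Yggdrasil clone.
function build_command(github_user::AbstractString;
                       yggdrasil_dir::AbstractString = "Yggdrasil",  # assumed clone location
                       target::AbstractString = "x86_64-w64-mingw32")
    recipe_dir = joinpath(yggdrasil_dir, "N", "NetCDF", "NetCDF@julia-1.8")
    deploy_repo = "$github_user/NetCDF_jll.jl"
    cmd = `julia --color=yes ./build_tarballs.jl $target --deploy=$deploy_repo --verbose`
    return Cmd(cmd; dir = recipe_dir)
end

cmd = build_command("your-github-username")  # placeholder user name
println(cmd)

# Uncomment to start the build. The token from step 4 is assumed to be
# available to the child process as the GITHUB_TOKEN environment variable.
# run(cmd)
```

Printing the command first makes it easy to check the target triplet and deploy repository before a long build starts.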

Here is more information on BinaryBuilder: https://docs.binarybuilder.org/stable/. Windows support in BinaryBuilder is currently under active development. Alternatively, one can also use a Linux VM for BinaryBuilder.

The hard nut to crack is this:

A workaround might be to install NetCDF via Conda.jl or to build NetCDF_jll locally with HDF5 1.12.0 (but I did not test these options, and it is likely that one needs to locally adapt the NCDatasets compat entry for NetCDF_jll in the Project.toml file).

visr commented 2 years ago

For Julia 1.8 all jll packages (with a shared dependency with julia) need to be rebuild.

I've been on 1.8 beta for a while and never encountered any issues with the Windows build that is currently pinned (NetCDF_jll v400.702.400+0), and the tests pass locally.

Do you know of any code examples that would fail on 1.8 Windows? I assume these would have to use the shared dependencies (curl, MbedTLS, zlib).

Alexander-Barth commented 2 years ago

The world of dynamic libraries does not cease to surprise me! I saw some issues on Linux with Julia 1.8 where libnetcdf.so could not be loaded (similar to the transition from Julia 1.5 to Julia 1.6, where libcurl was also updated), and rebuilding NetCDF_jll was the solution then. On Linux, I can only use NetCDF_jll v400.802.102+0 with Julia 1.8 (not available for Windows).

But maybe Windows can handle a library version mismatch better than Linux. Does an OPeNDAP URL work for you?

using NCDatasets
ds = NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1")

visr commented 2 years ago

Ha, fascinating indeed. The opendap URL also just works, including when I load some actual data into memory (ds["time"][:]).

Alexander-Barth commented 2 years ago

Thank you for confirming. I am hitting another bug on Linux: #173.

Alexander-Barth commented 2 years ago

For future reference, here is how to pin the NetCDF_jll version (only needed for Windows users who want to run the NCDatasets master version on Julia 1.8):

using Pkg
Pkg.add("NetCDF_jll")
Pkg.pin(name="NetCDF_jll", version="400.702.400")
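
To confirm that the pin took effect, the active environment can be inspected. A minimal sketch, where `find_dependency` is a hypothetical helper built on `Pkg.dependencies()`:

```julia
using Pkg

# Look up a package by name in the active environment.
# Returns its PackageInfo, or nothing if the package is not present.
function find_dependency(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info
    end
    return nothing
end

info = find_dependency("NetCDF_jll")
if info === nothing
    println("NetCDF_jll is not in the active environment")
else
    println("NetCDF_jll ", info.version, info.is_pinned ? " (pinned)" : " (not pinned)")
end
```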

Alexander-Barth commented 2 years ago

For the record, here is a test I made in June with HDF5 1.12.2 from MSYS2 (Windows):

$ pacman -Q | grep -i HDF5
mingw-w64-x86_64-hdf5 1.12.2-1

I compiled NetCDF C 4.8.1 from source in MSYS2

./configure --disable-testsets  --enable-shared  --disable-static  --disable-dap-remote-tests
make LDFLAGS=" -no-undefined -Wl,--export-all-symbols" 

In Julia, I used these libraries using set_preferences!:

using Preferences, HDF5_jll, NetCDF_jll

set_preferences!(HDF5_jll, "libhdf5_path" => raw"C:\msys64\mingw64\bin\libhdf5-0.dll")
set_preferences!(NetCDF_jll, "libnetcdf_path" => raw"C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll")
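
Before pointing the jll packages at external libraries, it can help to check that a DLL loads and exports the expected entry point. This is a sketch that was not part of the test described here; `exports_symbol` is a hypothetical helper, and on Windows the load may also fail if dependent DLLs (such as HDF5) are not on the search path.

```julia
using Libdl

# Return true if the library at `libpath` can be loaded and exports `sym`.
function exports_symbol(libpath::AbstractString, sym::Symbol)
    handle = Libdl.dlopen(libpath; throw_error = false)
    handle === nothing && return false
    found = Libdl.dlsym(handle, sym; throw_error = false) !== nothing
    Libdl.dlclose(handle)
    return found
end

# Path taken from the set_preferences! call above.
netcdf_dll = raw"C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll"
println(exports_symbol(netcdf_dll, :nc_inq_libvers))
```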

While running the test suite,

using NCDatasets
include(joinpath(dirname(pathof(NCDatasets)),"..","test","runtests.jl"))
NetCDF library: C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll

I had no failures:

NetCDF version: 4.8.1 of Jun  8 2022 21:44:34 $
Test Summary: | Pass  Total
NCDatasets    |  829    829
Test Summary:  | Pass  Total
NetCDF4 groups |    9      9
Test Summary:          | Pass  Total
Variable-length arrays |   22     22
Test Summary:  | Pass  Total
Compound types |   16     16
Test Summary:      | Pass  Total
Time and calendars |   25     25
Test Summary:       | Pass  Total
Multi-file datasets |   70     70
Test Summary:     | Pass  Total
Deferred datasets |   13     13
Test Summary: | Pass  Total
@select macro |   33     33
Test.DefaultTestSet("@select macro", Any[], 33, false, false)

Surprisingly, when NetCDF 4.9.0 is compiled with BinaryBuilder against the recently released HDF5_jll 1.12.2, I again get these errors in Julia 1.8.0-rc3:

NetCDF_jll.libnetcdf = "C:\\Users\\runneradmin\\.julia\\artifacts\\e3b96f6ac2bb213ecbcbce2ca0ac0bb43bf9561d\\bin\\libnetcdf-19.dll"
NetCDF library: C:\Users\runneradmin\.julia\artifacts\e3b96f6ac2bb213ecbcbce2ca0ac0bb43bf9561d\bin\libnetcdf-19.dll
NetCDF version: 4.9.0 of Aug  1 2022 12:57:29 $

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x66fb1a1c -- nc4_create_file at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:120
in expression starting at D:\a\NCDatasets.jl\NCDatasets.jl\test\test_simple.jl:11
nc4_create_file at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:120
NC4_create at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:321

Full logs https://github.com/Alexander-Barth/NCDatasets.jl/runs/7612204603?check_suite_focus=true

If somebody wants to try it, here is the DLL of NetCDF 4.8.1 that worked for me: https://dox.ulg.ac.be/index.php/s/DSZy9SNCUJmCRZA

$ sha1sum libnetcdf-19.dll
47efeb7dcc8756d62d4bd832f2f3bbb8e7fd2c09 *libnetcdf-19.dll
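
On a machine without sha1sum, the same check can be done from Julia with the SHA standard library. A minimal sketch, where `file_sha1` is a hypothetical helper and the DLL is assumed to have been downloaded into the current directory:

```julia
using SHA

# Hex-encoded SHA-1 digest of a file's contents.
file_sha1(path::AbstractString) = bytes2hex(open(sha1, path))

expected = "47efeb7dcc8756d62d4bd832f2f3bbb8e7fd2c09"  # checksum quoted above
dll = "libnetcdf-19.dll"                               # assumed download location

if isfile(dll)
    println(file_sha1(dll) == expected ? "checksum matches" : "checksum MISMATCH")
else
    println("file not found: ", dll)
end
```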

visr commented 2 years ago

Oh, that's a bummer! So do I understand correctly that it's not the HDF5 patch version 1 or 2 that is important, but whether or not netcdf is cross-compiled? And the last cross-compiled netcdf that was successfully built against an HDF5 mingw build was netcdf 4.7 against HDF5 1.12.0? https://github.com/JuliaPackaging/Yggdrasil/blob/5b9aa3d48766ab2681f6b92e0b7e6116ddfc5e27/N/NetCDF/common.jl

Alexander-Barth commented 2 years ago

To be honest, I am not sure what exactly triggers this error, but it does indeed appear to be a cross-compilation issue introduced in HDF5 1.12.1. As far as I know, netcdf only tests native compilation, and only relatively recently added the mingw compiler to CI.

Alexander-Barth commented 2 years ago

This long-standing issue should be fixed thanks to NetCDF_jll 400.902.5.
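
For scripts that need to guard against the broken builds, the version range discussed in this thread can be encoded as a simple check. This is a sketch under assumptions: `windows_affected` is a hypothetical helper, and the range treats every release from 400.702.402 up to (but not including) 400.902.5 as affected on Windows, which is inferred from the issue title and the fix mentioned here rather than tested release by release.

```julia
# Version bounds taken from this thread (assumed to be exact).
const FIRST_BROKEN = v"400.702.402"
const FIRST_FIXED  = v"400.902.5"

# True if a NetCDF_jll version falls in the range reported as broken on Windows.
# Build metadata (the "+0" suffix) is dropped before comparing.
function windows_affected(v::VersionNumber)
    base = VersionNumber(v.major, v.minor, v.patch)
    return FIRST_BROKEN <= base < FIRST_FIXED
end

for v in (v"400.702.400+0", v"400.702.402+0", v"400.902.5")
    println(v, " => ", windows_affected(v) ? "affected" : "not affected")
end
```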

visr commented 2 years ago

Thanks a lot for a major effort! Hope it gets easier. Looks like this issue can be unpinned as well :)

Alexander-Barth commented 2 years ago

Thanks Martijn for your help too!