NOAA-OWP / t-route

Tree-based hydrologic and hydraulic routing

t-route segfaults when loaded after `netcdf-c<=4.9.1` shared library #705

Open aaraney opened 7 months ago

aaraney commented 7 months ago

TL;DR

T-Route's netCDF4 (Python library) dependency can segfault when a netcdf-c<=4.9.1 shared library has already been loaded. You are most likely to run into this when running NextGen with routing enabled. The issue affects netCDF4 1.6.5 (the latest release) and 1.6.4. Versions <=1.6.3 are not affected.

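To solve this issue, either:

- pin netCDF4 to an unaffected version, e.g. `pip install 'netCDF4<=1.6.3'`, or
- build and run NextGen against netcdf-c>=4.9.2.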

Current behavior

Note: this has only been confirmed on Linux.

If you compile and run NextGen with routing enabled against netcdf-c<=4.9.1, and netCDF4==1.6.4 or netCDF4==1.6.5 (the Python dependency) is installed, a segmentation fault occurs as soon as NextGen starts routing. This appears to have been resolved in the latest unreleased version of netCDF4.
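To check whether a given environment matches the affected combination, a small version check can help. This is a minimal sketch: `parse_version` and `is_affected` are hypothetical helpers (not part of t-route or netCDF4), and the version bounds only encode the ranges reported in this issue.

```python
import re

# Hypothetical helper, not part of t-route or netCDF4. The bounds encode the
# combination reported in this issue: netCDF4 1.6.4/1.6.5 with netcdf-c <= 4.9.1.
AFFECTED_NETCDF4_PY = {(1, 6, 4), (1, 6, 5)}
LAST_AFFECTED_NETCDF_C = (4, 9, 1)


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number, e.g. 'netCDF 4.9.0' -> (4, 9, 0)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(netcdf4_py_version: str, netcdf_c_version: str) -> bool:
    """Return True if this pair of versions matches the combination reported to segfault."""
    py = parse_version(netcdf4_py_version)[:3]
    c = parse_version(netcdf_c_version)[:3]
    return py in AFFECTED_NETCDF4_PY and c <= LAST_AFFECTED_NETCDF_C
```

You could feed it `netCDF4.__version__` and the output of `nc-config --version` (which prints something like `netCDF 4.9.0`) for the netcdf-c that NextGen is built against. Only the leading numeric part of each version string is considered, so suffixes such as `.post1` are ignored.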

This occurs because of how shared library symbols are resolved. netCDF4 1.6.4 and 1.6.5 ship with a pre-compiled netcdf-c>4.9.1 shared library. netCDF4 calls the function nc_rc_set from that bundled library, and nc_rc_set in turn calls NC_rcfile_insert, a function that exists both in the bundled library and in whatever netcdf-c shared library you have installed and loaded. When that call is made, it can resolve to NC_rcfile_insert in your netcdf-c shared library rather than the one bundled with netCDF4: on Linux, the dynamic linker typically binds a symbol to the first definition it finds in the global lookup scope, so a libnetcdf that was loaded earlier can take precedence.

To make things worse, different versions of netcdf-c have different function signatures for NC_rcfile_insert. This means nc_rc_set's call to NC_rcfile_insert can pass the wrong number of arguments, or pass them in the wrong order (in my case, it did). The result is a segmentation fault when NC_rcfile_insert calls strdup on one of its (likely incorrect) arguments.
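One way to confirm that two copies of netcdf-c are loaded in the same process is to inspect its memory mappings. The sketch below assumes Linux and `/proc/self/maps`; the function names and the filename pattern (`libnetcdf.so*` or the wheel's `libnetcdf-<hash>.so*`) are illustrative assumptions. Two entries mean both definitions of NC_rcfile_insert are in memory, although that alone does not show which one a given call binds to.

```python
import os
import re
import sys

# Matches e.g. libnetcdf.so.19 and libnetcdf-15d50133.so.19, but not the
# Fortran library libnetcdff.so (illustrative pattern, adjust as needed).
_NETCDF_C_LIB = re.compile(r"libnetcdf(-[0-9a-f]+)?\.so")


def loaded_netcdf_libs(maps_text: str) -> list:
    """Return the distinct libnetcdf paths listed in /proc/<pid>/maps-style text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; pathname may be absent.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and _NETCDF_C_LIB.match(os.path.basename(fields[5])):
            paths.add(fields[5])
    return sorted(paths)


def current_process_netcdf_libs() -> list:
    """Linux only: list the libnetcdf copies mapped into this Python process."""
    if not sys.platform.startswith("linux"):
        raise OSError("/proc/self/maps is only available on Linux")
    with open("/proc/self/maps") as maps:
        return loaded_netcdf_libs(maps.read())
```

Calling `current_process_netcdf_libs()` inside the routing process, after netCDF4 has been imported, would show whether NextGen's libnetcdf and the wheel's bundled copy are both mapped.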


Edit:

Small update: if you are a Mac user and use Homebrew (brew) to manage your packages, this likely will not affect you. brew's netcdf formula ships netcdf-c 4.9.2.

aaraney commented 7 months ago

With a few small tweaks, you can reproduce this issue locally:

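```python
# To run this script, first:
# linux: `export LD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/netCDF4.libs`
#   mac: `export DYLD_LIBRARY_PATH=/usr/local/lib/python3.9/site-packages/netCDF4.libs`
# To find your site-packages directory, run
# `python -c 'import site; print(site.getsitepackages())'`

import ctypes

import certifi


def strencode(pystr, encoding=None):
    # Encode a string into bytes. If already bytes, do nothing.
    # Uses 'utf-8' as the default encoding.
    if isinstance(pystr, bytes):
        return pystr
    if encoding is None:
        encoding = "utf-8"
    return pystr.encode(encoding)


# You likely need to change this path.
# Use `nc-config --libs` to find where your `libnetcdf` shared library lives.
# RTLD_GLOBAL makes this library's symbols (including NC_rcfile_insert)
# available when resolving symbols of libraries loaded afterwards.
og_nc = ctypes.CDLL("/usr/lib/aarch64-linux-gnu/libnetcdf.so", mode=ctypes.RTLD_GLOBAL)

# This path likely also needs to change.
# Run `ls <the-path-to-netCDF4.libs>` and look for `libnetcdf-<xxx>.so.<xx>`.
nc = ctypes.CDLL("/usr/local/lib/python3.9/site-packages/netCDF4.libs/libnetcdf-15d50133.so.19", mode=ctypes.RTLD_GLOBAL)

# nc_rc_set takes two C strings (key, value) and returns an int status code.
# Declaring argtypes ensures both arguments are passed as char*, not wchar_t*.
nc.nc_rc_set.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
nc.nc_rc_set.restype = ctypes.c_int

# Segmentation fault: nc_rc_set's internal NC_rcfile_insert call can resolve
# to the older libnetcdf loaded above, which has a different signature.
nc.nc_rc_set(strencode("HTTP.SSL.CAINFO"), strencode(certifi.where()))
```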
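Because the reproduction kills the interpreter, it can be convenient to run it in a child process and inspect how it exited, for example when trying several netcdf-c/netCDF4 combinations. A minimal sketch, assuming a POSIX system where `subprocess` reports death by signal as a negative return code; `crashes_with_segfault` is a hypothetical helper name.

```python
import signal
import subprocess
import sys


def crashes_with_segfault(code: str, timeout: float = 60.0) -> bool:
    """Run `code` in a fresh Python interpreter and report whether it died of SIGSEGV.

    POSIX only: subprocess sets returncode to -N when the child is killed by signal N.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep the child's output from cluttering the caller
        timeout=timeout,
    )
    return result.returncode == -signal.SIGSEGV
```

The child inherits the caller's environment, so `LD_LIBRARY_PATH` (or `DYLD_LIBRARY_PATH`) should be exported before calling it, and the reproduction script's source can be passed as `code`.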