Deltares / pyflwdir

Fast methods to work with hydro- and topography data in pure Python.
https://deltares.github.io/pyflwdir/latest
MIT License

FlwdirRaster.accuflux() segmentation faults with a ~(63000, 80000) raster on a machine with >1 TB RAM #46

Open robgpita opened 5 months ago

robgpita commented 5 months ago


Reproducible Example

import numpy as np
import rasterio as rio
import pyflwdir 

def accumulate_flow(
    flow_direction_filename,
    headwaters_filename
):

    # Read the flow direction raster
    with rio.open(flow_direction_filename) as src:
        data = src.read(1)
        nodata = src.nodata
        profile = src.profile

    # Remap the input direction codes (1-8, presumably TauDEM-style) to
    # pyflwdir's d8 byte convention; 247 is pyflwdir's d8 nodata value
    temp = np.ndarray(shape=np.shape(data), dtype=np.uint8)

    temp[data == 1] = 1
    temp[data == 2] = 128
    temp[data == 3] = 64
    temp[data == 4] = 32
    temp[data == 5] = 16
    temp[data == 6] = 8
    temp[data == 7] = 4
    temp[data == 8] = 2
    temp[data == nodata] = 247

    flw = pyflwdir.from_array(temp, ftype='d8', check_ftype=False)

    del temp

    # Read the headwaters raster
    with rio.open(headwaters_filename) as src:
        headwaters = src.read(1)
        nodata = src.nodata

    flowaccum = flw.accuflux(headwaters, nodata=nodata, direction='up')

    ...

Current behaviour

When trying to accumulate flow (accuflux) on larger rasters generated from 1 m LiDAR data, segmentation faults occur. The specifics: a 1 m LiDAR flow direction file & headwaters file (same raster size):

>>> np.shape(data)
(63708, 79780)

flow_direction_filename is 253M; headwaters_filename is 66M

Rasters generated from 3 m LiDAR data do not segfault and process successfully.

>>> np.shape(data)
(21236, 26593)

flow_direction_filename is 107M; headwaters_filename is 7.4M

When the Python script is called from a shell script, an exit status of 139 is observed. Further debugging:

export PYTHONFAULTHANDLER=1 
root@f7f7e3af5d8b:/# python3 $srcDir/accumulate_headwaters.py -fd flowdir_d8_burned_filled_0.tif -wg headwaters_0.tif

Fatal Python error: Segmentation fault

Current thread 0x00007ff9300d8000 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 215 in order_cells
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/pyflwdir.py", line 272 in idxs_seq
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 555 in accuflux
  File "/accumulate_headwaters.py", line 73 in accumulate_flow
  File "/accumulate_headwaters.py", line 110 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, scipy._lib._ccallback_c, _cffi_backend, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, rasterio._version, rasterio._err, rasterio._filepath, rasterio._env, _brotli, rasterio._transform, rasterio._base, rasterio.crs, rasterio._features, rasterio._warp, rasterio._io (total: 98)
Segmentation fault (core dumped)

Naively, the source code was modified to hardcode method="sort" in the call to order_cells. This did not work. As seen above, line 215 in order_cells calls core.idxs_seq, which appears to be the root of the problem. No further investigation has been made past this point.

Desired behaviour

Ideally, larger rasters would process without segmentation faults. If not, the failure could be handled a little more elegantly from Python, with a message stating that the raster is too big to process, or....

Another option might be to provide documentation/examples showing users how to split larger rasters into chunks, along with a tool/utility to join/concatenate the 'blocked' or 'chunked' rasters back into a single pyflwdir.FlwdirRaster object once processing (whether accuflux or other FlwdirRaster methods) is finished.
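As a minimal sketch of the first option above, a user-side guard (a hypothetical helper, not part of pyflwdir) could raise a clear Python error instead of segfaulting, assuming the limitation really is tied to 32-bit cell indices:

import numpy as np

def check_raster_size(data):
    # Hypothetical guard: refuse rasters whose cell count exceeds the
    # unsigned 32-bit index range, assuming that is the limit being hit.
    max_cells = np.iinfo(np.uint32).max
    if data.size > max_cells:
        raise ValueError(
            f"Raster has {data.size} cells, which exceeds {max_cells}; "
            "consider tiling or resampling before flow accumulation."
        )

check_raster_size(data)  # call before pyflwdir.from_array(...)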

Additional context

Memory usage was tracked, and it was observed that less than 1/3 of the available RAM was in use when the segmentation fault occurred.

visr commented 5 months ago

I'm not too familiar with the code here, but thanks to the clear report I could follow along. You mention:

Naively, the source code was modified to hardcode method="sort" in the call to order_cells. This did not work.

Do you know why this didn't work? Did it use the "sort" method but also run out of memory there? From reading the issue I gather you have a d8 flow direction type.

https://deltares.github.io/pyflwdir/latest/_examples/flwdir.html

Which, due to this line, selects the "walk" method:

https://github.com/Deltares/pyflwdir/blob/45c5e63615a9919af9c7ea9719c3dde4f975d425/pyflwdir/pyflwdir.py#L272

Which states that it uses a lot of memory, suggesting the "sort" method should be a good alternative:

https://github.com/Deltares/pyflwdir/blob/45c5e63615a9919af9c7ea9719c3dde4f975d425/pyflwdir/flwdir.py#L209-L215

The fact that RAM use is only at 1/3 of capacity at segfault time may be misleading, because the "walk" method tries to allocate this memory all at once:

https://github.com/Deltares/pyflwdir/blob/45c5e63615a9919af9c7ea9719c3dde4f975d425/pyflwdir/core.py#L70
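
For what it's worth, a rough sketch of forcing the "sort" ordering from user code, assuming order_cells is callable directly on the FlwdirRaster object (it shows up in the traceback, but I haven't checked whether it is intended as public API) and that the ordering it computes is cached and reused by accuflux:

import pyflwdir

# Sketch only: precompute the cell ordering with the less memory-hungry
# "sort" method instead of letting idxs_seq pick "walk" for d8 data.
# temp, headwaters and nodata as in the reproducible example above.
flw = pyflwdir.from_array(temp, ftype="d8", check_ftype=False)
flw.order_cells(method="sort")
flowaccum = flw.accuflux(headwaters, nodata=nodata, direction="up")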

robgpita commented 5 months ago

@visr Thanks for the reply. For reference, below is the result of using self.order_cells(method="sort") on line 272 of pyflwdir.py.

Traceback (most recent call last):
  File "/foss_fim/src/accumulate_headwaters.py", line 110, in <module>
    accumulate_flow(**vars(args))
  File "/foss_fim/src/accumulate_headwaters.py", line 73, in accumulate_flow
    flowaccum = flw.accuflux(headwaters, nodata=nodata, direction='up')
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 555, in accuflux
    seq=self.idxs_seq,
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/pyflwdir.py", line 272, in idxs_seq
    self.order_cells(method="sort")
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 211, in order_cells
    rnk, n = core.rank(self.idxs_ds, mv=self._mv)
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function setitem>) found for signature:

 >>> setitem(array(int32, 1d, C), float64, Literal[int](-1))

There are 16 candidate implementations:
    - Of which 14 did not match due to:
    Overload of function 'setitem': File: <numerous>: Line N/A.
      With argument(s): '(array(int32, 1d, C), float64, int64)':
     No match.
    - Of which 2 did not match due to:
    Overload in function 'SetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 176.
      With argument(s): '(array(int32, 1d, C), float64, int64)':
     Rejected as the implementation raised a specific error:
       NumbaTypeError: unsupported array index type float64 in [float64]
  raised from /usr/local/lib/python3.10/dist-packages/numba/core/typing/arraydecl.py:72

During: typing of setitem at /usr/local/lib/python3.10/dist-packages/pyflwdir/core.py (37)

File "usr/local/lib/python3.10/dist-packages/pyflwdir/core.py", line 37:
def rank(idxs_ds, mv=_mv):
    <source elided>
                while len(idxs_lst) > 0:
                    ranks[idxs_lst.pop(-1)] = -1
                    ^
visr commented 5 months ago

Interesting. I hope I'm not leading you astray, but if I read that error correctly, it is trying to do a setitem with a key/index of type float64 rather than int64, in this part:

https://github.com/Deltares/pyflwdir/blob/45c5e63615a9919af9c7ea9719c3dde4f975d425/pyflwdir/core.py#L35-L38

So it hits a loop and is trying to mark its path, but is failing to do so since idxs_lst.pop(-1) is presumably somehow a float64. I cannot reproduce this issue with "rhine_d8.tif" because that doesn't hit this path. But otherwise locally forcing self.order_cells(method="sort") seems to work fine. I'm running this with:

numba 0.59.1
llvmlite 0.42.0
python 3.12.3

You could try removing the @njit from def rank to see if you can (slowly) reproduce / debug this issue locally. It's also worth trying the latest versions of the dependencies / Python, because from the source code in rank I don't see how a float64 can end up there.
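
For illustration, a minimal standalone snippet (my own, not pyflwdir code) that triggers the same class of typing error by using a float64 as an array index inside an njit function:

import numpy as np
import numba as nb

@nb.njit
def float_index_setitem():
    ranks = np.zeros(5, dtype=np.int32)
    idx = 2.0          # float64 index, mirroring idxs_lst.pop(-1) in core.rank
    ranks[idx] = -1    # numba rejects setitem with a float64 index
    return ranks

float_index_setitem()  # raises numba.core.errors.TypingError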

visr commented 5 months ago

Hmm, I should've tried asking Copilot earlier; this verbatim answer makes sense:

The issue might be with the way Numba is interpreting the types in your code. It might be incorrectly inferring the type of the index. To fix this, you can explicitly cast the index to an integer before using it:

ranks[int(idxs_lst.pop(-1))] = -1

And

ranks[int(idxs_lst.pop(-1))] = rnk

This will ensure that the index is always an integer, which should resolve the error.

robgpita commented 5 months ago

@visr Thanks for your help and suggestions. After upgrading to numba 0.59.1, llvmlite 0.42.0 & numpy 1.26.4, I modified those two lines above, and the previous type-related errors with numba's setitem were resolved. However, the same error occurs as when self.order_cells(method="walk") is used.

Fatal Python error: Segmentation fault
Extension modules: 
....   (total: 97)
Segmentation fault (core dumped)

Unfortunately, when using method="sort", the traceback does not display anything helpful. It only references the top-level function call (our accumulate_flow) and nothing further down the call stack - no mention of pyflwdir utilities.

I will try upgrading our Python version next, and will also try to get more detailed debugging information on which lines of code are responsible for the segmentation fault, in hopes of addressing the root of the problem.

Huite commented 5 months ago

Unfortunately, I personally don't have much time to dig into this either, but from my numba-wrangling experience I can maybe share a couple of tips.

If numba cannot allocate sufficient memory, it will generally report an appropriate error:

import numpy as np
import numba as nb

@nb.njit
def allocate(n):
    return np.empty(n, dtype=np.float64)

a = allocate(int(1e12))  # my machine has 32 GB RAM

This raises: MemoryError: Allocation failed (probably too large).

Segfaults, however, are very easy to trigger: numba doesn't do any bounds checking by default. Probably all the segfaults that I've generated with numba were due to silly indexing mistakes. They may not show up (consistently) with smaller inputs; I guess as long as numba stays within the process memory, the OS doesn't kill it. In the example below, indexing a ten-element array just past the end (e.g. with index 11) simply returns a garbage value, but a much larger index guarantees I'm definitely trespassing and Python crashes:

@nb.njit
def allocate_and_index():
    a = np.empty(10, dtype=np.float64)
    return a[10000000]

Bounds checking has been available for some time now: https://numba.readthedocs.io/en/stable/reference/pysemantics.html#bounds-checking

@nb.njit(boundscheck=True)
def allocate_and_index():
    a = np.empty(10, dtype=np.float64)
    return a[10000000]

Results in the error: IndexError: index is out of bounds.

It may be tedious to set this on the decorator; you can also set it via an environment variable:

import os
os.environ["NUMBA_BOUNDSCHECK"] = "1"

(This needs to go at the top of the script so that the jit decorator is aware of it before anything gets compiled. Of course, you can also just set it in your shell prior to starting Python.)

Generally, one of the first things I try when running into segfaults is disabling numba entirely through another environment variable:

import os
os.environ["NUMBA_DISABLE_JIT"] = "1"

This may not be feasible if the segfault is only triggered with large inputs, as dynamic Python can be 300 times slower than numba, so it may take an inordinate amount of time to trigger the error. One (obvious) option here is to split up the function: run the part up until the segfault with numba, get the intermediate products out, and try to run the subsequent part without numba.

Anyway, I'd start with the numba boundscheck. It doesn't mention a line number in my test (I'm on numba 0.59.1), but if it errors, it will at least provide a starting point. For what it's worth: all the numba segfaults that I can remember from the past also triggered errors in Python and were relatively straightforward to iron out...

Huite commented 5 months ago

Re-reading the title and OP another time: is it possible that we're looking at an int32 overflow or something? I'm not sure how that would then result in a segfault, but the example sizes are suggestive:

np.iinfo(np.int32).max > (63000 * 80000) == False

Interestingly, the other example is big, but does fit:

np.iinfo(np.int32).max > (21236 * 26593) == True

Searching the pyflwdir project, I get 93 results in 18 files for int32, so they seem to be used liberally.

If you're feeling lucky, you could try running a search and replace, changing all of them to int64. Make sure to also update the uint32's, since 63000 by 80000 exceeds the unsigned 32-bit range as well. I get no hits for np.iinfo, so NoData/sentinel values are probably hard-coded and shouldn't be affected by a change of dtype.
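
For reference, a quick check of the sizes from the OP against both 32-bit integer ranges (plain NumPy, nothing pyflwdir-specific):

import numpy as np

ncell_fail = 63708 * 79780  # failing 1 m raster, ~5.08e9 cells
ncell_ok = 21236 * 26593    # passing 3 m raster, ~5.65e8 cells

print(ncell_fail > np.iinfo(np.int32).max)   # True
print(ncell_fail > np.iinfo(np.uint32).max)  # True: exceeds uint32 as well
print(ncell_ok > np.iinfo(np.int32).max)     # False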

Worth noting that an unjitted version may at least warn in this case, since NumPy warns on scalar overflow:

In [9]: a = np.int32(np.iinfo(np.int32).max)

In [10]: a
Out[10]: 2147483647

In [11]: a + 1
<ipython-input-11-ca42ed42e993>:1: RuntimeWarning: overflow encountered in scalar add
  a + 1
Out[11]: -2147483648

Unfortunately, it only seems to check scalar types though:

In [15]: b = np.full(1, np.iinfo(np.int32).max)

In [16]: b
Out[16]: array([2147483647])

In [17]: b + 1
Out[17]: array([-2147483648])

verseve commented 5 months ago

Re-reading the title and OP another time: is it possible that we're looking at an int32 overflow or something? I'm not sure how that would then result in a segfault, but the example sizes are suggestive:

I don't think this is related to an int32 overflow; a dtype is assigned based on the data size, see also: https://github.com/Deltares/pyflwdir/blob/main/pyflwdir/pyflwdir.py#L167

Anyway, it is good to check whether this is indeed the case in the flw object.
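
For example, assuming the internal downstream-index array is exposed as flw.idxs_ds (as the tracebacks above suggest), the check could be as simple as:

# Inspect the dtype and size of the downstream-index array on the flw object.
print(flw.idxs_ds.dtype, flw.idxs_ds.size)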

robgpita commented 5 months ago

After enabling Numba bounds checking and inserting print statements, I was able to pinpoint the offending function: upstream_count seems to be causing the IndexError: index is out of bounds and the associated segmentation fault. I'm currently isolating/investigating the upstream_count function and trying to understand/resolve the index error.

For reference, when idxs_ds.size = 564728948 there is no IndexError, and when idxs_ds.size = 5082624240 the Numba IndexError arises.

Further investigation of the dtype of both idxs_ds arrays (the size that passes and the size that fails) reveals that the larger array has dtype=uint64 and throws the IndexError. This leads me to believe that it is indeed a data type issue.