robgpita opened this issue 5 months ago
I'm not too familiar with the code here, but thanks to the clear report I could follow along. You mention:
> Naively, the source code was modified (call to order_cells) hardcoding method="sort". This did not work.
Do you know why this didn't work? Did it use the "sort" method but also run out of memory there? From reading the issue, I gather you have a d8 flow direction type:
https://deltares.github.io/pyflwdir/latest/_examples/flwdir.html
Due to this line, that selects the "walk" method:
That method states that it uses a lot of memory, which suggests the "sort" method should be a good alternative:
The fact that RAM use is only at 1/3 of capacity when the segfault happens may be misleading, because the "walk" method tries to allocate this memory all at once:
@visr Thanks for the reply. For reference, below is the result of using self.order_cells(method="sort") on line 272 of pyflwdir.py:
```
Traceback (most recent call last):
  File "/foss_fim/src/accumulate_headwaters.py", line 110, in <module>
    accumulate_flow(**vars(args))
  File "/foss_fim/src/accumulate_headwaters.py", line 73, in accumulate_flow
    flowaccum = flw.accuflux(headwaters, nodata=nodata, direction='up')
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 555, in accuflux
    seq=self.idxs_seq,
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/pyflwdir.py", line 272, in idxs_seq
    self.order_cells(method="sort")
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 211, in order_cells
    rnk, n = core.rank(self.idxs_ds, mv=self._mv)
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function setitem>) found for signature:
>>> setitem(array(int32, 1d, C), float64, Literal[int](-1))
There are 16 candidate implementations:
- Of which 14 did not match due to:
Overload of function 'setitem': File: <numerous>: Line N/A.
With argument(s): '(array(int32, 1d, C), float64, int64)':
No match.
- Of which 2 did not match due to:
Overload in function 'SetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 176.
With argument(s): '(array(int32, 1d, C), float64, int64)':
Rejected as the implementation raised a specific error:
NumbaTypeError: unsupported array index type float64 in [float64]
raised from /usr/local/lib/python3.10/dist-packages/numba/core/typing/arraydecl.py:72
During: typing of setitem at /usr/local/lib/python3.10/dist-packages/pyflwdir/core.py (37)
File "usr/local/lib/python3.10/dist-packages/pyflwdir/core.py", line 37:
def rank(idxs_ds, mv=_mv):
<source elided>
while len(idxs_lst) > 0:
ranks[idxs_lst.pop(-1)] = -1
^
```
Interesting. I hope I'm not leading you astray, but if I read that error correctly, it tries to do a setitem with a key/index of type float64 rather than int64, in this part:
So it hits a loop and tries to mark its path, but fails to do so because idxs_lst.pop(-1) is presumably somehow a float64.
I cannot reproduce this issue with "rhine_d8.tif" because that doesn't hit this path. Otherwise, locally forcing self.order_cells(method="sort") seems to work fine. I'm running this with numba 0.59.1, llvmlite 0.42.0, and Python 3.12.3.
You could try removing the @njit from def rank to see if you can (slowly) reproduce and debug this issue locally. It's also worth trying the latest versions of the dependencies and Python, because from the source code in rank I don't see how a float64 can end up there.
Hmm, I should've tried asking Copilot earlier. This verbatim answer makes sense:
The issue might be with the way Numba is interpreting the types in your code. It might be incorrectly inferring the type of the index. To fix this, you can explicitly cast the index to an integer before using it:
```python
ranks[int(idxs_lst.pop(-1))] = -1
```
and
```python
ranks[int(idxs_lst.pop(-1))] = rnk
```
This will ensure that the index is always an integer, which should resolve the error.
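The cast removes the typing error, but the index still round-trips through float64, and float64 represents integers exactly only up to 2**53. The following check covers that limit and an index of about 5e9. That is roughly the largest array size mentioned in this thread; both numbers are only illustrations:

```python
import numpy as np

# float64 has a 53-bit significand, so not every integer above 2**53
# survives a round trip through float64.
exact = 2**53 + 1
as_float = np.float64(exact)
print(int(as_float) == exact)  # False: 2**53 + 1 rounds to 2**53

# An index around 5e9 is far below 2**53, so int(...) recovers it exactly.
big_index = 5_082_624_239
print(int(np.float64(big_index)) == big_index)  # True
```

At these raster sizes, the int(...) cast is numerically safe. It just leaves open the question of where the float64 comes from.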
@visr Thanks for your help and suggestions. After upgrading to numba 0.59.1, llvmlite 0.42.0 and numpy 1.26.4, I modified the two lines above, and the previous typing errors associated with numba's setitem were resolved. However, the same error now occurs as when self.order_cells(method="walk") is used:
```
Fatal Python error: Segmentation fault
Extension modules:
.... (total: 97)
Segmentation fault (core dumped)
```
Unfortunately, when using method="sort", the traceback does not display anything helpful. It only references the top-level function call (our accumulate_flow) and nothing further down the call stack; there is no mention of pyflwdir utilities.
Next, I will try to upgrade our Python version and get more detailed debugging information about which lines of code cause the segmentation fault, in hopes of addressing the root of the problem.
Unfortunately, I personally don't have much time to dig into this either, but from my numba wrangling experience I can share a couple of tips.
If numba cannot allocate sufficient memory, it will generally report an appropriate error:
```python
import numpy as np
import numba as nb

@nb.njit
def allocate(n):
    return np.empty(n, dtype=np.float64)

a = allocate(int(1e12))  # my machine has 32 GB RAM
```
This raises: MemoryError: Allocation failed (probably too large).
Segfaults, however, are very easy to trigger, because numba doesn't do any bounds checking by default. Perhaps all the segfaults I've generated with numba were due to silly indexing mistakes. They may not show up (consistently) with smaller inputs; I guess that as long as numba stays within the process memory, the OS doesn't kill it. In the example below, indexing with 11 is definitely out of bounds, but I simply get a garbage value. A much larger index guarantees that I'm trespassing, and Python crashes:
```python
@nb.njit
def allocate_and_index():
    a = np.empty(10, dtype=np.float64)
    return a[10000000]  # far out of bounds, unchecked

allocate_and_index()  # may crash the interpreter
```
Bounds checking has been available for some time: https://numba.readthedocs.io/en/stable/reference/pysemantics.html#bounds-checking
```python
@nb.njit(boundscheck=True)
def allocate_and_index():
    a = np.empty(10, dtype=np.float64)
    return a[10000000]

allocate_and_index()
```
This results in the error: IndexError: index is out of bounds.
Because it may be tedious to set this on every decorator, you can also set it via an environment variable:
```python
import os
os.environ["NUMBA_BOUNDSCHECK"] = "1"
```
(This needs to go at the top of the script so that the jit decorator is aware of it before anything gets compiled. Of course, you can also just set it on the command line before starting Python.)
Generally, one of the first things I try when running into segfaults is disabling numba entirely through another environment variable:
```python
import os
os.environ["NUMBA_DISABLE_JIT"] = "1"
```
This may not be feasible if the segfault is only triggered with large inputs: dynamic Python can be 300 times slower than numba, so it may take an inordinate amount of time to trigger the error. One (obvious) option is to split up the function: run the part up to the segfault with numba, get the intermediate products out, and try to run the subsequent part without numba.
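One way to use those intermediate products is a cheap check on an index array in plain NumPy. The check verifies that every entry is either the missing-value sentinel or a valid position. Unchecked jitted indexing reads out of bounds exactly when that property is violated. The sketch below assumes a flat downstream-index array and a sentinel mv. The helper name is made up and is not part of pyflwdir:

```python
import numpy as np

def find_invalid_downstream(idxs_ds: np.ndarray, mv: int) -> np.ndarray:
    """Return positions whose downstream index is neither the sentinel mv
    nor a valid flat index into an array of the same size.

    Hypothetical debugging helper; runs in plain NumPy, no numba needed.
    """
    n = idxs_ds.size
    is_missing = idxs_ds == mv
    in_range = (idxs_ds >= 0) & (idxs_ds < n)
    return np.flatnonzero(~(is_missing | in_range))

# Toy example: position 3 points outside the 5-cell array.
idxs = np.array([1, 2, 2, 7, 4294967295], dtype=np.uint64)
print(find_invalid_downstream(idxs, mv=4294967295))  # [3]
```

For a multi-billion-cell raster, this allocates a few boolean temporaries of the same length as the input. Even so, it is far cheaper than running the jitted routine without numba.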
Anyway, I'd start with the numba boundscheck. It doesn't report a line number in my test (I'm on numba 0.59.1), but if it errors, it at least provides a starting point. For what it's worth, all the numba segfaults I can remember from the past also triggered errors in Python and were relatively straightforward to iron out...
Re-reading the title and OP another time: is it possible that we're looking at an int32 overflow or something? I'm not sure how that would then result in a segfault, but the example sizes are suggestive:
```python
np.iinfo(np.int32).max > 63000 * 80000  # False
```
Interestingly, the other example is big, but does fit:
```python
np.iinfo(np.int32).max > 21236 * 26593  # True
```
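Here is the same comparison for both example sizes and both 32-bit types. It is a small sketch with a hypothetical helper that checks the largest flat index, n_cells - 1:

```python
import numpy as np

INT32_MAX = int(np.iinfo(np.int32).max)    # 2147483647
UINT32_MAX = int(np.iinfo(np.uint32).max)  # 4294967295

def fits_32bit(n_cells: int) -> dict:
    """Check whether the largest flat index (n_cells - 1) fits in 32-bit types."""
    largest = n_cells - 1
    return {"int32": largest <= INT32_MAX, "uint32": largest <= UINT32_MAX}

large = 63000 * 80000  # 5_040_000_000 cells
other = 21236 * 26593  # 564_728_948 cells
print(fits_32bit(large))  # {'int32': False, 'uint32': False}
print(fits_32bit(other))  # {'int32': True, 'uint32': True}
```

The 63000 by 80000 example exceeds even the unsigned 32-bit range. By contrast, 21236 * 26593 = 564,728,948, which is the same idxs_ds.size reported later in the thread as running without an IndexError.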
Searching the pyflwdir project, I get 93 results in 18 files for int32, so they seem to be used liberally.
If you're feeling lucky, you could try running a search and replace, changing all of them to int64. Make sure to also update the uint32s, since 63000 by 80000 exceeds the unsigned 32-bit range as well. I get no hits for np.iinfo, so NoData/sentinel values are probably hard-coded and shouldn't be influenced by a change of dtype.
It's worth noting that an unjitted version may surface this error, since NumPy warns about integer overflow:
```
In [9]: a = np.int32(np.iinfo(np.int32).max)

In [10]: a
Out[10]: 2147483647

In [11]: a + 1
<ipython-input-11-ca42ed42e993>:1: RuntimeWarning: overflow encountered in scalar add
  a + 1
Out[11]: -2147483648
```
Unfortunately, it only seems to check scalar types, though:
```
In [15]: b = np.full(1, np.iinfo(np.int32).max)

In [16]: b
Out[16]: array([2147483647])

In [17]: b + 1
Out[17]: array([-2147483648])
```
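The session above only produces an int32 array from np.full if the platform's default integer is 32-bit, as on Windows with NumPy 1.x. On Linux the default is int64, and b + 1 does not overflow. The platform-independent version below makes the dtype explicit and escalates warnings to errors, which shows that none is raised:

```python
import warnings

import numpy as np

b = np.full(1, np.iinfo(np.int32).max, dtype=np.int32)

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any RuntimeWarning would now raise
    c = b + 1  # array integer arithmetic wraps around silently

print(c)        # [-2147483648]
print(c.dtype)  # int32
```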
> Re-reading the title and OP another time: is it possible that we're looking at an int32 overflow or something? I'm not sure how that would then result in a segfault, but the examples sizes are suggestive:
I don't think this is related to an int32 overflow, because the dtype is assigned based on the data size; see also: https://github.com/Deltares/pyflwdir/blob/main/pyflwdir/pyflwdir.py#L167
Still, it's good to check whether this is indeed the case in the flw object.
With Numba bounds checking enabled and some print statements inserted, I was able to pinpoint the offending function: upstream_count seems to be causing the IndexError: index is out of bounds and the associated segmentation fault. I'm currently isolating and investigating the upstream_count function to understand and resolve the index error.
For reference, when idxs_ds.size = 564728948 there is no IndexError, and when idxs_ds.size = 5082624240 the Numba IndexError arises.
Further investigation of the dtype of both idxs_ds arrays (the size that passes and the size that fails) reveals that the larger array has dtype=uint64 and throws the IndexError. This leads me to believe that it is indeed a data type issue.
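This may connect to the earlier float64 setitem error, though the thread does not confirm it. Under NumPy's promotion rules, int64 and uint64 have no common integer type, so the pair promotes to float64. To my understanding, numba's type unification follows the same rule for this pair. Suppose rank pushes both an int64 loop counter and values read from a uint64 idxs_ds into idxs_lst. That is an assumption, because the body of rank is elided in the traceback above. In that case, numba would type the list as float64. The promotion itself is easy to see in plain NumPy with toy values:

```python
import numpy as np

# No integer type can hold every int64 and every uint64 value,
# so NumPy promotes the pair to float64.
print(np.result_type(np.int64, np.uint64))  # float64

idxs_ds = np.array([3, 0, 1, 2], dtype=np.uint64)  # toy downstream indices
idx0 = np.int64(0)          # stands in for an int64 loop counter
idx_next = idxs_ds[idx0]    # an element of a uint64 array is np.uint64
mixed = np.array([idx0, idx_next])
print(mixed.dtype)  # float64
```

If this is the mechanism, keeping all indices in a single integer type (for example, int64 throughout) would address the cause. By comparison, the int(...) cast only treats the symptom.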
pyFlwDir version checks
Reproducible Example
Current behaviour
When trying to accumulate flow (accuflux) on larger rasters generated from 1m LiDAR data, segmentation faults occur. The specifics for the 1m LiDAR flow direction file and headwaters file (same raster size):
flow_direction_filename is 253M, head_waters_filename is 66M
Rasters generated from 3m LiDAR data do not segfault and process successfully:
flow_direction_filename is 107M, head_waters_filename is 7.4M
When the Python script is called from a shell script, an exit status of 139 is observed. Further debugging:
Naively, the source code was modified (the call to order_cells) to hardcode method="sort". This did not work. As seen above, line 215 in order_cells calls core.idxs_seq, which appears to be the root of the problem. No further investigation has been made past this point.
Desired behaviour
Ideally, larger rasters would process without segmentation faults. If not, the exception could be handled a little more elegantly from Python, with a message stating that the raster is too big to process, or...
Another option might be to provide documentation/examples showing users how to split larger rasters into chunks, along with a tool/utility to join/concatenate 'blocked' or 'chunked' rasters back into a single pyflwdir.FlwdirRaster object once processing (whether accuflux or other FlwdirRaster methods) is finished.
Additional context
Memory usage was tracked, and it was observed that less than 1/3 of the available RAM was in use when the segmentation fault occurred.
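The exit status of 139 mentioned above points the same way. Shells report a process killed by signal N as 128 + N, and 139 - 128 = 11 is SIGSEGV. The Linux OOM killer, by contrast, sends SIGKILL (9), which would show up as 137. Below is a small decoding sketch; the helper is hypothetical and relies only on the standard signal module:

```python
import signal

def describe_exit_status(status: int) -> str:
    """Interpret a shell exit status using the 128 + signal-number convention."""
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exited with code {status}"

print(describe_exit_status(139))  # killed by SIGSEGV
print(describe_exit_status(1))    # exited with code 1
```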