Closed by mdraw 5 years ago
Accidentally closed this via a commit message...
I can now finally reproduce this bug (which wasn't that easy because it happened only once every ~200,000 iterations) and found out what's causing it: The issue happens in the numba-jitted generalized ufunc code at https://github.com/ELEKTRONN/elektronn3/blob/076efe043db0badf092cd3a70e8a48c16dfe751a/elektronn3/data/coord_transforms.py#L24-L30
The reported garbage values appear if u, v and/or w point to indices that are out of the bounds of src, so line 30 reads from unallocated memory.
I didn't think that was possible because I expected the process to segfault, but it turns out that segfaults only happen sometimes in this case, while in most cases the dest array is silently filled with garbage values. Segfaults seem to happen more often the further the out-of-bounds memory access is from the actually allocated values.
It's hard to debug this because we can't set breakpoints, do shape checks or raise errors in jitted generalized ufuncs, so I'm not yet sure why exactly map_coordinates_nearest() is sometimes called in a way that causes these problems, but I'm working on finding out.
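Since the jitted gufunc itself can't raise errors, one way to narrow this down might be to validate the indices in plain NumPy before the jitted call. The following is only a minimal sketch under assumptions: out_of_bounds_mask() and checked_nearest_lookup() are hypothetical helper names that are not part of elektronn3, and the sketch assumes the integer (u, v, w) triples are already available as an array whose last axis has length 3. It does not reproduce how map_coordinates_nearest() derives them.

```python
import numpy as np


def out_of_bounds_mask(indices, shape):
    """Mark index triples that fall outside an array of the given shape.

    indices: integer array of shape (..., 3), one (u, v, w) triple per output element.
    shape:   shape of the source array, e.g. src.shape.
    Returns a boolean array of shape (...) that is True for invalid triples.
    """
    indices = np.asarray(indices)
    upper = np.asarray(shape)
    # A triple is valid only if 0 <= index < size holds along every axis.
    # Negative indices are flagged too, because unchecked compiled code
    # would not wrap them around the way NumPy indexing does.
    return ((indices < 0) | (indices >= upper)).any(axis=-1)


def checked_nearest_lookup(src, indices):
    """Pure-NumPy nearest-neighbour lookup that raises instead of reading out of bounds."""
    indices = np.asarray(indices)
    bad = out_of_bounds_mask(indices, src.shape)
    if bad.any():
        raise IndexError(
            f"{int(bad.sum())} of {bad.size} index triples are outside "
            f"src with shape {src.shape}"
        )
    # Split the last axis into three index arrays and gather the values.
    u, v, w = np.moveaxis(indices, -1, 0)
    return src[u, v, w]
```

Running such a check only in a debug mode would keep the normal data-loading path fast, and logging the offending triples should show how far outside src the bad accesses land.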
In some of the batches that are created by PatchCreator, the target tensor contains elements that are not inside the expected value range (which is given by the number of unique classes that exist in the data set).
Quoting a comment from a previous commit message (https://github.com/ELEKTRONN/elektronn3/commit/46d0b2b0fc1f2beccdb02e1c788667dc218e73a9):
Such invalid targets are automatically detected by PatchCreator and their batches are discarded as a workaround for this problem, but that's certainly not a good way of dealing with it in the long term. We need to find out what's causing this bug. We may find the root of the problem somewhere around this code block: https://github.com/ELEKTRONN/elektronn3/blob/05dcd88340a66e703c79b6c6bc5e55e5939e899f/elektronn3/data/transformations.py#L355 or in the numba-jitted functions that are called from there.
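For illustration, a range check of the kind described above could look roughly like the sketch below. This is not the actual PatchCreator code: targets_in_range() and invalid_target_values() are hypothetical names, and the sketch assumes that valid class labels are the integers 0 to num_classes - 1.

```python
import numpy as np


def targets_in_range(target, num_classes):
    """Return True if every target element lies in [0, num_classes)."""
    target = np.asarray(target)
    # NaN compares False on both sides, so float targets containing NaN
    # are reported as invalid as well.
    return bool(((target >= 0) & (target < num_classes)).all())


def invalid_target_values(target, num_classes):
    """Return the sorted unique values that lie outside [0, num_classes)."""
    target = np.asarray(target)
    outside = (target < 0) | (target >= num_classes)
    return np.unique(target[outside])
```

Logging the result of invalid_target_values() for each discarded batch, instead of dropping the batch silently, would show whether the bad values look like plausible labels or like arbitrary memory contents.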