Open mdoube opened 4 years ago
The cursors do avoid random access, which slows the op down (I've tested), specifically the random access needed to read the 8-neighbourhood of a pixel. My bet is that the inner lambda function of LoopBuilder.forEachPixel(T) (or Chunk) will have to avoid it too to achieve performance similar to the current op.
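As a rough illustration of the point about neighbourhood reads, here is a minimal plain-Java sketch. It is not ImgLib2 code and not the op's implementation: the class OctantScan, the flat boolean[] layout (x fastest, then y, then z) and the choice of a 2x2x2 block as the 8-voxel neighbourhood are all assumptions made for the example. The idea is to precompute the eight index offsets once, so the inner loop only adds integers to a running base index instead of positioning an accessor per neighbour.

```java
/** Hypothetical sketch: scan 2x2x2 blocks of a flat voxel array using precomputed offsets. */
final class OctantScan {

    /** Counts the 2x2x2 blocks that contain at least one foreground voxel. */
    static long countNonEmptyOctants(boolean[] voxels, int w, int h, int d) {
        // Offsets of the 8 block corners relative to the block's origin index.
        // Computed once, so the inner loop needs no coordinate arithmetic.
        final int[] offsets = new int[8];
        int n = 0;
        for (int dz = 0; dz < 2; dz++) {
            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    offsets[n++] = dx + w * (dy + h * dz);
                }
            }
        }

        long count = 0;
        for (int z = 0; z < d - 1; z++) {
            for (int y = 0; y < h - 1; y++) {
                int base = w * (y + h * z); // flat index of voxel (0, y, z)
                for (int x = 0; x < w - 1; x++, base++) {
                    for (int offset : offsets) {
                        if (voxels[base + offset]) {
                            count++;
                            break; // one foreground voxel is enough for this block
                        }
                    }
                }
            }
        }
        return count;
    }
}
```

Calling OctantScan.countNonEmptyOctants(voxels, w, h, d) on a flat array walks memory almost sequentially. Whether a similar trick can be expressed inside a LoopBuilder lambda is exactly the open question here.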
If it works out, we might be able to improve the performance of other Modern plugins like VolumeFraction.
Sure, worth having a go. ATM ElementFractionWrapper just calls opService.stats().sum(interval) to count foreground voxels, and Interval.size() for total voxels. The former works, because in a BitType image foreground is always 1.
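The arithmetic behind that can be sketched in plain Java. This is only an illustration under stated assumptions: the class VolumeFractionSketch and the byte[] of 0/1 values stand in for a BitType image, and none of it is the wrapper's actual code. Because every foreground voxel is 1 and every background voxel is 0, the sum of all values equals the foreground count, and dividing by the number of voxels gives the fraction.

```java
/** Hypothetical sketch: volume fraction of a binary image stored as 0/1 values. */
final class VolumeFractionSketch {

    /** Sum of all values; equals the foreground count when values are only 0 or 1. */
    static long sum(byte[] bits) {
        long total = 0;
        for (byte b : bits) {
            total += b;
        }
        return total;
    }

    /** Foreground voxels divided by total voxels (NaN for an empty array). */
    static double fraction(byte[] bits) {
        return (double) sum(bits) / bits.length;
    }
}
```

For example, VolumeFractionSketch.fraction(new byte[] {1, 0, 1, 1}) sums to 3 over 4 voxels. The shortcut stops working as soon as foreground can take any value other than 1.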
LoopBuilder won't work for this use case. It supports a maximum of 6 cursors or random accesses. You could work around this by using Views.collapse(Views.stack(...)), but I wouldn't expect any performance improvement from this.
I would choose a different approach:
@maarzt I just did a little performance testing comparing the legacy and modern Connectivity plugins. Even with some hacky multithreading, and with random accesses avoided by using final Cursor<B> octantCursor1 = Views.flatIterable(interval).cursor(); the Modern approach is still way slower than the legacy approach (about 14 times slower in the timings below). Is there a faster way to access pixels sequentially in an Op, especially pixel neighbourhoods? That is a standard thing to do when processing with a kernel.
Performance timings
Timing Legacy Connectivity
Legacy connectivity took 1111 ms
Legacy connectivity took 876 ms
Legacy connectivity took 873 ms
Legacy connectivity took 1107 ms
Legacy connectivity took 909 ms
Timing Modern Connectivity
Modern connectivity took 14352 ms
Modern connectivity took 14298 ms
Modern connectivity took 14079 ms
Modern connectivity took 14071 ms
Modern connectivity took 14141 ms
Macro code:
// Time five runs of the legacy plugin
print("Timing Legacy Connectivity");
for (i = 0; i < 5; i++) {
    start = getTime();
    run("Connectivity");
    end = getTime();
    // Parentheses make the subtraction explicit before string concatenation
    print("Legacy connectivity took " + (end - start) + " ms");
}
// Time five runs of the Modern (Ops-based) plugin
print("Timing Modern Connectivity");
for (i = 0; i < 5; i++) {
    start = getTime();
    run("Connectivity (Modern)", "inputimage=net.imagej.ImgPlus@73672ce");
    end = getTime();
    print("Modern connectivity took " + (end - start) + " ms");
}
Test image umzc_378p_Apteryx_haastii_head.tif.zip
Describe the bug
LoopBuilder is the ImgLib2 way to achieve chunked multithreading; see this for an example (specifically, the 'squared sum' example): https://forum.image.sc/t/imglib2-split-image-into-chunks-for-multi-threaded-processing/37519/4
Modern Connectivity currently uses 8 cursors to achieve multithreading, which is a bit of a hack: https://github.com/imagej/imagej-ops/commit/6a60315419fad3745a456f9cc2719e7897d7b762#diff-d6c590f31ede52f4518dba0b4a44cf9fR261
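For readers unfamiliar with the pattern, the general shape of chunked multithreading for a squared sum can be sketched in plain Java. This is an analogue using a parallel stream over index ranges, not the LoopBuilder API and not the code from the linked forum post. The class ChunkedSquaredSum and the chunks parameter are hypothetical names for this example. Each chunk is reduced independently and the partial results are then combined.

```java
import java.util.stream.IntStream;

/** Hypothetical sketch: split an array into chunks and reduce the chunks in parallel. */
final class ChunkedSquaredSum {

    /** Returns the sum of squared values, computed chunk by chunk. */
    static double squaredSum(double[] pixels, int chunks) {
        final int n = pixels.length;
        return IntStream.range(0, chunks)
            .parallel()
            .mapToDouble(c -> {
                // Chunk c covers the half-open index range [from, to)
                int from = (int) ((long) n * c / chunks);
                int to = (int) ((long) n * (c + 1) / chunks);
                double partial = 0;
                for (int i = from; i < to; i++) {
                    partial += pixels[i] * pixels[i];
                }
                return partial;
            })
            .sum(); // combine the per-chunk partial sums
    }
}
```

Each worker only touches its own contiguous range, so no synchronisation is needed inside the loop. The 8-cursor approach in Modern Connectivity would have to keep that property for its neighbourhood reads as well.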
Additional context
Any change to the code needs to be performance benchmarked and reported to @maarzt. If it works out, we might be able to improve the performance of other Modern plugins like VolumeFraction.