clEsperanto / pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data
http://clesperanto.net
BSD 3-Clause "New" or "Revised" License

Multiple ways of computing some label measurements #340

Closed StRigaud closed 1 month ago

StRigaud commented 1 month ago

Is it normal that some label measurements (e.g. the centroid of a label) are computed in different ways depending on which function is used?

A user can get this information both from the statistics of labels approach and from centroids of labels, which rely on different code for computing the coordinates.

Most likely the results are the same (hopefully), but this increases code redundancy. Wouldn't it be better to rely entirely on the statistics function's computation? We might lose a bit of speed because we would also compute other measurements that were not requested, but it would make more sense, I guess.

I am asking this for the development of the pyclesperanto implementation.
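For illustration, a minimal sketch of the two current call paths from the user's point of view (assuming the prototype's statistics_of_labelled_pixels and centroids_of_labels; exact names and returned keys may differ):

```python
import numpy as np
import pyclesperanto_prototype as cle

# hypothetical toy label image pushed to the GPU
labels = cle.push(np.asarray([[0, 1, 1],
                              [2, 2, 0],
                              [2, 2, 0]], dtype=np.uint32))

# path 1: compute the full per-label statistics, then pick the centroid entries
stats = cle.statistics_of_labelled_pixels(labels, labels)
centroids_from_stats = np.stack([stats['centroid_x'], stats['centroid_y']])

# path 2: call the dedicated centroid function, which returns a point list
centroids_direct = cle.pull(cle.centroids_of_labels(labels))

print(centroids_from_stats)
print(centroids_direct)
```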

haesleinhuepf commented 1 month ago

Wouldn't it be better to rely all on the statistics function computation?

Can you give some insight performance-wise? I could imagine that one function is much slower than the other.

StRigaud commented 1 month ago

I don't know if it's slower or not, to be honest, but we can easily imagine that statistics is heavier. However, it would make a lot of sense to rely on it for the centroid computation that is later used by reduce_labels_to_centroids, for example, instead of coming up with a second way to compute centroids (which is what is done in the prototype).

Very rough speed test using the prototype on a labelled blobs image:

195 ms ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) // calling statistics
34.8 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) // calling the centroids of labels
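For context, a minimal sketch of how such a rough timing comparison could be set up (not the exact test run here; the function names and the synthetic test data are assumptions):

```python
import timeit
import numpy as np
import pyclesperanto_prototype as cle

# hypothetical synthetic label image as a stand-in for the labelled blobs
labels = cle.push(np.random.randint(0, 64, size=(256, 256)).astype(np.uint32))

# warm up once so kernel compilation is not included in the timing
cle.statistics_of_labelled_pixels(labels, labels)
cle.centroids_of_labels(labels)

t_stats = timeit.timeit(lambda: cle.statistics_of_labelled_pixels(labels, labels), number=7) / 7
t_cent = timeit.timeit(lambda: cle.centroids_of_labels(labels), number=7) / 7

print(f"statistics_of_labelled_pixels: {t_stats * 1000:.1f} ms per call")
print(f"centroids_of_labels:           {t_cent * 1000:.1f} ms per call")
```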

We would clearly lose some speed (maybe a bit less in C++, but I am not certain). But we would consolidate the code and leave less for me to develop ...

Edit: same test but using pyclesperanto with the C++ backend; the speed loss is much smaller:

64.3 ms ± 831 μs per loop (mean ± std. dev. of 7 runs, 10 loops each) // calling statistics

I am on my M2 (not sure if that is relevant).

thawn commented 1 month ago

How about putting the algorithm that currently calculates the centroid from statistics of labels into centroids of labels, and then calling centroids of labels from inside statistics of labels? That way, we would have identical results without speed problems.
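Something along these lines, as a rough structural sketch (plain NumPy stand-ins, hypothetical, not the actual prototype code):

```python
import numpy as np

def centroids_of_labels(label_image):
    """Single home for the centroid algorithm (NumPy stand-in for the GPU kernel)."""
    label_ids = np.unique(label_image)
    label_ids = label_ids[label_ids > 0]
    return np.array([np.argwhere(label_image == i).mean(axis=0) for i in label_ids])

def statistics_of_labelled_pixels(intensity_image, label_image):
    """Reuses centroids_of_labels instead of carrying its own copy of the centroid code."""
    stats = {'label': np.unique(label_image[label_image > 0])}
    stats['centroid'] = centroids_of_labels(label_image)
    # ... other measurements (bounding boxes, intensity statistics, ...) go here
    return stats
```

That way there would be exactly one implementation of the centroid algorithm, and both entry points would return identical values.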

haesleinhuepf commented 1 month ago

I think the algorithms are strongly related anyway. @StRigaud, you should decide what's most important: code length or speed/performance. Both are reasonable development goals, and you usually cannot achieve both at the same time.

StRigaud commented 1 month ago

How about putting the algorithm that currently calculates the centroid from statistics of labels into centroids of labels, and then calling centroids of labels from inside statistics of labels?

That's one way to do it, but:

you should decide what's most important

I am aiming at implementing the missing functions for the assistant, so I will focus on making things work and hence reuse the statistics code, as that is the simplest for me right now (not too much extra code and tests, no modification of the statistics function, which is complicated, etc.). Nothing stops me from optimising it later (e.g. if an issue is raised), most likely in the way @thawn proposed.
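For what it's worth, the reuse-statistics route could look roughly like this (a hypothetical wrapper, not the actual pyclesperanto code; the statistics keys 'centroid_x/y/z' are assumed and may differ between versions):

```python
import numpy as np
import pyclesperanto_prototype as cle

def centroids_of_labels_from_statistics(label_image):
    """Hypothetical wrapper: compute the full statistics and keep only the
    centroid columns, accepting the extra computation for now."""
    stats = cle.statistics_of_labelled_pixels(label_image, label_image)
    return np.stack([stats['centroid_x'], stats['centroid_y'], stats['centroid_z']])
```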