> Wouldn't it be better to rely entirely on the statistics function computation?
Can you give some insight performance-wise? I could imagine that one function is much slower than the other.
I don't know if it's slower or not, to be honest, but we can easily imagine that statistics is heavier. However, it would make a lot of sense to rely on it for the centroid computation later used by `reduce_labels_to_centroids`, for example, instead of coming up with a second way to compute centroids (which is what is done in the prototype).
Very dirty speed test using the prototype on a labelled blobs image:
```
195 ms ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)    // calling statistics
34.8 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) // calling centroids of labels
```
We would clearly lose some speed (maybe a bit less in C++, but not certain). But we would consolidate the code and leave less stuff for me to develop ...
Edit: same test but using pyclesperanto. With C++, the speed loss is much smaller:
```
64.3 ms ± 831 μs per loop (mean ± std. dev. of 7 runs, 10 loops each) // calling statistics
```
I am on my M2 (not sure if it's relevant).
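For context, a minimal sketch of how such a timing comparison could be reproduced. The function names (`statistics_of_labelled_pixels`, `centroids_of_labels`) follow pyclesperanto-prototype and may differ in other backends; the random label image is just a stand-in for the labelled blobs sample:

```python
import timeit

import numpy as np
import pyclesperanto_prototype as cle

# Synthetic intensity and label images standing in for the blobs sample.
image = cle.push(np.random.rand(1024, 1024).astype(np.float32))
labels = cle.push(np.random.randint(0, 100, (1024, 1024)).astype(np.uint32))

# Route 1: the full statistics table (centroids come out as a by-product).
t_stats = timeit.timeit(
    lambda: cle.statistics_of_labelled_pixels(image, labels), number=10)

# Route 2: the dedicated centroid computation.
t_centroids = timeit.timeit(
    lambda: cle.centroids_of_labels(labels), number=10)

print(f"statistics_of_labelled_pixels: {t_stats / 10 * 1e3:.1f} ms per call")
print(f"centroids_of_labels:           {t_centroids / 10 * 1e3:.1f} ms per call")
```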
How about putting the algorithm that currently calculates the centroid from `statistics of label` into `centroids of label`, and then calling `centroids of label` from inside `statistics of label`? That way, we would have identical results without speed problems.
I think the algorithms are strongly related anyway. @StRigaud, you should decide what's most important: code length or speed/performance. Both are reasonable development goals, and you can rarely achieve both at the same time.
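To illustrate the proposal, a minimal sketch with hypothetical function names, using NumPy as a stand-in for the actual GPU kernels: the centroid algorithm lives in exactly one function, and the statistics function delegates to it, so both entry points agree by construction:

```python
import numpy as np

def centroids_of_labels(label_image):
    """The single home of the centroid algorithm: per-label mean coordinate."""
    labels = np.unique(label_image)
    labels = labels[labels != 0]  # skip background
    return {l: np.argwhere(label_image == l).mean(axis=0) for l in labels}

def statistics_of_labels(intensity_image, label_image):
    """Full measurement table; centroids are delegated, not re-implemented."""
    return {
        l: {
            "area": int((label_image == l).sum()),
            "mean_intensity": float(intensity_image[label_image == l].mean()),
            "centroid": tuple(centroid),
        }
        for l, centroid in centroids_of_labels(label_image).items()
    }

# Both entry points now return identical coordinates by construction:
lbl = np.array([[0, 1, 1],
                [2, 2, 0],
                [2, 0, 0]])
img = np.arange(9, dtype=float).reshape(3, 3)
assert np.allclose(statistics_of_labels(img, lbl)[1]["centroid"],
                   centroids_of_labels(lbl)[1])
```

The design cost is one extra pass over the label image from within statistics; the benefit is a single code path to test and maintain.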
> How about putting the algorithm that currently calculates the centroid from `statistics of label` into `centroids of label`, and then calling `centroids of label` from inside `statistics of label`?
That's one way, but:

> you should decide what's most important
I am aiming at implementing the missing functions for the assistant. So I will focus on making things work, hence reusing the statistics code, as it is the simplest for me right now (not too much extra code and tests, no modification of the statistics function, which is complicated, etc.). Nothing stops me later on (e.g. if an issue is raised) from optimising it (most likely the way @thawn proposed it).
Is it normal that some label measurements (e.g. centroid of label) are computed in different ways depending on which function is used?
Users can get this information from the `statistics of label` approach and from `centroids of labels`, which rely on different code for computing the coordinates. Most likely the results are the same (hopefully), but this increases code redundancy. Wouldn't it be better to rely entirely on the statistics function computation? We may lose a bit of speed because we would also compute other measurements that were not requested, but it would make more sense, I guess.
Asking this for the `pyclesperanto` implementation development.
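To make the redundancy concrete, a short sketch of the two routes a user has today. The function and key names (`statistics_of_labelled_pixels`, `centroids_of_labels`, `centroid_x`/`centroid_y`) follow pyclesperanto-prototype and are assumptions here:

```python
import numpy as np
import pyclesperanto_prototype as cle

image = cle.push(np.random.rand(256, 256).astype(np.float32))
labels = cle.push(np.random.randint(0, 10, (256, 256)).astype(np.uint32))

# Route 1: centroids as a by-product of the full statistics table.
stats = cle.statistics_of_labelled_pixels(image, labels)
centroids_from_stats = stats["centroid_x"], stats["centroid_y"]

# Route 2: the dedicated function, a separate implementation of the same math.
pointlist = cle.centroids_of_labels(labels)
```

Both routes should agree, but they are maintained independently, which is the redundancy raised above.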