choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License
263 stars 105 forks source link

Speed optimizations #244

Closed ajinkya-kulkarni closed 1 year ago

CielAl commented 1 year ago

I recommend you avoid premature low-level optimization unless it comes with benchmarks showing significant improvement in running time.

The thing is, the sklearn's regionprops essentially uses the ndimage.find_objects to locate the bboxes of connected regions, which uses the c implementation in nd_image.c

and you replace that with ndimage.label which utilizes the cython implementation here: https://github.com/scipy/scipy/blob/main/scipy/ndimage/src/_ni_label.pyx

I doubt there are noticeable differences in terms of execution time between these two functions. Besides, the fact that you need to manipulate the obtained label array to get the area indicates that the change may actually degrade the readability of the code: note that the regionprops have all potentially required computation wrapped into the attributes.

choosehappy commented 1 year ago

thanks @CielAl, i agree. refactoring should be for code readability or improved efficient, but if it comes with added complexity for speed optimizations, it should be paired with an estimate of those benefits compare to the baseline, especially if it isn't obvious from reading the code directly

@ajinkya-kulkarni is that possible here?

nanli-emory commented 1 year ago

Close the PR since no response for several weeks