DigitalSlideArchive / HistomicsStream

A whole-slide image reader for TensorFlow
Apache License 2.0
22 stars 6 forks source link

Speed up performance of masking operations #115

Closed Leengit closed 1 year ago

Leengit commented 1 year ago

Closes #109.

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Leengit commented 1 year ago

The primary focus of this pull request is to use scipy.interpolate.RegularGridInterpolator in figuring out which tiles overlap the mask sufficiently. There, and in a few other places, looping over tiles is now done inside numpy and scipy functions rather than as explicit Python for loops or explicit Python list comprehensions.

There are also changes to widths and heights that I discovered as I made the performance improvements. There were some places where widths were compared to or subtracted from heights, and vice versa, essentially apples-to-oranges mistakes in the masking code. I fixed that. In some sense, these were fairly drastic mistakes, but they were mostly hidden by the fact that tiles and chunks are usually square, not rectangular.

I also made some cosmetic changes:

  1. If both height and width are getting equivalent processing, we now do them in that order. This is because the code always uses [height][width] (or equivalently [row][column]) ordering of the indices rather than [x][y] ordering -- except where comments explicitly state otherwise -- and it helps to expose mistakes such as the height vs. width mistakes described above.
  2. I removed the use of row and column in favor of height and width in a few variable names because it was somewhat ambiguous to me whether, e.g., a row_stride is a horizontal offset within a row or a vertical offset between distinct rows. In contrast, height_stride more clearly tells me that it is a vertical offset between distinct rows.
cooperlab commented 1 year ago

Looks great @Leengit - nice work!

Leengit commented 1 year ago

Also, the performance speed up of using large_image_source_tiff needs to be supported in Jupyter labs and GitHub testing with pip install large_image[tiff] and there is a commit in this pull request to do that.

Leengit commented 1 year ago

I just did a force push to update the comment for the mask-performance commit to mention the use of scipy.