Strange thumbnail obtained at patch extraction

pilarOrtega commented 3 years ago

When we patchify a slide using Pathaia (with verbose >= 2), a thumbnail is generated at each extraction level, with a grid showing the extracted patches (the image shows what it should look like).

When extracting patches from slides of a different cohort at levels 2, 1 and 0, the thumbnails obtained look like this:

Levels 1 and 2 look the same as level 0 from the previous slide, while at level 0 the grid is not displayed at all. It may be due to the dilation of the grid used to set the line width.

schwobr commented 3 years ago

In the first slide it seems that the grid cells just get too small compared to the actual resolution of the thumbnail and the line thickness. If the thumbnail is taken at level 9 and patches are extracted at level 0, for a slide of dimensions 90000x200000:

We can extract around 400x900 patches of size 224
Slide dimensions at the thumbnail level are around 350x780

That means that the thumbnail does not even have enough pixels to allocate one pixel per patch. It is probably the problem you get in the second cohort.
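The arithmetic above can be checked with a few lines of Python. This is only a sketch: the downsampling factor of 256 is an assumption, chosen because it is the factor that yields the ~350x780 thumbnail quoted above.

```python
# Slide and patch sizes taken from the example above.
slide_w, slide_h = 90_000, 200_000
patch_size = 224
# Assumption: the thumbnail is 256x smaller than level 0,
# which matches the ~350x780 figure quoted in the comment.
downsample = 256

# Number of non-overlapping patches along each axis.
n_patches_w = slide_w // patch_size
n_patches_h = slide_h // patch_size

# Thumbnail size in pixels at the assumed downsampling factor.
thumb_w = slide_w // downsample
thumb_h = slide_h // downsample

# Thumbnail pixels available per patch along each axis.
px_per_patch_w = thumb_w / n_patches_w
px_per_patch_h = thumb_h / n_patches_h

print(n_patches_w, n_patches_h)  # patches per axis
print(thumb_w, thumb_h)          # thumbnail pixels per axis
print(px_per_patch_w < 1, px_per_patch_h < 1)
```

Both ratios come out below one, so there is less than one thumbnail pixel per patch on each axis and no room for a grid line.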

I think a good workaround would be to use slide.get_thumbnail instead of slide.read_region to generate the thumbnail. It would let us specify a target resolution that is large enough for each patch to have several pixels assigned. I would do something like:

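from typing import Sequence, Tuple

import numpy
import openslide
from skimage.morphology import binary_dilation, disk

# Patch and NDByteImage are PathAIA's own type aliases.


def preview_from_queries(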
    slide: openslide.OpenSlide,
    queries: Sequence[Patch],
    min_res: int = 512,
    color: Tuple[int, int, int] = (255, 255, 0),
    thickness: int = 3,
    cell_size: int = 3,
) -> NDByteImage:
    """
    Give thumbnail with patches displayed.

    Args:
        slide: openslide object
        queries: patch queries {"x", "y", "dx", "dy", "level"}
        min_res: minimum size for the smallest side of the thumbnail (usually the width)
        color: rgb color for patch boundaries
        thickness: thickness of patch boundaries
        cell_size: size of a cell representing a patch in the grid

    Returns:
        Thumbnail image with patches displayed.

    """
    # get thumbnail first
    w, h = slide.dimensions
    dx = queries[0]["dx"]
    dy = queries[0]["dy"]
    # enforce the minimum resolution given by min_res
    thumb_w = max(min_res, (w // dx) * (thickness + cell_size) + thickness)
    thumb_h = max(min_res, (h // dy) * (thickness + cell_size) + thickness)
    image = slide.get_thumbnail((thumb_w, thumb_h))
    thumb_w, thumb_h = image.size
    dsr_w = w / thumb_w
    dsr_h = h / thumb_h
    image = numpy.array(image)[:, :, 0:3]
    # get grid
    grid = 255 * numpy.ones((thumb_h, thumb_w), numpy.uint8)
    for query in queries:
        # positions in queries are absolute
        x = int(query["x"] / dsr_w)
        y = int(query["y"] / dsr_h)
        dx = int(query["dx"] / dsr_w)
        dy = int(query["dy"] / dsr_h)
        startx = min(x, thumb_w - 1)
        starty = min(y, thumb_h - 1)
        endx = min(x + dx, thumb_w - 1)
        endy = min(y + dy, thumb_h - 1)
        # horizontal segments
        grid[starty, startx:endx] = 0
        grid[endy, startx:endx] = 0
        # vertical segments
        grid[starty:endy, startx] = 0
        grid[starty:endy, endx] = 0
    grid = grid < 255
    d = disk(thickness)
    grid = binary_dilation(grid, d)
    image[grid] = color
    return image

Note that:

In theory dsr_w = dsr_h as get_thumbnail preserves the aspect ratio, but I'd rather use a safe option on this.
This supposes that dx and dy are the same for every query (which is normally the case as patch_size is always the same). I think it would be better to have patch_size as an argument of this function though.
This method would not work if patches overlap (as the number of patches is computed using dx and dy). If we want to take overlap into account, I suggest passing interval as an argument to this function and adapting the formulas for thumb_w and thumb_h. But I think that generating grids that look fine for overlapping patches would be a pain in the a** anyway.
This supposes that the drawing is made of squares of side 2*thickness+cell_size (with each square overlapping the next one). If patches are not square this will not work properly. To make it work we could just have cell_size multiplied by max(dx, dy)/min(dx, dy) in the formula for the largest side of the thumbnail.
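A minimal sketch of how the notes above could be folded into the size computation, with patch_size and interval passed explicitly. The helper name thumbnail_size is hypothetical, and both the stride-based patch count and the per-axis scaling of cell_size are one possible reading of the suggestions, not existing PathAIA code.

```python
from typing import Optional, Tuple


def thumbnail_size(
    slide_dims: Tuple[int, int],
    patch_size: Tuple[int, int],
    interval: Optional[Tuple[int, int]] = None,
    min_res: int = 512,
    thickness: int = 3,
    cell_size: int = 3,
) -> Tuple[int, int]:
    """Hypothetical helper: target (width, height) to request for the thumbnail."""
    w, h = slide_dims
    dx, dy = patch_size
    # Without an explicit interval, patches are assumed not to overlap.
    ix, iy = interval if interval is not None else patch_size
    # Number of patch positions along each axis for the given stride.
    n_x = max(0, (w - dx) // ix + 1)
    n_y = max(0, (h - dy) // iy + 1)
    # Assumption: stretch the cell along the longer patch side so that
    # non-square patches keep roughly their aspect ratio in the grid.
    short = min(dx, dy)
    cell_w = round(cell_size * dx / short)
    cell_h = round(cell_size * dy / short)
    thumb_w = max(min_res, n_x * (thickness + cell_w) + thickness)
    thumb_h = max(min_res, n_y * (thickness + cell_h) + thickness)
    return thumb_w, thumb_h


# Example with the slide discussed earlier in the thread.
no_overlap = thumbnail_size((90_000, 200_000), (224, 224))
half_overlap = thumbnail_size((90_000, 200_000), (224, 224), interval=(112, 112))
small_slide = thumbnail_size((1_000, 1_000), (224, 224))
print(no_overlap, half_overlap, small_slide)
```

Since get_thumbnail preserves the aspect ratio, the image actually returned may be smaller than the requested size along one axis, so the real size should still be read back from the image as in the function above.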

However, the comments above are not that important, as the core of the code stays the same. What changes is the evaluation of the resolution the thumbnail needs to actually display a grid. At worst it is a bit approximate and we find ourselves in similar edge cases as before; they would just become less common. Depending on how precise we want to be, this code can be more or less convoluted.

pilarOrtega commented 3 years ago

Thanks Robin! Indeed, the thumbnail was not big enough for all patches to have one pixel, let alone create a grid with a given thickness on top.

Even with a bigger thumbnail size, the grid is not visible for lower levels. That can be solved by increasing the default cell_size parameter so that the cell interior does not disappear when dilating the grid (if we make it 10 or 20 times the thickness, the grid is visible at all levels, though the thumbnail is heavier). It would not change anything for thumbnails smaller than 512 px, but I think it is nicer for smaller levels.
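A rough one-dimensional sketch of why the default cell_size makes the grid vanish. It assumes that grid lines sit exactly thickness + cell_size pixels apart (as in the size formula above) and that dilating with disk(thickness) widens each 1 px line to 2*thickness + 1 pixels. The helper uncovered_pixels is hypothetical and uses pure Python instead of scikit-image.

```python
def uncovered_pixels(n_cells: int, thickness: int, cell_size: int) -> int:
    """Count pixels of a 1-D grid row left unpainted after dilation."""
    pitch = thickness + cell_size
    length = n_cells * pitch + 1
    # 1 px wide grid lines at every multiple of the pitch.
    line = [i % pitch == 0 for i in range(length)]
    # 1-D dilation with radius `thickness`: a pixel is painted if any
    # line pixel lies within `thickness` pixels of it.
    dilated = [
        any(line[max(0, i - thickness): i + thickness + 1])
        for i in range(length)
    ]
    return dilated.count(False)


# Default parameters: the dilated lines merge and nothing is left visible.
default_gap = uncovered_pixels(n_cells=4, thickness=3, cell_size=3)
# cell_size at 10 times the thickness: each cell keeps an unpainted interior.
larger_gap = uncovered_pixels(n_cells=4, thickness=3, cell_size=30)
print(default_gap, larger_gap)
```

Under these assumptions the interior of a cell is cell_size - thickness - 1 pixels wide, so it only survives the dilation when cell_size is larger than thickness + 1.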

dx and dy should always be the same, but I still agree with you that we had better include this as an argument, in case it is ever needed. And for the moment, I don't think it is worth including a grid for overlapping patches. Depending on the overlap, it risks being a messy bunch of lines, which I don't think is interesting either... I believe it might be more interesting to show a grid which only displays the average borders between patches.

Still, the grid is just to have a little overview, so it's not really essential that it's perfect in all edge cases. It's just nice to have some little map to know where patches are :)

MicroMedIAn / PathAIA

Strange thumbnail obtained at patch extraction #20