MicroMedIAn / PathAIA

Digital Pathology Analysis Tools
GNU General Public License v3.0

Strange thumbnail obtained at patch extraction #20

Closed pilarOrtega closed 3 years ago

pilarOrtega commented 3 years ago

When we patchify a slide using PathAIA with verbose >= 2, we get a thumbnail for each extraction level, overlaid with a grid showing the extracted patches (the image shows what it should look like).

When extracting patches from slides of a different cohort at levels 2, 1 and 0, the thumbnails obtained look like this:

Levels 1 and 2 look the same as level 0 from the previous slide, while at level 0 the grid is not displayed at all. This may be due to the dilation of the grid used to set the line width.

schwobr commented 3 years ago

In the first slide it seems that the grid cells just get too small compared to the actual resolution of the thumbnail and the line thickness. If the thumbnail is taken at level 9 and patches are extracted at level 0, for a slide of dimensions 90000x200000:

I think a good workaround would be to use slide.get_thumbnail instead of slide.read_region to generate the thumbnail. It would let us specify a target resolution that is large enough for each patch to have several pixels assigned. I would do something like:

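from typing import Sequence, Tuple

import numpy
import openslide
# binary_dilation and disk are assumed to come from scikit-image
from skimage.morphology import binary_dilation, disk

# Patch and NDByteImage are PathAIA type aliases; their import path is not shown in this thread


def preview_from_queries(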
    slide: openslide.OpenSlide,
    queries: Sequence[Patch],
    min_res: int = 512,
    color: Tuple[int, int, int] = (255, 255, 0),
    thickness: int = 3,
    cell_size: int = 3,
) -> NDByteImage:
    """
    Give thumbnail with patches displayed.

    Args:
        slide: openslide object
        queries: patch queries {"x", "y", "dx", "dy", "level"}
        min_res: minimum size for the smallest side of the thumbnail (usually the width)
        color: rgb color for patch boundaries
        thickness: thickness of patch boundaries
        cell_size: size of a cell representing a patch in the grid

    Returns:
        Thumbnail image with patches displayed.

    """
    # get thumbnail first
    w, h = slide.dimensions
    dx = queries[0]["dx"]
    dy = queries[0]["dy"]
    # use the min_res argument instead of a hard-coded 512
    thumb_w = max(min_res, (w // dx) * (thickness + cell_size) + thickness)
    thumb_h = max(min_res, (h // dy) * (thickness + cell_size) + thickness)
    image = slide.get_thumbnail((thumb_w, thumb_h))
    thumb_w, thumb_h = image.size
    dsr_w = w / thumb_w
    dsr_h = h / thumb_h
    image = numpy.array(image)[:, :, 0:3]
    # get grid
    grid = 255 * numpy.ones((thumb_h, thumb_w), numpy.uint8)
    for query in queries:
        # position in queries are absolute
        x = int(query["x"] / dsr_w)
        y = int(query["y"] / dsr_h)
        dx = int(query["dx"] / dsr_w)
        dy = int(query["dy"] / dsr_h)
        startx = min(x, thumb_w - 1)
        starty = min(y, thumb_h - 1)
        endx = min(x + dx, thumb_w - 1)
        endy = min(y + dy, thumb_h - 1)
        # horizontal segments
        grid[starty, startx:endx] = 0
        grid[endy, startx:endx] = 0
        # vertical segments
        grid[starty:endy, startx] = 0
        grid[starty:endy, endx] = 0
    grid = grid < 255
    d = disk(thickness)
    grid = binary_dilation(grid, d)
    image[grid] = color
    return image

Note that:

However, none of the above comments is that important, as the core of the code stays the same. What changes is how we evaluate the resolution the thumbnail needs in order to actually display a grid. At worst the estimate is a bit approximate and we end up in edge cases similar to the ones we had before; they would just become less common. Depending on how precise we want to be, this code can be more or less convoluted.
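To make the resolution estimate concrete, here is a minimal sketch that pulls the thumb_w / thumb_h computation out of the function above. `required_thumbnail_size` is a hypothetical helper, not part of PathAIA, and the patch sizes (256 and 2048 pixels, in level-0 coordinates) are assumptions, since the thread does not state one.

```python
from typing import Tuple


def required_thumbnail_size(
    slide_size: Tuple[int, int],
    patch_size: Tuple[int, int],
    thickness: int = 3,
    cell_size: int = 3,
    min_res: int = 512,
) -> Tuple[int, int]:
    """Size to request so each patch gets cell_size pixels plus one border."""
    w, h = slide_size
    dx, dy = patch_size
    # number of whole patches that fit along each axis
    n_cols, n_rows = w // dx, h // dy
    thumb_w = max(min_res, n_cols * (thickness + cell_size) + thickness)
    thumb_h = max(min_res, n_rows * (thickness + cell_size) + thickness)
    return thumb_w, thumb_h


# 90000x200000 slide from the example, assumed 256 px patches
print(required_thumbnail_size((90000, 200000), (256, 256)))  # (2109, 4689)
# with large patches the min_res floor takes over on the short side
print(required_thumbnail_size((90000, 200000), (2048, 2048)))  # (512, 585)
```

Keep in mind that slide.get_thumbnail preserves the aspect ratio and returns an image that fits inside the requested size, so the actual thumbnail can be smaller than this on one axis. That is why the function above re-reads image.size afterwards.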

pilarOrtega commented 3 years ago

Thanks Robin! Indeed, the thumbnail was not big enough for all patches to have one pixel, let alone create a grid with a given thickness on top.
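As a rough check of that, here is a small sketch of how many thumbnail pixels a single patch gets. It assumes each pyramid level halves the resolution (downsample = 2 ** level), which is common but not guaranteed for every slide format, and the 256 px patch size is an assumption since it is not stated in this thread. `pixels_per_patch` is a hypothetical helper.

```python
def pixels_per_patch(patch_size: int, thumbnail_level: int) -> float:
    """Thumbnail pixels covering one side of a level-0 patch.

    Assumes a downsample factor of 2 ** thumbnail_level.
    """
    return patch_size / 2 ** thumbnail_level


slide_w, slide_h = 90000, 200000
level = 9

# approximate thumbnail size at level 9
print(slide_w // 2 ** level, slide_h // 2 ** level)  # 175 390
# a 256 px patch extracted at level 0 covers half a thumbnail pixel
print(pixels_per_patch(256, level))  # 0.5
```

With less than one pixel per patch, neighbouring patch borders land on the same thumbnail pixel, so the grid fills the whole image before any dilation is applied.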

Even with a bigger thumbnail size, the grid is not visible for lower levels. That can be solved by increasing the default cell_size parameter so that the cells do not disappear when the grid is dilated (if we make it 10 or 20 times the thickness, the grid is visible at all levels, though the thumbnail is heavier). It would not change anything for thumbnails smaller than 512 px, but I think it is nicer for smaller levels.

dx and dy should always be the same, but I still agree with you that we had better include this as an argument, in case it is ever needed. And for the moment, I don't think it is worth including a grid for overlapping patches: depending on the overlap, it risks being a messy bunch of lines, which I don't think is interesting either. I believe it might be more interesting to show a grid which only displays the average borders between patches.

Still, the grid is just there to give a little overview, so it's not really essential that it's perfect in all edge cases. It's just nice to have a little map to know where the patches are :)
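A one-dimensional sketch of why the cells vanish: if borders sit every cell_size + thickness pixels and dilation with a disk of radius thickness widens each border to 2 * thickness + 1 pixels, then with the defaults (3 and 3) the borders are wider than their spacing. `visible_fraction` is a hypothetical helper written in plain Python, and it assumes the thumbnail really comes back at the requested size.

```python
def visible_fraction(cell_size: int, thickness: int, n_cells: int = 20) -> float:
    """Fraction of a 1-D thumbnail row left uncovered after dilating the borders."""
    pitch = cell_size + thickness  # spacing between patch borders
    length = n_cells * pitch + 1
    covered = [False] * length
    for border in range(0, length, pitch):
        # dilation by radius `thickness` grows a 1-pixel border on both sides
        lo = max(0, border - thickness)
        hi = min(length, border + thickness + 1)
        for i in range(lo, hi):
            covered[i] = True
    return 1.0 - sum(covered) / length


print(visible_fraction(cell_size=3, thickness=3))   # 0.0, the grid covers everything
print(visible_fraction(cell_size=30, thickness=3))  # most of each cell stays visible
```

Under these assumptions a cell only keeps cell_size - thickness - 1 open pixels, so cell_size has to be clearly larger than thickness for the tissue to show through.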