TIO-IKIM / CellViT

CellViT: Vision Transformers for Precise Cell Segmentation and Classification
https://doi.org/10.1016/j.media.2024.103143

When performing inference on the MoNuSeg dataset, the following code ignores some regions of an image. #35

Closed windygoo closed 9 months ago

windygoo commented 9 months ago

Given an image of size (3, 1000, 1000), the returned tensor has shape (3, 4, 4, 256, 256). With an overlap of 64 pixels, the shape should instead be (3, 5, 5, 256, 256) so that the whole image is covered.


Illustration:

Input:
import torch
a = torch.arange(1000)
# window size 256, step 256 - 64 = 192 (i.e. 64 pixels of overlap)
b = a.unfold(0, 256, 256 - 64)
print(b)

Output:
tensor([[  0,   1,   2,  ..., 253, 254, 255],
        [192, 193, 194,  ..., 445, 446, 447],
        [384, 385, 386,  ..., 637, 638, 639],
        [576, 577, 578,  ..., 829, 830, 831]])

The remaining region, pixels 832 to 999, is ignored during inference.
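The arithmetic behind this can be checked without PyTorch. The following is a minimal sketch, assuming a sliding window that drops any incomplete trailing window, as unfold does above; the helper names window_starts and uncovered_tail are hypothetical and not part of CellViT.

```python
def window_starts(size, patch, overlap):
    """Start offsets of a sliding window that drops an incomplete trailing window."""
    stride = patch - overlap
    # Only windows that fit entirely inside [0, size) are kept.
    return list(range(0, size - patch + 1, stride))


def uncovered_tail(size, patch, overlap):
    """Number of trailing pixels that no window reaches."""
    starts = window_starts(size, patch, overlap)
    covered_until = starts[-1] + patch if starts else 0
    return size - covered_until


print(window_starts(1000, 256, 64))   # [0, 192, 384, 576]
print(uncovered_tail(1000, 256, 64))  # 168, i.e. pixels 832 to 999
```

A fifth window starting at 768 would need pixels up to 1023, which a 1000-pixel axis does not have, so it is dropped.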

FabianHoerst commented 9 months ago

You should rescale the input images and ground-truth masks to 1024x1024 pixels to make them divisible by the token size.
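As a rough illustration of why 1024 works for the numbers discussed in this thread (256-pixel patches, 64 pixels of overlap), the sketch below finds the smallest side length at or above a given size that such windows tile without a leftover. The function name next_fully_tiled_size is hypothetical and is not part of CellViT.

```python
def next_fully_tiled_size(size, patch=256, overlap=64):
    """Smallest side length >= size that the sliding windows cover exactly."""
    stride = patch - overlap
    if size <= patch:
        return patch
    # Number of strides needed after the first patch, rounded up.
    steps = -(-(size - patch) // stride)
    return patch + steps * stride


print(next_fully_tiled_size(1000))  # 1024
```

With a side length of 1024 the windows start at 0, 192, 384, 576 and 768, giving 5 windows per axis with the last one ending exactly at pixel 1023.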

windygoo commented 9 months ago


Ok, thanks!