TIO-IKIM / CellViT

CellViT: Vision Transformers for Precise Cell Segmentation and Classification
https://doi.org/10.1016/j.media.2024.103143

Comparison to PLUTO paper? #48

Closed cgebbe closed 1 month ago

cgebbe commented 1 month ago

First of all, thank you for your great paper and code!

The recent PLUTO paper (https://arxiv.org/pdf/2405.07905) also benchmarked its work. It uses the following decoders:

Questions:

  1. Since your paper was published in 10/2023, have you by any chance also tried Mask2Former or the ViT adaption head?
  2. In my naive understanding, PLUTO + HoVer-Net is likely very similar to CellViT + HoVer-Net. Or am I missing important details?

FabianHoerst commented 1 month ago

Hi,

thank you for the great question.

Regarding 1: We have not tested the Mask2Former approach. I am also curious about how it would perform, but I do not have the capacity to test it right now.

Regarding 2: I am not sure you can state this in full, as the ViT encoder used by PLUTO is a FlexiViT network, as explained here: https://arxiv.org/pdf/2212.08013. I am also not sure about the multiscale input images. I would say that their model is a more general one, covering a broader range of tasks.