How to use on other datasets?

clissa commented 1 year ago

Hello, I came across the paper and found it very interesting, thanks for sharing! I'd like to test your implementation on other datasets, for example the Fluorescent Neuronal Cells dataset. In this case we only have one cell type, so I was thinking of replicating the pipeline used for MoNuSeg. However, I don't quite understand how to do the pre-processing after I prepare the dataset.

So let's say I want to run inference on MoNuSeg as a first step. After I download the data and run prepare_monuseg.py, my understanding from the README is that I should run preprocessing as: python ./preprocessing/patch_extraction/main_extraction.py --config ./example/preprocessing_example.yaml. However, I am getting this output:

2023-07-06 13:58:14,727 [INFO] - Data store directory: <path-to-project>/CellViT/example/output/preprocessing
2023-07-06 13:58:14,728 [INFO] - Images found: 0
2023-07-06 13:58:14,728 [INFO] - Annotations found: 0
2023-07-06 13:58:14,728 [INFO] - Removing complete dataset! This may take a while.
2023-07-06 13:58:14,731 [INFO] - Using 1 processes.
2023-07-06 13:58:14,759 [INFO] - Patches saved to: <path-to-project>/CellViT/example/output/preprocessing
2023-07-06 13:58:14,759 [INFO] - Total patches sampled for all WSI: 0
2023-07-06 13:58:14,759 [INFO] - Time usage: 0:00:00.004117
2023-07-06 13:58:14,760 [INFO] - Finished Preprocessing.

Can you give more details on how to re-run inference and possibly adapt it to other datasets? Thanks in advance and nice work! ;)

FabianHoerst commented 1 year ago

Do you want to run inference or retrain the model? The inference pipeline is designed specifically for WSI, not for arbitrary images. The MoNuSeg preparation script (prepare_monuseg.py) was used to prepare the MoNuSeg data and is not compatible with the cell_detection.py script. To use your dataset, follow one of the two procedures below.

If you have WSI files:

1. Run the preprocessing pipeline with the command: python ./preprocessing/patch_extraction/main_extraction.py --config PATH/TO/YOUR/CONFIG.yaml. For this, you need to write your own .yaml file for your dataset. This procedure is described here: https://github.com/TIO-IKIM/CellViT#1-preprocessing and here: https://github.com/TIO-IKIM/CellViT/blob/main/docs/readmes/preprocessing.md
2. Run inference as described here: https://github.com/TIO-IKIM/CellViT#2-cell-detection-script

If you have patched images: you have to write your own inference script and place it in the inference folder. You can use the inference_cellvit_monuseg script as an example, but I guess you need your own dataloaders and should be careful about the calculated metrics.
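As a starting point for such a dataloader, the sketch below only pairs image files with their masks by file name. It is not taken from the CellViT code: the function name `pair_images_and_masks`, the separate image and mask folders, and the `.png` suffix are all assumptions for illustration.

```python
from pathlib import Path


def pair_images_and_masks(image_dir, mask_dir, suffix=".png"):
    """Return sorted (image_path, mask_path) pairs matched by file stem.

    Hypothetical helper: assumes each image has a mask with the same
    file name in a separate folder, which may not match your dataset.
    """
    image_dir, mask_dir = Path(image_dir), Path(mask_dir)
    # Index masks by stem so each image can look up its partner.
    masks = {p.stem: p for p in mask_dir.glob(f"*{suffix}")}
    pairs = []
    for image_path in sorted(image_dir.glob(f"*{suffix}")):
        mask_path = masks.get(image_path.stem)
        if mask_path is None:
            raise FileNotFoundError(f"No mask found for {image_path.name}")
        pairs.append((image_path, mask_path))
    return pairs
```

The returned list could then back a custom dataset class that loads each pair on demand.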

FabianHoerst commented 1 year ago

If you want to retrain for binary datasets without classification, you should remove the classification head or change it to 2 classes. We cannot give general instructions on how to process new datasets as each has a different structure. We may assist you, but do not currently have ways of providing general scripts for each dataset.

clissa commented 1 year ago

I would start with inference only, and then possibly fine-tune on the new data. I don't have WSI, just PNG images, so I start from a format similar to the output of prepare_monuseg.py.

So from there, I would need to replicate this script for my dataset, right?

FabianHoerst commented 1 year ago

Exactly. One part should be to create a new dataset and replace it [here](https://github.com/clissa/CellViT/blob/41f0651746c15ccea490401e3ea7ab78eeab2a23/cell_segmentation/inference/inference_cellvit_monuseg.py#L84C1-L84C1), adapt the metrics used and maybe the plotting. Just out of curiosity, what is the input image size?

clissa commented 1 year ago

They are 1200x1600 pixels (283 images in total). However, I am working on a new dataset release with many more images and variable image sizes up to roughly 1700x2200 pixels.

FabianHoerst commented 1 year ago

Ok, then some more work may be necessary for postprocessing, resizing and patching. My code is optimized to work with 1024x1024 px patches extracted from gigapixel WSI. Possible ideas could be padding with a black background or performing other padding operations and cutting the results out of the masks afterwards.
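The padding idea can be sketched with NumPy as follows. This is a minimal illustration and not part of the CellViT code: the helper names `pad_to_multiple` and `crop_to_original` are hypothetical, and padding at the bottom and right edges only is an assumption.

```python
import numpy as np


def pad_to_multiple(image, multiple=1024):
    """Pad an image with black pixels (zeros) at the bottom and right
    so that height and width become multiples of `multiple`.

    Returns the padded image and the original (height, width).
    """
    h, w = image.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    # Only pad the two spatial axes; leave any channel axis untouched.
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="constant", constant_values=0)
    return padded, (h, w)


def crop_to_original(mask, original_shape):
    """Cut the padded region back out of a predicted mask."""
    h, w = original_shape
    return mask[:h, :w]
```

For a 1200x1600 image this pads to 2048x2048, which splits into four 1024x1024 patches; the predicted mask is then cropped back to 1200x1600.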

clissa commented 1 year ago

I see. When you say "optimized", do you mean from a code efficiency point of view or rather performance-wise?

I was thinking of 12 patches of 512x512 with some overlap...
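One way to arrive at 12 overlapping patches for a 1200x1600 image is to space the tile start offsets evenly along each axis. The sketch below is only an illustration of that arithmetic; the function name `tile_starts` and the even-spacing strategy are assumptions, not something prescribed in this thread.

```python
import math


def tile_starts(length, tile=512):
    """Evenly spaced start offsets so tiles of size `tile` cover `length`.

    If the axis is shorter than one tile, a single start at 0 is returned
    and the image would need padding along that axis.
    """
    if length <= tile:
        return [0]
    n = math.ceil(length / tile)        # minimum number of tiles needed
    step = (length - tile) / (n - 1)    # distance between consecutive starts
    return [round(i * step) for i in range(n)]


rows = tile_starts(1200)  # 3 start offsets along the height
cols = tile_starts(1600)  # 4 start offsets along the width
# Top-left corners of all tiles: 3 x 4 = 12 patches of 512x512.
corners = [(r, c) for r in rows for c in cols]
```

With these offsets, neighbouring tiles overlap by 168 px vertically and by about 150 px horizontally.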

FabianHoerst commented 1 year ago

I mean the inference code cell_detection.py and runtime efficiency, not segmentation performance. For overlap, you also need to write your own merging function to merge the feature maps/results.
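A very simple merging strategy for per-pixel outputs is to average the values wherever tiles overlap. The sketch below shows only that idea; `merge_tiles` is a hypothetical helper, it assumes 2D per-pixel maps (for example a probability map per tile), and it is not the merging used in the CellViT code. Merging instance-level results, such as cells split across tile borders, would need additional logic.

```python
import numpy as np


def merge_tiles(tile_maps, corners, out_shape):
    """Average overlapping 2D per-pixel maps into one full-size map.

    tile_maps: list of 2D arrays, one per tile
    corners:   list of (row, col) top-left positions, one per tile
    out_shape: (height, width) of the full image
    """
    total = np.zeros(out_shape, dtype=np.float64)
    count = np.zeros(out_shape, dtype=np.float64)
    for tile_map, (r, c) in zip(tile_maps, corners):
        h, w = tile_map.shape
        total[r:r + h, c:c + w] += tile_map
        count[r:r + h, c:c + w] += 1
    # Pixels not covered by any tile stay at 0 instead of dividing by 0.
    return total / np.maximum(count, 1)
```

Averaging treats all tiles equally; weighting pixels by their distance to the tile border is a common refinement when predictions near edges are less reliable.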

clissa commented 1 year ago

I see, thank you very much for your support and thanks for sharing the code! :)
