Lee-Gihun / MEDIAR

(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"
MIT License

Public dataset preprocessing and public data selection strategy for pretraining #4

fazlicodes closed this issue 1 year ago

fazlicodes commented 1 year ago

Hi, no code is provided for processing the public datasets. Also, which data were discarded from each of the public datasets?

Lee-Gihun commented 1 year ago

We described how we processed the public datasets in our paper (page 9, "Public Data Usage").

To clarify, let me add a few more details:

We did not use any dedicated processing code for the public data.

fazlicodes commented 1 year ago

Noted, thank you for your quick response!

fazlicodes commented 1 year ago

@Lee-Gihun How did you locate and exclude the non-microscopy images in the Cellpose dataset?

Lee-Gihun commented 1 year ago

We manually removed a few obvious images from the set (about 10 to 20 images).

To the best of my understanding, the original Cellpose paper treats cell segmentation as the problem of finding unit entities in an image, although the authors did not mention such details in their paper.

At first, I thought these images might hurt performance, so I removed them. However, there was no noticeable difference. This might be because the Cellpose images make up only a small portion of our entire pretraining set, and the test modalities in the challenge datasets do not contain such non-cell entities.
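For anyone reproducing this step, one option is to record the manual exclusion as an explicit list of file names so that the filtered set can be rebuilt exactly. The sketch below is a hypothetical illustration, not the MEDIAR authors' method (they state they used no dedicated code). It assumes the images sit in a single local folder. The folder name `cellpose_train`, the helper `filter_images`, and the entries in `EXCLUDED` are all placeholders, since the thread does not list which images were actually removed.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Hypothetical placeholder names: the actual ~10 to 20 images removed by the
# MEDIAR authors are not listed in this thread.
EXCLUDED: Set[str] = {"000_img.png", "001_img.png"}


def filter_images(paths: Iterable[Path], excluded: Set[str]) -> List[Path]:
    """Return the paths whose file name is not in the hand-curated exclusion list."""
    return sorted(p for p in paths if p.name not in excluded)


if __name__ == "__main__":
    root = Path("cellpose_train")  # hypothetical local folder of training images
    kept = filter_images(root.glob("*.png"), EXCLUDED)
    print(f"kept {len(kept)} images")
```

Keeping the list in version control, instead of deleting files on disk, makes it easy to check later whether the exclusion changes results, which is the comparison described above.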