BoomStarcuc / GeneSegNet


Training on my own data #2

Open pakiessling opened 10 months ago

pakiessling commented 10 months ago

Congratulations on the publication!

I am interested in trying it out with my own data and I have some questions:

  1. Is the image size 256 x 256 the one you recommend?
  2. Can one also use cell border staining, or is nuclei staining preferred?
  3. How exactly does my input need to be formatted? Just spot X, Y, ground truth, and staining images?
  4. Any recommendation on generating the ground truth masks? Did you just use another segmentation algorithm for this?

Thank you!

BoomStarcuc commented 10 months ago

Hi

Thank you so much for your kind words and interest in our work!

To address your questions:

  1. Yes, we recommend an image size of 256 x 256, as this is the dimension at which we conducted all of our experiments. However, you might experiment with different sizes depending on the specifics of your data. To generate different image sizes, please modify the following preprocessing code (a rough sketch of the idea is included after this list): https://github.com/BoomStarcuc/GeneSegNet/blob/d8ce4302fc5da880b6322e2d0ac821ad7fa58f5b/preprocess/Generate_Image_Label_locationMap.py#L167-L176

  2. We primarily used nuclei staining in our experiments, as we aimed to predict cell boundaries using nuclei staining images combined with gene expressions as input. Although not explicitly tested in our study, our model is adaptable and can process various types of images, such as cell border staining or a combination of nuclei and cell border staining images. It is essential, however, that the image adheres to a standard image format. Furthermore, if you opt for different types of inputs, you might need to adjust the "chan" parameter within the network to match the number of your input channels (see the second sketch after this list). For more details, please refer to the "chan" parameter in the "Training from scratch" section.

  3. Please see the Data preprocess section to process your own data. Yes, initially all you need as input data are gene expressions (spot x, spot y, and gene), a staining image, and training annotations (an example of these minimal inputs follows after this list). If you do not have the gene information, you can leave it out, since gene identity is only used by RNA-based methods such as Baysor and JSTA.

  4. For the datasets we used in our paper, the annotations for the Hippocampus and NSCLC datasets were downloaded from their official websites, and the annotations for the simulation dataset were generated by a toolbox (see: https://cbia.fi.muni.cz/simulator/). If you want to generate annotations for your own real data, I have several recommendations: (1) Completely manual annotation. While time-consuming, manual annotation can produce good results. (2) A combination of pre-trained models from existing cell segmentation algorithms and manual correction. First, run all of your images through the pre-trained model, and then manually correct the incorrect cell segmentations (a sketch of this workflow follows below).
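
For point 1, a minimal, hypothetical sketch of the cropping idea. This is not the repository's exact preprocessing code (the linked `Generate_Image_Label_locationMap.py` is authoritative); the names here are illustrative:

```python
import numpy as np

PATCH_SIZE = 256  # change this to experiment with a different crop size

def crop_into_patches(image, patch_size=PATCH_SIZE):
    """Split a 2D staining image into non-overlapping patch_size x patch_size crops."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

# Example: a 1024 x 1024 image yields 16 crops of 256 x 256.
dummy = np.zeros((1024, 1024), dtype=np.uint16)
print(len(crop_into_patches(dummy)))  # 16
```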
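
For point 2, an illustrative sketch of combining a nuclei stain and a cell-border stain into a multi-channel input. The file names are hypothetical; only the "chan" parameter mentioned above comes from the repository:

```python
import numpy as np
import tifffile  # assumed here for reading/writing staining TIFFs

nuclei = tifffile.imread("nuclei_stain.tif")      # shape (H, W), hypothetical file
membrane = tifffile.imread("membrane_stain.tif")  # shape (H, W), hypothetical file

# Stack the two stains along a channel axis -> shape (H, W, 2).
combined = np.stack([nuclei, membrane], axis=-1)
tifffile.imwrite("combined_stain.tif", combined)

# When training, set the network's channel count ("chan") to 2 instead of 1
# so it matches the number of input channels.
```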
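
For point 3, a hedged example of what the minimal inputs could look like. The column and file names are illustrative, not the required schema; see the Data preprocess section for the exact format:

```python
import pandas as pd

# One row per RNA spot: spatial coordinates plus (optionally) the gene identity.
spots = pd.DataFrame({
    "spot_x": [120.5, 98.2, 240.7],
    "spot_y": [64.1, 310.9, 77.3],
    "gene":   ["Slc17a7", "Gad1", "Plp1"],  # illustrative gene names
})
spots.to_csv("spots.csv", index=False)

# If gene identities are unavailable, drop the "gene" column; only the
# coordinates are needed here, since gene identity is used by RNA-based
# methods such as Baysor and JSTA.
```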
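
For recommendation (2) in point 4, a sketch assuming Cellpose (classic 2.x API) as the pre-trained segmentation model; any existing cell segmentation tool would work, and the resulting label image would then be manually corrected:

```python
import tifffile
from cellpose import models

image = tifffile.imread("nuclei_stain.tif")  # hypothetical input image

# Run a pre-trained nuclei model to get an initial segmentation.
model = models.Cellpose(model_type="nuclei")
masks, flows, styles, diams = model.eval(image, diameter=None, channels=[0, 0])

# Save the label image as a starting annotation, then manually fix the
# incorrect segmentations (e.g. in an image annotation tool).
tifffile.imwrite("initial_annotation.tif", masks.astype("uint16"))
```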