KerenLab / CellSighter

CellSighter

CellSighter is an ensemble of convolutional neural networks to perform supervised cell classification in multiplexed images. Given a labeled training set, a model can be trained to predict cell classes for new images.

Runs on Python 3.8.5.

Data Preparation

The data should have the following structure:

Notes:

System requirements

  1. Access to a GPU.
  2. See the requirements file for the required libraries and versions.

Training a Model

  1. Prepare the data in the format above
  2. Create a folder with the configuration file named "config.json". See "Preparing configuration file" for more information.
  3. Train one model with the following command: 'python train.py --base_path=/path/to/your/folder/with/the/configuration file'
  4. To train an ensemble, run the command above several times, each time in a separate folder.
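
The ensemble steps above can be scripted. The following is a minimal sketch, not part of CellSighter itself: the helper names `prepare_ensemble_folders` and `training_command` and the `model_{i}` folder naming are assumptions made for illustration. It only creates one folder per ensemble member, each with its own "config.json", and builds the documented `train.py` command for each folder.

```python
import json
import sys
from pathlib import Path


def prepare_ensemble_folders(base_config, ensemble_root, n_models):
    """Create one folder per ensemble member, each holding its own config.json."""
    folders = []
    for i in range(n_models):
        # Hypothetical naming scheme; any distinct folder names would do.
        folder = Path(ensemble_root) / f"model_{i}"
        folder.mkdir(parents=True, exist_ok=True)
        with open(folder / "config.json", "w") as f:
            json.dump(base_config, f, indent=2)
        folders.append(folder)
    return folders


def training_command(folder):
    """Build the documented training command for one configuration folder."""
    return [sys.executable, "train.py", f"--base_path={folder}"]
```

Each command can then be passed to `subprocess.run(cmd, check=True)` from the repository root, one folder at a time, so that every run writes its weights and logs into its own folder.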

Output files:

  1. val_results_{epocNum}.csv - Results on the validation set during training. The file contains the following columns:
    pred - predicted label
    pred_prob - probability of the predicted label
    label - input label used for training
    cell_id - cell id
    image_id - image id
    prob_list - list of probabilities per cell type. The index is the cell type.
  2. Weights_{epocNum}_count.pth - The weights of the network.
  3. event.out.### - TensorBoard logs
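
As a quick check of a validation results file, per-class accuracy can be computed from the pred and label columns described above. This is a minimal sketch using only the standard library; the function name `per_class_accuracy` is hypothetical, and the code assumes that pred and label are stored as integer class ids.

```python
import csv
from collections import defaultdict


def per_class_accuracy(csv_path):
    """Return {class id: fraction of cells with that label predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumption: labels are integer class ids (possibly written as "3" or "3.0").
            label = int(float(row["label"]))
            pred = int(float(row["pred"]))
            total[label] += 1
            if pred == label:
                correct[label] += 1
    return {label: correct[label] / total[label] for label in total}
```

Comparing these values across epochs gives a rough view of which cell types the model learns quickly and which remain difficult.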

Evaluating the model

  1. Prepare the data in the format above
  2. Create a folder with the configuration file named "config.json". See "Preparing configuration file" for more information.
  3. Change the "weight_to_eval" field in the config file to be the path to the weights of the model you trained (Weights_{epocNum}_count.pth).
  4. Evaluate one model with the following command: 'python eval.py --base_path=/path/to/your/folder/with/the/configuration file'
  5. You should now have a results CSV file in the folder.
  6. To evaluate an ensemble, run the command above once for each model you trained. Use a separate folder for each model and set the weight path in each configuration file accordingly. This produces multiple results files, which you can combine as you wish or with the merging scripts supplied.
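
One simple way to combine the per-model results files is to average the class probabilities of each cell and take the most probable class. The sketch below illustrates that idea only; it is not the repository's merging script, and the function names are hypothetical. It also assumes that prob_list is written as a Python-style list such as "[0.1, 0.9]" and that image_id together with cell_id identifies a cell.

```python
import ast
import csv
from collections import defaultdict


def load_probabilities(csv_path):
    """Map (image_id, cell_id) -> list of per-class probabilities for one model."""
    probs = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumption: prob_list is serialized as a Python-style list, e.g. "[0.1, 0.9]".
            key = (row["image_id"], row["cell_id"])
            probs[key] = ast.literal_eval(row["prob_list"])
    return probs


def average_ensemble(csv_paths):
    """Average probabilities over models; return {cell key: (class index, mean probability)}."""
    sums = {}
    counts = defaultdict(int)
    for path in csv_paths:
        for key, p in load_probabilities(path).items():
            if key not in sums:
                sums[key] = [0.0] * len(p)
            sums[key] = [s + x for s, x in zip(sums[key], p)]
            counts[key] += 1
    merged = {}
    for key, total in sums.items():
        mean = [s / counts[key] for s in total]
        pred = max(range(len(mean)), key=mean.__getitem__)
        merged[key] = (pred, mean[pred])
    return merged
```

Averaging probabilities rather than taking a majority vote keeps the confidence of each model in play, so the merged probability can later be thresholded to flag low-confidence cells.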

Output files:

  1. val_results - same format as the validation results written during training
  2. event.out.### - TensorBoard logs

Analyze results

Preparing configuration file

The configuration file should be named 'config.json' and should have the following fields:

"crop_input_size": 60, # Size of the crop that goes into the network. Make sure it is large enough to show a cell and a fraction of its immediate neighbors.
"crop_size": 128, # Size of the initial crop before augmentations. This should be ~2-fold the size of the input crop to allow augmentations such as shifts and rotations.
"root_dir": "data_path", # Path to the data that you prepared in the previous steps
"train_set": ["FOV1", "FOV2", ...], # List of image ids to use as the training set
"val_set": ["FOV10", "FOV12", ...], # List of image ids to use as the validation/evaluation set
"num_classes": 20, # Number of classes in the data set
"epoch_max": 50, # Number of epochs to train
"lr": 0.001, # Learning rate
"to_pad": false, # Whether to work on the border of the image
"blacklist": [], # Channels to exclude from training/validation entirely
"channels_path": "", # Path to the protein list that you created during data preparation
"weight_to_eval": "", # Path to the weights, relevant only for evaluation
"sample_batch": true, # Whether to sample equally from each category in every batch during training
"hierarchy_match": {"0": "B cell", "1": "Myeloid",...}, # Dictionary that maps each class to a higher category, used to balance the higher categories during training. The keys are the label ids and the values are the higher categories.
"size_data": 1000, # Optional. For each cell type, sample size_data cells, or fewer if there are not enough cells of that type
"aug": true # Optional. Whether to apply augmentations
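
Standard JSON has no comment syntax, so if the file is read by a standard JSON parser the # annotations above must be left out of the actual "config.json". The sketch below writes such a file from Python and runs two basic sanity checks first. The helper names `check_config` and `write_config` are hypothetical, the shortened lists and paths are placeholders, and treating every field not marked optional as required is an assumption.

```python
import json
from pathlib import Path

# Assumption: every field not marked "Optional" in the list above is required.
REQUIRED_FIELDS = [
    "crop_input_size", "crop_size", "root_dir", "train_set", "val_set",
    "num_classes", "epoch_max", "lr", "to_pad", "blacklist",
    "channels_path", "weight_to_eval", "sample_batch", "hierarchy_match",
]


def check_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in config]
    if "crop_size" in config and "crop_input_size" in config:
        if config["crop_size"] < config["crop_input_size"]:
            problems.append("crop_size is smaller than crop_input_size")
    return problems


def write_config(config, folder):
    """Write the configuration as config.json inside the given folder."""
    path = Path(folder) / "config.json"
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


# Placeholder values copied from the field list; lists and mapping are shortened.
example_config = {
    "crop_input_size": 60,
    "crop_size": 128,
    "root_dir": "data_path",
    "train_set": ["FOV1", "FOV2"],
    "val_set": ["FOV10", "FOV12"],
    "num_classes": 20,
    "epoch_max": 50,
    "lr": 0.001,
    "to_pad": False,
    "blacklist": [],
    "channels_path": "",
    "weight_to_eval": "",
    "sample_batch": True,
    "hierarchy_match": {"0": "B cell", "1": "Myeloid"},
}
```

Calling `check_config(example_config)` before `write_config(example_config, folder)` catches a missing field or an undersized crop before a training run starts.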