Closed valentin710 closed 1 year ago
This is described in more detail in the 2.0 paper, specifically Figure 4. Your understanding of HITL training sounds correct.
Can you point to the timestamp in the video that you're referring to? Maybe we can add a note in the video description.
Okay, but how do I add my newly trained model to the “initial model” dropdown in the “train new model with image + masks in folder” window?
In “cellpose 2.0: how to train your own cellular segmentation model”, Carsen explains the HITL training from 5:55 onwards.
As for the timestamp for the tutorial video by Marius Pachitariu: you can see him choosing 'cyto' as the initial model for his second training iteration at 7:47.
My previous response was misleading/incorrect: HITL training always starts from a pretrained model but incorporates more training data with each loop. To start, only a single image is used as training data. After each iteration, another image is added to the training data. Since there is more training data, the model should be more accurate on subsequent images.
If you always started from the previous/new model, it would be possible to drift away from good predictions with so little training data. Always starting with a pretrained model that is known to perform well makes the iteration cycle more robust. It also leverages the many (100+) training examples used to make the pretrained model.
The training instructions are included in the Models → Training instructions drop-down. You do not need to add your new model to the 'train new model ...' window. I'll need to revisit the videos, but hopefully this description makes sense. Let me know if I missed anything.
Okay, I revisited the paper and realized that HITL is not about increased performance, but about making annotating less tedious and therefore accelerating the training process overall.
Thank you very much for your help.
Hello!
Does this mean that if you are training over multiple days, select the model you were training yesterday in the custom model drop-down menu, and then resume training with more images (still in the same folder as the previous days' images), the model is always training from the initial pretrained model, or from the one I selected in the drop-down menu? (i.e., am I weighting the images differently when I select my previous model to continue training?)
Let me know if I'm going about this properly!
I'll put this in the FAQ since this is a topic of confusion. For the purpose of this post, a 'parameter' is a constant, or a number.
Consider the general description of a function: a function takes an input, does some processing on it, and produces an output. For a simple function like a line, there are only two parameters (slope and intercept). A neural net is an example of a much more complex function with ~millions of parameters (aka weights and biases). Still, the general definition of a function holds: provide an input (an image, here), processing occurs (the network does convolutions using the parameters), and an output is produced (refer to the intermediate representation in the paper, but for convenience just consider the output to be the segmentation).
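To make the analogy concrete, here is a minimal Python sketch (the variable names are made up for illustration): a line is a function fully specified by two parameters, and picking particular values for those parameters gives you one particular 'model'.

```python
# A line is a function fully specified by two parameters.
def line(x, slope, intercept):
    return slope * x + intercept

# "A model" = one particular choice of parameter values.
# A neural net is the same idea with ~millions of parameters instead of two.
model_a = {"slope": 2.0, "intercept": 1.0}

y = line(3.0, **model_a)  # 2.0 * 3.0 + 1.0 = 7.0
```

Swapping in a different slope/intercept pair gives a different model of the same functional form, which is exactly how the pretrained cellpose models relate to each other.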
Each model that comes with cellpose is a particular set of those ~millions of parameters, just like a particular line is defined by a slope and an intercept.
How does that interact with training? To train a model you must first have a 'guess' at what the parameters are. In the case of a line you can directly calculate the values of the parameters, but neural networks are so big that it's easier to use a kind of guess-and-check method described by the term 'training'. But this fitting method requires known input/output pairs that constitute prediction 'truth'. The goal of training, then, is to adjust the parameters so that the function reproduces (predicts) something very close to the known/true output.
Training is sometimes a problem for deep learning because a naive approach would require many hundreds of training images to accurately train a model the size of cellpose's net. This naive approach would 'guess' the network parameters as random numbers. Since the real parameters will certainly not be random numbers, it will take a long time for the fitting procedure to land on parameters that accurately predict outputs given inputs. The approach cellpose takes is to start with weights that are somewhat close to predicting outputs. These starting points are the pretrained models. You also have the option to train from scratch (meaning, using random numbers), but in practice that usually takes longer and requires much more data.
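A toy illustration of both points, as a hedged sketch: 'training' a line by gradient descent on known input/output pairs, once from a guess near the true parameters (playing the role of a pretrained model) and once from a far-off guess (playing the role of random initialization). The data and starting values are invented for the example.

```python
# Known input/output pairs ('truth'); the true line is y = 1.5x + 0.5.
data = [(x, 1.5 * x + 0.5) for x in range(5)]

def loss(m, b):
    # mean squared error between predictions and the known outputs
    return sum((m * x + b - y) ** 2 for x, y in data) / len(data)

def train(m, b, steps=50, lr=0.05):
    # guess-and-check: repeatedly nudge the parameters downhill on the loss
    for _ in range(steps):
        gm = 2 * sum((m * x + b - y) * x for x, y in data) / len(data)
        gb = 2 * sum((m * x + b - y) for x, y in data) / len(data)
        m, b = m - lr * gm, b - lr * gb
    return m, b

warm = train(1.4, 0.4)   # start near the truth ('pretrained model')
cold = train(5.0, -3.0)  # start far away ('random initialization')
# After the same number of steps, the warm start is much closer to the truth.
```

The same budget of training steps gets you much further when the starting parameters are already roughly right, which is the whole argument for starting from a pretrained model.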
So, models are defined by their parameters, and those parameters can be used as starting points in training rather than starting from random ones.
Cellpose aims to address ground-truth generation, i.e. making segmentation masks. You, the researcher, know what good segmentation should look like, but the model does not. Since it is tedious to label a bunch of images, Cellpose was designed to facilitate image labeling so you can more quickly produce an accurate training dataset. You start with a folder of unsegmented images. Next, you predict on a few images using a pretrained model and use the GUI to edit those segmentation masks. Because the pretrained model hasn't seen your exact data before, it will likely be a little off. After adjusting some of the masks, you train a new model starting from the one you just used to predict the masks. Why? Well, it predicted the masks with some accuracy, so it must be somewhat close to your image data.
Now, to label your next image, you can use the newly trained model to predict the masks on the next image in your folder. Since this model has the benefit of being close (due to the pretraining) and of having seen your data (due to the fine-tuning you just ran), it should predict the masks in your new image slightly better than the pretrained model alone. You use the new model to predict a few images and edit the resulting masks. Now you want to fine-tune again to make the model even better. You select the original model used for the original prediction, now training with the additional data you just annotated. Why? From my post above:
If you always started from the previous/new model, it would be possible to drift away from good predictions with so little training data. Always starting with a pretrained model that is known to perform well makes the iteration cycle more robust. It also leverages the many (100+) training examples used to make the pretrained model.
So, training a new model will allow you to better annotate images, as the model converges on the best parameters/weights to segment your images. You should use the new models to predict the segmentation; that is the point of the HITL design. Eventually, you will have a model that doesn't need additional training to accurately predict your segmentation.
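The loop described above can be sketched schematically; the function name and the 'model' representation here are stand-ins invented for illustration (in practice the cellpose GUI does the prediction, correction, and training). The point is only that each retraining starts from the same pretrained weights while the annotation set grows.

```python
PRETRAINED = "cyto"  # the well-known starting point

def train_new_model(start_from, annotated_images):
    # Stand-in for training: a "model" here is just a record of which
    # weights it started from and which images it was trained on.
    return {"initialized_from": start_from, "trained_on": list(annotated_images)}

images = ["img1", "img2", "img3"]  # your folder of unsegmented images
annotated = []
current_model = {"initialized_from": PRETRAINED, "trained_on": []}

for img in images:
    # 1. Predict masks on `img` with the best model so far (current_model).
    # 2. A human corrects the predicted masks in the GUI.
    annotated.append(img)  # the corrected masks join the training set
    # 3. Retrain, always starting from the pretrained weights,
    #    but now with the full accumulated annotation set.
    current_model = train_new_model(PRETRAINED, annotated)
```

Every iteration of the loop reinitializes from the pretrained model, so the thing that improves over time is the training set, not a chain of models trained on top of each other.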
For your question specifically, @b-faith
Does this mean that if you are training over multiple days, and select the model you were training yesterday in the custom model drop down menu, and then resume training with more images (still in the same folder as the previous images/days) the model is always training from the initial pre-trained model ...
The model is always training from the pre-trained model that you choose in the training window. The model you click in the main GUI window and then use "run model" with is for mask prediction. You always want that one to be the 'best' model so you don't waste time annotating.
tldr; Regardless of the specifics of how everything works, Cellpose exists to help you segment images. If you're happy with your segmentation and it doesn't take forever, you are doing it right. 😃
Is it possible to implement iterative training on multiple images with a human-in-the-loop feature that activates after processing a few images? This would allow the automated learning and training to run overnight, with the images being reviewed in the morning with the HITL feature. Is a feature like this already available?
@clemyyyy Yes, this is available via the CLI. I would run a pretrained CP model on a folder of your images overnight (using the CLI). The next day you can open them in the GUI and annotate them individually. Then train overnight with the new annotations.
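As a sketch, the overnight workflow might look like the following; the flags are taken from the cellpose CLI but may differ between versions, so check `python -m cellpose --help` before relying on them (the folder path is a placeholder).

```shell
# 1. Overnight: segment a folder of images with a pretrained model
python -m cellpose --dir /path/to/images --pretrained_model cyto --save_png

# 2. Next day: open the images in the GUI and correct the masks by hand

# 3. Following night: train on the corrected annotations,
#    again starting from the pretrained model
python -m cellpose --train --dir /path/to/images --pretrained_model cyto
```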
You can pick how many images to process per day.
Feel free to open a new issue if you have a more specific question.
Hi there,
I am new to cellpose, and so far I have trained my models with manual annotations. Now I wanted to try HITL training. However, the instructions in your YouTube video aren't 100% clear to me. Isn't the idea to refine the model with each training iteration? In “cellpose 2.0: how to train your own cellular segmentation model” you state:
“… we go in this loop, where we're always correcting segmentation from a model that's constantly improving…”
Wouldn't that mean that each training iteration retrains the model of the previous iteration? Because in the tutorial video for HITL you always retrain the native 'cyto' model. Otherwise, where is the performance difference between using manually segmented data for training and HITL?
Thank you very much.