We want to be able to monitor as many different metrics as we can, ideally generated automatically into Weights & Biases, in order to analyse the model performance.
Those that are cheap to compute can be calculated, say, every epoch (or even a bit more often, since our epochs will be large); others only at the end of a training run.
Some ideas:
- confusion matrix (matrix of predicted classes vs label classes)
- accuracy
  - globally
  - per sample type
  - per day
  - per height
  - per class
- confidence per class
  - i.e. among the pixels where the largest probability is water, what is the average probability of water?
- images
  - one or some slices of the test set, with the experimental data, and both the predictions and Floriana's labels overlaid, if we can make that readable
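Several of the tabular ideas above (confusion matrix, global and per-class accuracy, mean confidence per class) can be sketched with plain NumPy; the class count and image shape below are illustrative assumptions, not our actual data layout.

```python
import numpy as np

def confusion_matrix(pred, label, n_classes):
    # Rows = label (true) class, columns = predicted class.
    idx = label.ravel() * n_classes + pred.ravel()
    return np.bincount(idx, minlength=n_classes * n_classes).reshape(n_classes, n_classes)

def per_class_accuracy(cm):
    # Fraction of each true class's pixels that were predicted correctly.
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)

def mean_confidence_per_class(probs, n_classes):
    # Among pixels whose argmax is class c, the average probability of c
    # (e.g. among pixels predicted as water, the mean water probability).
    pred = probs.argmax(axis=0)
    conf = probs.max(axis=0)
    return np.array([conf[pred == c].mean() if (pred == c).any() else np.nan
                     for c in range(n_classes)])

# Toy example: 3 classes on an 8x8 "image" with random softmax outputs.
rng = np.random.default_rng(0)
n_classes = 3
label = rng.integers(0, n_classes, size=(8, 8))
logits = rng.normal(size=(n_classes, 8, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=0)

cm = confusion_matrix(probs.argmax(axis=0), label, n_classes)
global_acc = np.diag(cm).sum() / cm.sum()
```

The per-sample-type, per-day, and per-height breakdowns would just apply the same functions to the corresponding subsets of pixels.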
There are also metrics used in the literature specifically for image segmentation; we should look into those too.
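Two of the most common segmentation metrics in the literature are the per-class intersection-over-union (IoU, a.k.a. Jaccard index, often averaged into mIoU) and the Dice coefficient. A minimal sketch, with a tiny hand-made example rather than real data:

```python
import numpy as np

def iou_and_dice(pred, label, cls):
    # Per-class IoU (Jaccard) and Dice coefficient for one class.
    p = pred == cls
    t = label == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    iou = inter / union if union else np.nan
    denom = p.sum() + t.sum()
    dice = 2 * inter / denom if denom else np.nan
    return iou, dice

# 2x2 toy masks: one pixel of class 1 is missed by the prediction.
pred = np.array([[0, 0], [1, 1]])
label = np.array([[0, 1], [1, 1]])
iou, dice = iou_and_dice(pred, label, cls=1)  # iou = 2/3, dice = 0.8
```

mIoU would then be the mean of the IoU values over all classes present in the labels.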