We want to be able to monitor as many different metrics as we can, ideally generated automatically into Weights & Biases, in order to analyse the model performance.
Those that are cheap to compute can be calculated, say, every epoch (or even a bit more often, since our epochs will be large); others only at the end of a training run.
Some ideas:
- confusion matrix (matrix of predicted classes vs label classes)
- accuracy
  - globally
  - per sample type
  - per day
  - per height
  - per class
- confidence per class
  - i.e. among the pixels where the largest probability is water, what is the average probability of water?
- images
  - one or some slices of the test set, with the experimental data, and both the predictions and Floriana's labels overlaid, if we can make that readable
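Several of the tabular ideas above (confusion matrix, global and per-class accuracy, mean confidence per class) can be sketched with plain NumPy; the class count and image shape below are illustrative assumptions, not our actual data layout.

```python
import numpy as np

def confusion_matrix(pred, label, n_classes):
    # Rows = label (true) class, columns = predicted class.
    idx = label.ravel() * n_classes + pred.ravel()
    return np.bincount(idx, minlength=n_classes * n_classes).reshape(n_classes, n_classes)

def per_class_accuracy(cm):
    # Fraction of each true class's pixels that were predicted correctly.
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)

def mean_confidence_per_class(probs, n_classes):
    # Among pixels whose argmax is class c, the average probability of c
    # (e.g. among pixels predicted as water, the mean water probability).
    pred = probs.argmax(axis=0)
    conf = probs.max(axis=0)
    return np.array([conf[pred == c].mean() if (pred == c).any() else np.nan
                     for c in range(n_classes)])

# Toy example: 3 classes on an 8x8 "image" with random softmax outputs.
rng = np.random.default_rng(0)
n_classes = 3
label = rng.integers(0, n_classes, size=(8, 8))
logits = rng.normal(size=(n_classes, 8, 8))
probs = np.exp(logits) / np.exp(logits).sum(axis=0)

cm = confusion_matrix(probs.argmax(axis=0), label, n_classes)
global_acc = np.diag(cm).sum() / cm.sum()
```

The per-sample-type, per-day, and per-height breakdowns would just apply the same functions to the corresponding subsets of pixels.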
There are also metrics used in the literature specifically for image segmentation; we should look into those too.
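Two of the most common segmentation metrics in the literature are the per-class intersection-over-union (IoU, a.k.a. Jaccard index, often averaged into mIoU) and the Dice coefficient. A minimal sketch, with a tiny hand-made example rather than real data:

```python
import numpy as np

def iou_and_dice(pred, label, cls):
    # Per-class IoU (Jaccard) and Dice coefficient for one class.
    p = pred == cls
    t = label == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    iou = inter / union if union else np.nan
    denom = p.sum() + t.sum()
    dice = 2 * inter / denom if denom else np.nan
    return iou, dice

# 2x2 toy masks: one pixel of class 1 is missed by the prediction.
pred = np.array([[0, 0], [1, 1]])
label = np.array([[0, 1], [1, 1]])
iou, dice = iou_and_dice(pred, label, cls=1)  # iou = 2/3, dice = 0.8
```

mIoU would then be the mean of the IoU values over all classes present in the labels.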