Rework train-val-test split

[x] use 6 instead of 5 folds
[x] use a separate fold for testing (don't use the validation fold for this)
- [x] stats per fold
- [x] Dataset param: fold_indicies (tuple, list)
- [x] aggregate stats per fold
[x] Create stratified folds
- [x] use the following heuristic:
- [x] create a metric that describes how equal the distribution is across folds (possible metrics and ways to prevent overfitting were discussed with ChatGPT)
- [x] init each fold with one chip
- [x] for each remaining chip: tentatively add it to each fold and compute the new metric
- [x] add the chip to the fold that created the best score
- [x] repeat until all chips are distributed
- [ ] ~additions~
  - [x] add a penalty against an unbalanced number of chips in each fold
  - [x] use log distribution instead of normal pixel count
  - [ ] add a penalty against an unbalanced number of pixels (without background) in each fold
  - [ ] add more regularization techniques against overfitting
- [x] for speed-up: keep aggregated stats per fold and just add/subtract the stats of the chip being added/removed
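
A minimal sketch of the greedy heuristic from the checklist above, not the actual messis implementation. Assumptions: each chip is summarized as a `Counter` of per-class pixel counts; the balance metric is the variance of log pixel counts per class across folds (lower is better), which is only one possible choice; `imbalance`, `assign_folds`, and `size_penalty` are hypothetical names.

```python
import math
from collections import Counter


def imbalance(fold_stats, fold_sizes, size_penalty=0.1):
    """Lower is better (assumed metric, not the project's actual one).

    Sums, over all classes, the variance of log(1 + pixel count) across
    folds, then adds a penalty for unequal chip counts per fold.
    """
    classes = set().union(*fold_stats)
    score = 0.0
    for c in classes:
        logs = [math.log1p(stats[c]) for stats in fold_stats]
        mean = sum(logs) / len(logs)
        score += sum((v - mean) ** 2 for v in logs) / len(logs)
    score += size_penalty * (max(fold_sizes) - min(fold_sizes))
    return score


def assign_folds(chips, n_folds=6, size_penalty=0.1):
    """Greedily distribute chips over folds.

    chips: dict mapping chip id -> Counter(class -> pixel count).
    Returns a list of n_folds lists of chip ids.
    """
    ids = list(chips)
    folds = [[] for _ in range(n_folds)]
    fold_stats = [Counter() for _ in range(n_folds)]

    # Init each fold with one chip.
    for f, cid in enumerate(ids[:n_folds]):
        folds[f].append(cid)
        fold_stats[f].update(chips[cid])

    # Place every remaining chip in the fold that yields the best score.
    for cid in ids[n_folds:]:
        best_f, best_score = None, math.inf
        for f in range(n_folds):
            # Speed-up: add the chip's stats in place, score, then undo,
            # instead of recomputing the fold stats from scratch.
            fold_stats[f].update(chips[cid])
            sizes = [len(x) + (i == f) for i, x in enumerate(folds)]
            score = imbalance(fold_stats, sizes, size_penalty)
            fold_stats[f].subtract(chips[cid])
            if score < best_score:
                best_f, best_score = f, score
        folds[best_f].append(cid)
        fold_stats[best_f].update(chips[cid])
    return folds
```

With 6 folds, one possible split would be four folds for training, one for validation, and one held out for testing; which fold indices are passed to the dataset is left to the caller.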

Satellite-Based-Crop-Classification / messis