Balancing the samples in the batch removes the need for a class weight entirely, without messing about with the loss's magnitude: https://github.com/WorldCereal/presto-worldcereal/commit/3cf5e9531bb0efab85eb97240b64e1ca533ee598
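For reference, here is a minimal sketch of what per-batch class balancing can look like, assuming binary labels and PyTorch's `WeightedRandomSampler`; the names and data are illustrative, not the actual code from the linked commit. Because the sampler handles the imbalance, the loss needs no `pos_weight` at all.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced binary labels standing in for crop/no-crop targets.
labels = np.array([0] * 950 + [1] * 50)
features = np.random.randn(len(labels), 16).astype(np.float32)

# Weight each sample inversely to its class frequency, so every batch is
# (in expectation) class-balanced and the loss needs no pos_weight.
class_counts = np.bincount(labels)
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),
    replacement=True,
)
loader = DataLoader(
    TensorDataset(torch.from_numpy(features), torch.from_numpy(labels)),
    batch_size=32,
    sampler=sampler,
)
```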
This looks really promising IMO. It's how we trained crop type models for the operational European crop mapping project. I honestly didn't know the balancing was also done when finetuning Presto itself; I thought it was only relevant for the downstream classifiers like CatBoost. In the end, I think we don't want a separate Presto encoder for each crop type, but rather to finetune for crop type mapping in general (and maybe we already learned enough from crop/no-crop?). Could we also try taking the finetuned crop/no-crop Presto and balancing only in the sklearn and CatBoost models (or in a head, but then via batch balancing)?
I think there is still work to do w.r.t. the CatBoost parameters, but I am going to merge this in for now, since the class balancing seems to work.
Class weighting uses `compute_class_weight` for both the sklearn models and CatBoost, applying the `compute_class_weight` function here too.
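For illustration, a minimal sketch of the `compute_class_weight` route, assuming binary crop/no-crop labels; the models and data below are placeholders, not the repo's actual setup.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels standing in for the crop/no-crop targets.
y = np.array([0] * 950 + [1] * 50)
X = np.random.randn(len(y), 16)

# "balanced" gives each class a weight inversely proportional to its frequency.
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same balanced weights can be reused for both model families.
sk_model = LogisticRegression(class_weight=dict(zip(classes, weights))).fit(X, y)
cb_model = CatBoostClassifier(class_weights=weights.tolist(), verbose=False).fit(X, y)
```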
TL;DR
Variations in performance for the finetuned model don't seem to trickle down to the sklearn models, so any choice would be okay. Balanced sampling seems fine to go with for now, so this is what we currently have (and it is very marginally better).
Class weights
Results for https://github.com/WorldCereal/presto-worldcereal/pull/46/commits/cb55f93554ec30d1700754d5ecc142ff29c036bb below (wandb run):
Interestingly, fixing the finetuning weights :bug: seems to lead to much worse finetuning results for the maize model (F1: 0.6711 vs. 0.8081 before). This also seems to impact the sklearn models trained on top of this model (except for the random forest and CatBoost models).
For maize, the imbalance (and therefore the `pos_weight`) is very high. Since `pos_weight` is essentially the negative-to-positive ratio, a dataset with e.g. only 2% positives already yields a weight of ~49. Having such a high weight seems bad to me. I think three solutions would be:
Clamped class weights
Results from the experiment with clamped class weights at finetuning time. Better (maize) finetuning results, but not a huge difference for the sklearn models.
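For illustration, clamping could look like the sketch below; the counts and the cap are hypothetical, not the values used in the experiment.

```python
import torch
import torch.nn as nn

# Hypothetical counts standing in for the maize label distribution.
num_neg, num_pos = 9_800, 200

# pos_weight is the negative-to-positive ratio, which explodes under
# heavy imbalance: 9800 / 200 = 49.
raw_pos_weight = torch.tensor(num_neg / num_pos)

# Clamp it so a rare class cannot dominate the loss magnitude; the cap
# of 10.0 is an arbitrary illustrative choice.
pos_weight = torch.clamp(raw_pos_weight, max=10.0)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```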
Balanced sampling
Results from the experiment with balanced sampling at finetuning time.