Closed marcverhagen closed 11 months ago
This was fixed in https://github.com/clamsproject/app-swt-detection/commit/71b57a224438969aa1116a25ef6d8b0a103d6a1f by using the labels field from the model config file. An alternative would be to change
bins:
pre:
slate:
- "S"
chyron:
- "I"
- "N"
- "Y"
credit:
- "C"
into
bins:
pre:
- slate:
- "S"
- chyron:
- "I"
- "N"
- "Y"
- credit:
- "C"
and update train.get_final_label_names()
.
This is probably because of this; https://github.com/yaml/pyyaml/issues/110.
Trainer config: https://github.com/clamsproject/app-swt-detection/blob/f033551e1e53926fac6b8a03095980402ca2d06a/modeling/config/trainer.yml#L43-L52
After yaml.dump
-ed (exported config), keys are sorted:
https://github.com/clamsproject/app-swt-detection/blob/f033551e1e53926fac6b8a03095980402ca2d06a/modeling/models/20231214-193543.convnext_tiny.kfold_config.yml#L1-L10
Considering added label mapping complexity when we add postbin
handling in a very near future, I don't think classifier relying on the labels
values from the model_config is a good idea, simply because that value is the "final labels" after pre+post binning. To that end, I think we should remove the labels
key all together from export process to eliminate a big source of confusion.
https://github.com/clamsproject/app-swt-detection/blob/f033551e1e53926fac6b8a03095980402ca2d06a/modeling/train.py#L243-L244
To fix the original wrong association issue, just adding sort_keys=False
argument to yaml.dump
call should be enough.
Agreed on both counts. I hadn't looked closely at the new export code you had added. And indeed having both bins
and labels
in the model configuration is redundant. I am doing some cleanup of the code now and will include those fixes.
Because
Somewhere along the line we started assigning the wrong label, probably because the way we dealt with configurations has changed several times.
The config file
modeling/config/trainer.yml
has:And the classifier config in
/modeling/models/20231214-193543.convnext_tiny.kfold_config.yml
hasBut the method
train.get_final_label_names()
returnsand since that method is eventually used to set labels on an instance of Prediction we get a mismatch.