Closed (pjbull closed this 2 years ago)
This comes from this line: https://github.com/drivendataorg/zamba/blob/master/zamba/models/model_manager.py#L178
Previously we sorted the columns in place. To avoid relying on that sort order, we now set the column order explicitly. But here we're assigning to a new labels object that never gets used. At this point in the code, when we're adding more columns, we are working directly with the `train_config.labels` object, which is admittedly a fragile approach. Not sure of the right fix yet.
The error is useful in that it's right: our dataloaders and model have different label orders, and that would yield poor results.
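To make the failure mode concrete, here is a minimal pandas sketch, not the actual zamba code. The species list, the `species_` column prefix, and the `order_matches` helper are all assumptions made for illustration. It shows that reordering columns produces a new object and leaves the original untouched, so an order check on the original still fails.

```python
import pandas as pd

# Hypothetical model species order and a label frame built from a subset.
model_species = ["aardvark", "blank", "zebra"]
labels = pd.DataFrame(
    {"species_blank": [0, 1], "species_aardvark": [1, 0]},
    index=["a.mp4", "b.mp4"],
)

expected_cols = [f"species_{s}" for s in model_species]

# reindex returns a NEW frame; `labels` itself is left unchanged.
# If the result is never assigned back, the reordering is lost.
reordered = labels.reindex(columns=expected_cols, fill_value=0)


def order_matches(frame: pd.DataFrame, expected: list) -> bool:
    """Return True only if the columns equal `expected` in the same order."""
    return list(frame.columns) == expected


print(order_matches(labels, expected_cols))     # original is still out of order
print(order_matches(reordered, expected_cols))  # only the new object is aligned
```

Whichever object the dataloaders end up reading is the one that needs the aligned order, which is why assigning to a throwaway variable does not help.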
I think the fix is to do this when we're setting up labels in the configs. It is also conceptually clearer not to do any labels modification in `instantiate_model`. We can set the labels column as a categorical before we one-hot encode, and set the categories to be all the species on the model if we're using the default model labels.
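A rough sketch of the categorical idea, assuming a long-format labels table with hypothetical `filepath` and `label` columns and a made-up three-species model list (none of these names are taken from zamba). With a categorical dtype, `pd.get_dummies` emits one column per category in category order, including species that are absent from the training subset.

```python
import pandas as pd

# Hypothetical default model species, in the order the model head expects.
model_species = ["aardvark", "blank", "zebra"]

# Hypothetical training labels that cover only a subset of species.
raw = pd.DataFrame(
    {"filepath": ["b.mp4", "a.mp4"], "label": ["blank", "aardvark"]}
)

# Fix the category set and order BEFORE one-hot encoding.
raw["label"] = pd.Categorical(raw["label"], categories=model_species)

# get_dummies creates a column for every category, even unobserved ones,
# and follows the category order rather than the order seen in the data.
one_hot = pd.get_dummies(raw, columns=["label"], prefix="species")

print(list(one_hot.columns))
```

Because the order is pinned when the labels are first set up, nothing downstream has to sort or reorder columns.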
`validate_species` fails if order is different. Reproduced by training a model on a subset of labels (`aardvark` and `blank`).
Passed config
Output logs before error
Error message