flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Bug]: reduce transformer vocab plugin overwrites best-model with final-model (which has worse F1 score) #3462

Closed zynos closed 2 days ago

zynos commented 1 month ago

Describe the bug

In Flair 0.13.1 we set the reduce_transformer_vocab flag and provide no test data, so Flair prints:

log.info("Test data not provided setting final score to 0")

In the file flair/trainers/plugins/functional/reduce_transformer_vocab.py you can see that the plugin only checks whether best-model.pt exists, and if so, overwrites it with the currently loaded (final) model:

@TrainerPlugin.hook("after_training")
def save_model_at_the_end(self, **kw):
    # saves the model with full vocab, as checkpoints etc. were created with reduced vocab
    if self.disabled:
        return

    if (self.base_path / "best-model.pt").exists():
        self.model.save(self.base_path / "best-model.pt", checkpoint=self.save_optimizer_state)
    elif (self.base_path / "final-model.pt").exists():
        self.model.save(self.base_path / "final-model.pt", checkpoint=self.save_optimizer_state)

So if the best model was found in epoch 3 of 10 and no test data is provided, this code overwrites it with the worse model from epoch 10.
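The effect can be illustrated with a minimal, self-contained sketch (plain files stand in for model checkpoints, and this `save_model_at_the_end` is a simplified stand-in for the plugin hook, not Flair's actual code):

```python
import tempfile
from pathlib import Path

def save_model_at_the_end(base_path: Path, current_model: bytes) -> None:
    # mirrors the plugin's branch order: best-model.pt is checked first and,
    # if present, overwritten with whatever model is in memory at end of training
    if (base_path / "best-model.pt").exists():
        (base_path / "best-model.pt").write_bytes(current_model)
    elif (base_path / "final-model.pt").exists():
        (base_path / "final-model.pt").write_bytes(current_model)

result = None
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # best checkpoint saved mid-training (epoch 3)
    (base / "best-model.pt").write_bytes(b"weights from epoch 3")
    # hook runs after training with the final-epoch model loaded
    save_model_at_the_end(base, b"weights from epoch 10")
    result = (base / "best-model.pt").read_bytes()
print(result)  # the epoch-3 best checkpoint has been clobbered
```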

To Reproduce

This should be reproducible with any NER dataset that has no test split.


trainer.fine_tune(
    model_path,
    learning_rate=2.0e-5,
    mini_batch_size=MINI_BATCH_SIZE,
    train_with_dev=False,
    max_epochs=MAX_EPOCHS,
    # mini_batch_chunking costs extra GPU RAM but speeds up computation
    # if mini_batch_size >= 1
    mini_batch_chunk_size=1,
    use_final_model_for_eval=False,
    reduce_transformer_vocab=True,
)

Expected behavior

Either don't overwrite the best model, or also load it when there is no test set:

line 803 in flair/trainers/trainer.py

else:
    self.return_values["test_score"] = 0
    log.info("Test data not provided setting final score to 0")

could be changed to

else:
    if (base_path / "best-model.pt").exists():
        log.info("Loading model from best epoch ...")
        self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())
    self.return_values["test_score"] = 0
    log.info("Test data not provided setting final score to 0")
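A small simulation of the proposed branch (illustrative names; `finish_training` is not Flair's actual API) shows that reloading best-model.pt before returning leaves the best-epoch weights in memory, so the vocab-restoring plugin then saves the right model:

```python
import tempfile
from pathlib import Path

def finish_training(base_path: Path, current_model: str, test_data) -> tuple:
    # illustrative stand-in for the end-of-training branch in trainer.py
    if test_data is None:
        # proposed fix: reload the best checkpoint even without a test set,
        # so the subsequent full-vocab save writes the best weights
        best = base_path / "best-model.pt"
        if best.exists():
            current_model = best.read_text()
        test_score = 0.0  # "Test data not provided setting final score to 0"
    else:
        test_score = 1.0  # placeholder: the real code evaluates on test_data
    return current_model, test_score

result = None
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "best-model.pt").write_text("weights from epoch 3")
    result = finish_training(base, "weights from epoch 10", test_data=None)
print(result)  # the epoch-3 weights are back in memory; score stays 0
```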

Logs and Stack traces

No response

Screenshots

No response

Additional Context

No response

Environment

Versions:

Flair: 0.13.1
Pytorch: 2.3.0+cu121
Transformers: 4.41.0
GPU: True

helpmefindaname commented 2 weeks ago

Hi @zynos, thanks for pointing out this special case. The TransformerSmallerTrainingVocab plugin needs to save the models again, as otherwise the vocabulary that was excluded from training would be missing later on. Besides that, it is inconsistent behaviour to load the best-model at the end only if you evaluate it on a test set. Hence I implemented your second suggestion here: https://github.com/flairNLP/flair/pull/3470

zynos commented 2 weeks ago

Thank you very much!