Thytu opened this issue 4 months ago
One way to do that would be to add a custom TrainerCallback
that will evaluate the target metric (i.e loss) for each modality after each evaluation loop using TrainerCallback.on_evaluate.
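The callback idea above can be sketched roughly as follows. This is a minimal, hypothetical implementation: the class name `PerModalityLossCallback` and the assumption that you hold one evaluation `DataLoader` per modality are mine, not from the issue.

```python
# Sketch of the proposed TrainerCallback (hypothetical class name and
# dataset layout): re-compute the loss per modality after each
# evaluation loop via on_evaluate.
import torch
from transformers import TrainerCallback


class PerModalityLossCallback(TrainerCallback):
    def __init__(self, eval_dataloaders):
        # e.g. {"text": text_loader, "audio": audio_loader}
        self.eval_dataloaders = eval_dataloaders

    def on_evaluate(self, args, state, control, model=None, metrics=None, **kwargs):
        # Note: this re-runs inference on the evaluation data,
        # which is the double-inference drawback discussed below.
        model.eval()
        for modality, loader in self.eval_dataloaders.items():
            losses = []
            with torch.no_grad():
                for batch in loader:
                    batch = {k: v.to(model.device) for k, v in batch.items()}
                    losses.append(model(**batch).loss.item())
            if metrics is not None:
                metrics[f"eval_{modality}_loss"] = sum(losses) / max(len(losses), 1)
```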
What I like: this approach would not require modifying `transformers`' source code.
What I don't like: the callback would have to re-run inference on the evaluation data just to compute the extra metrics.
Maybe there is a way, using TrainerCallback.on_prediction or TrainerCallback.on_prediction_step, to directly add the metric based on the data used for the prediction. This would avoid running inference twice.
Trainer.compute_metrics seems to solve exactly that need; the issue is that I observed spikes in VRAM usage when using this parameter. This must be investigated.
As mentioned earlier, utilizing Trainer.compute_metrics appears to be the optimal approach, as it facilitates the seamless integration of metrics. In this case, I intend to incorporate Word Error Rate and ROUGE.
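A minimal sketch of what such a `compute_metrics` function could look like for the WER part. To keep it self-contained, WER is implemented as a hand-rolled word-level edit distance; in practice one would likely use a metric package (e.g. for ROUGE as well). The `tokenizer` referenced inside `compute_metrics` is a hypothetical module-level tokenizer, not something defined here.

```python
# Sketch of a compute_metrics function for Trainer, with a
# hand-rolled Word Error Rate (word-level Levenshtein distance).
import numpy as np


def word_error_rate(ref_words, hyp_words):
    # Classic dynamic-programming edit distance over words.
    d = np.zeros((len(ref_words) + 1, len(hyp_words) + 1), dtype=np.int64)
    d[:, 0] = np.arange(len(ref_words) + 1)
    d[0, :] = np.arange(len(hyp_words) + 1)
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,       # deletion
                          d[i, j - 1] + 1,       # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return d[-1, -1] / max(len(ref_words), 1)


def compute_metrics(eval_pred):
    # Hypothetical glue code: assumes a module-level `tokenizer` and
    # that eval_pred.predictions / label_ids are token-id arrays.
    preds = tokenizer.batch_decode(eval_pred.predictions, skip_special_tokens=True)
    refs = tokenizer.batch_decode(eval_pred.label_ids, skip_special_tokens=True)
    wers = [word_error_rate(r.split(), p.split()) for r, p in zip(refs, preds)]
    return {"wer": float(np.mean(wers))}
```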
However, I've identified two issues with Trainer.compute_metrics:
- the evaluation tensors are accumulated on a single device (`cuda:0`), leading to sudden spikes in VRAM usage.

I will consider the best strategies to address these issues and will probably submit a PR to the transformers repository.
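For reference, one known way to soften these VRAM spikes (a sketch of existing `Trainer` options, not the PR the author mentions) is to shrink the logits before they are accumulated and to offload accumulated tensors to CPU periodically.

```python
# Two existing Trainer knobs that mitigate VRAM spikes during
# evaluation (a sketch; paths and names below are illustrative):
import torch


def preprocess_logits_for_metrics(logits, labels):
    # Keep only the argmax token ids instead of the full
    # vocab-sized logit tensor, so far less memory is accumulated
    # before compute_metrics runs.
    return logits.argmax(dim=-1)


# Then wire both into the trainer, e.g.:
# args = TrainingArguments(
#     output_dir="out",
#     eval_accumulation_steps=16,  # move accumulated tensors to CPU every 16 steps
# )
# trainer = Trainer(..., args=args,
#                   preprocess_logits_for_metrics=preprocess_logits_for_metrics)
```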
Currently, the only metric available during evaluation is the model loss; however, this does not provide enough granularity about the model's performance on each of the modalities it is training on.
As a user I want to be able to know if my model starts to perform poorly on a modality (e.g. its accuracy deteriorates on text instruct tasks).
The `Trainer` should generate multiple evaluation plots, at least one per modality.
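One existing `Trainer` feature already gets close to this: passing a dict as `eval_dataset` runs one evaluation loop per entry and prefixes the logged metrics with the key (e.g. `eval_text_loss`, `eval_audio_loss`), which yields one curve per modality in TensorBoard or W&B. The dataset names below are hypothetical.

```python
# Sketch: per-modality evaluation via a dict of eval datasets
# (dataset variable names are hypothetical).
from transformers import Trainer


def build_trainer(model, args, train_ds, text_eval_ds, audio_eval_ds):
    # Trainer evaluates each entry separately and logs metrics as
    # eval_text_* and eval_audio_*, giving one plot per modality.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset={"text": text_eval_ds, "audio": audio_eval_ds},
    )
```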