Thytu opened this issue 4 months ago
One way to do that would be to add a custom TrainerCallback
that will evaluate the target metric (i.e loss) for each modality after each evaluation loop using TrainerCallback.on_evaluate.
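The callback idea above can be sketched roughly as follows. This is a minimal, hypothetical implementation: the class name `PerModalityLossCallback` and the assumption that you hold one evaluation `DataLoader` per modality are mine, not from the issue.

```python
# Sketch of the proposed TrainerCallback (hypothetical class name and
# dataset layout): re-compute the loss per modality after each
# evaluation loop via on_evaluate.
import torch
from transformers import TrainerCallback


class PerModalityLossCallback(TrainerCallback):
    def __init__(self, eval_dataloaders):
        # e.g. {"text": text_loader, "audio": audio_loader}
        self.eval_dataloaders = eval_dataloaders

    def on_evaluate(self, args, state, control, model=None, metrics=None, **kwargs):
        # Note: this re-runs inference on the evaluation data,
        # which is the double-inference drawback discussed below.
        model.eval()
        for modality, loader in self.eval_dataloaders.items():
            losses = []
            with torch.no_grad():
                for batch in loader:
                    batch = {k: v.to(model.device) for k, v in batch.items()}
                    losses.append(model(**batch).loss.item())
            if metrics is not None:
                metrics[f"eval_{modality}_loss"] = sum(losses) / max(len(losses), 1)
```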
What I like: this approach would not require modifying `transformers`' source code.
What I don't like: the callback would have to re-run inference on the evaluation data just to compute the extra metrics.
Maybe there is a way, using TrainerCallback.on_prediction or TrainerCallback.on_prediction_step, to directly add the metric based on the data used for the prediction. This would avoid running inference twice.
Trainer.compute_metrics seems to solve exactly that need; the issue is that I observed spikes in VRAM usage when using this parameter. This must be investigated.
As mentioned earlier, utilizing Trainer.compute_metrics appears to be the optimal approach, as it facilitates the seamless integration of metrics. In this case, I intend to incorporate Word Error Rate and ROUGE.
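A minimal sketch of what such a `compute_metrics` function could look like for the WER part. To keep it self-contained, WER is implemented as a hand-rolled word-level edit distance; in practice one would likely use a metric package (e.g. for ROUGE as well). The `tokenizer` referenced inside `compute_metrics` is a hypothetical module-level tokenizer, not something defined here.

```python
# Sketch of a compute_metrics function for Trainer, with a
# hand-rolled Word Error Rate (word-level Levenshtein distance).
import numpy as np


def word_error_rate(ref_words, hyp_words):
    # Classic dynamic-programming edit distance over words.
    d = np.zeros((len(ref_words) + 1, len(hyp_words) + 1), dtype=np.int64)
    d[:, 0] = np.arange(len(ref_words) + 1)
    d[0, :] = np.arange(len(hyp_words) + 1)
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,       # deletion
                          d[i, j - 1] + 1,       # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return d[-1, -1] / max(len(ref_words), 1)


def compute_metrics(eval_pred):
    # Hypothetical glue code: assumes a module-level `tokenizer` and
    # that eval_pred.predictions / label_ids are token-id arrays.
    preds = tokenizer.batch_decode(eval_pred.predictions, skip_special_tokens=True)
    refs = tokenizer.batch_decode(eval_pred.label_ids, skip_special_tokens=True)
    wers = [word_error_rate(r.split(), p.split()) for r, p in zip(refs, preds)]
    return {"wer": float(np.mean(wers))}
```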
However, I've identified two issues with Trainer.compute_metrics:
- the evaluation tensors are accumulated on a single device (`cuda:0`), leading to sudden spikes in VRAM usage.

I will consider the best strategies to address these issues and will probably submit a PR to the transformers repository.
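For reference, one known way to soften these VRAM spikes (a sketch of existing `Trainer` options, not the PR the author mentions) is to shrink the logits before they are accumulated and to offload accumulated tensors to CPU periodically.

```python
# Two existing Trainer knobs that mitigate VRAM spikes during
# evaluation (a sketch; paths and names below are illustrative):
import torch


def preprocess_logits_for_metrics(logits, labels):
    # Keep only the argmax token ids instead of the full
    # vocab-sized logit tensor, so far less memory is accumulated
    # before compute_metrics runs.
    return logits.argmax(dim=-1)


# Then wire both into the trainer, e.g.:
# args = TrainingArguments(
#     output_dir="out",
#     eval_accumulation_steps=16,  # move accumulated tensors to CPU every 16 steps
# )
# trainer = Trainer(..., args=args,
#                   preprocess_logits_for_metrics=preprocess_logits_for_metrics)
```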
Currently, the only metric available during evaluation is the model loss; however, this does not provide enough granularity about the model's performance on each of the modalities it is training on.
As a user I want to be able to know if my model starts to perform poorly on a modality (e.g. its accuracy deteriorates on text instruct tasks).
The `Trainer` should generate multiple evaluation plots, at least one per modality.
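One existing `Trainer` feature already gets close to this: passing a dict as `eval_dataset` runs one evaluation loop per entry and prefixes the logged metrics with the key (e.g. `eval_text_loss`, `eval_audio_loss`), which yields one curve per modality in TensorBoard or W&B. The dataset names below are hypothetical.

```python
# Sketch: per-modality evaluation via a dict of eval datasets
# (dataset variable names are hypothetical).
from transformers import Trainer


def build_trainer(model, args, train_ds, text_eval_ds, audio_eval_ds):
    # Trainer evaluates each entry separately and logs metrics as
    # eval_text_* and eval_audio_*, giving one plot per modality.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset={"text": text_eval_ds, "audio": audio_eval_ds},
    )
```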