deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Should be possible to use the proper aggregated loss for early stopping #844

Closed: johann-petrak closed 2 years ago

johann-petrak commented 2 years ago

Currently there is no easy way to use the same aggregated loss for early stopping that is used for optimization.

It is possible to define a function for the ES metric which aggregates the per-head losses from the devset evaluation, but the global step and batch number parameters cannot be provided to it because they do not get passed to the check_stopping method. Also, that function would get applied to the per-head losses after they have already been accumulated over the devset. If the function is not linear, then applying it to the accumulated losses is not identical to accumulating its values over the individual losses.
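To make the non-linearity point concrete, here is a minimal sketch in plain Python. It does not use FARM itself; the loss values, the helper names, and the choice of `max` as a non-linear aggregation function are all made up for illustration, and batches are assumed to have equal size so that a plain mean is a fair accumulation.

```python
from statistics import mean

# Hypothetical dev-set losses: one row per batch, one column per prediction head.
batch_losses = [
    [1.0, 3.0],
    [3.0, 1.0],
]


def aggregate_then_accumulate(agg_fn, losses):
    """Aggregate the heads within each batch, then average over batches.

    This mirrors what happens during optimization, where the aggregation
    function sees the per-head losses of a single batch.
    """
    return mean(agg_fn(per_head) for per_head in losses)


def accumulate_then_aggregate(agg_fn, losses):
    """Average each head over all batches first, then aggregate once.

    This mirrors applying an ES metric function to already accumulated
    per-head losses.
    """
    per_head_means = [mean(column) for column in zip(*losses)]
    return agg_fn(per_head_means)


# A linear aggregation (sum over heads) gives the same value either way.
linear_a = aggregate_then_accumulate(sum, batch_losses)
linear_b = accumulate_then_aggregate(sum, batch_losses)

# A non-linear aggregation (max over heads) does not.
nonlinear_a = aggregate_then_accumulate(max, batch_losses)
nonlinear_b = accumulate_then_aggregate(max, batch_losses)
```

With these numbers the two orders agree for `sum` but disagree for `max`, which is the mismatch described above.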

Not sure how to best make this work.

One option could be:

Timoeller commented 2 years ago

Hey @johann-petrak, as usual we would be happy about your contributions, but please be patient about getting our feedback: we are currently very busy with topics related to Haystack. If the solution is well encapsulated, I could see it being reviewed and merged rather quickly though : )

johann-petrak commented 2 years ago

I have a solution for this and for quite a number of other things in some code I copy-pasted because I needed it quickly for myself. I can definitely provide a PR for this after I have completed the work for my own deadlines in the next couple of weeks. The implementation would basically just use a slightly modified evaluator class, also inside the train method, and a predefined metric "aggregated-loss" or similar in addition to "loss". The value of that metric gets copied into the evaluation results for all heads (since the return value is a list of per-head result dicts).
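Roughly, the idea could look like the sketch below. This is plain Python, not the actual FARM evaluator: the function name `evaluate_with_aggregated_loss`, its arguments, and the way the batch number is passed are assumptions for illustration only. Only the metric names "loss" and "aggregated-loss" and the list-of-per-head-dicts return shape come from the description above.

```python
def evaluate_with_aggregated_loss(batch_losses, agg_fn, global_step):
    """Hypothetical evaluator step.

    batch_losses: list with one entry per dev batch, each entry a list of
                  per-head losses for that batch.
    agg_fn:       the same aggregation function that is used for optimization,
                  assumed here to accept (loss_per_head, global_step, batch).
    Returns a list of per-head result dicts, each carrying its own "loss" and
    the shared "aggregated-loss".
    """
    n_batches = len(batch_losses)
    n_heads = len(batch_losses[0])
    head_sums = [0.0] * n_heads
    aggregated_sum = 0.0

    for batch_number, loss_per_head in enumerate(batch_losses):
        for head_idx, loss in enumerate(loss_per_head):
            head_sums[head_idx] += loss
        # Aggregate per batch, before accumulation, so that the result also
        # matches the training loss for non-linear aggregation functions.
        aggregated_sum += agg_fn(
            loss_per_head, global_step=global_step, batch=batch_number
        )

    aggregated = aggregated_sum / n_batches
    # Copy the same aggregated value into the result dict of every head.
    return [
        {"loss": head_sum / n_batches, "aggregated-loss": aggregated}
        for head_sum in head_sums
    ]


def max_over_heads(loss_per_head, global_step=None, batch=None):
    """Made-up non-linear aggregation function, used only for this example."""
    return max(loss_per_head)


results = evaluate_with_aggregated_loss(
    [[1.0, 3.0], [3.0, 1.0]], max_over_heads, global_step=0
)
# Early stopping could then read the value from any head, e.g. the first one.
es_value = results[0]["aggregated-loss"]
```

Because the value is the same in every head's dict, an early stopping metric that only looks at the first head would still see the aggregated loss.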

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.