Closed robinholzi closed 1 month ago
I know this is not ready for review yet, but as a general note: I think failure recovery and parallelization should be implemented in the evaluator component, not the supervisor. At some point we'll probably delete the thread pool from the supervisor, just send a couple of requests, and let the evaluator take care of handling them correctly.
Partially superseded by https://github.com/eth-easl/modyn/pull/490
Motivation
We need to parallelize evaluations in the pipeline executor. We also want to implement post-pipeline evaluation.
Changes
To support this, we move the evaluation-related code into a dedicated class, `EvaluationExecutor`, which handles evaluations both during and after the pipeline run. It also supports restarting evaluations after the training pipeline has completed.
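To illustrate the idea, here is a minimal sketch of an executor that runs evaluation requests in a thread pool and lets failed or outstanding ones be re-run later. It is not the code from this PR: `EvalRequest`, `EvaluationExecutorSketch`, the `evaluate` callback, and `max_workers` are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class EvalRequest:
    """Hypothetical description of one evaluation to run."""

    model_id: int
    dataset_id: str


@dataclass
class EvaluationExecutorSketch:
    """Illustrative stand-in, not modyn's actual EvaluationExecutor."""

    evaluate: Callable[[EvalRequest], float]  # assumed evaluation callback
    max_workers: int = 2
    results: dict[EvalRequest, float] = field(default_factory=dict)

    def run(self, requests: list[EvalRequest]) -> list[EvalRequest]:
        """Run requests without a stored result in parallel; return failures."""
        pending = [r for r in requests if r not in self.results]
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = {r: pool.submit(self.evaluate, r) for r in pending}
        # Leaving the with-block waits until every submitted future is done.
        failed: list[EvalRequest] = []
        for request, future in futures.items():
            if future.exception() is None:
                self.results[request] = future.result()
            else:
                failed.append(request)
        return failed
```

Calling `run` again with the same request list only re-executes requests that have no stored result, which is one simple way to restart evaluations after the training pipeline has finished.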