Moving towards an automated process is generally easier if the evaluation process in each task is as similar as possible.
We should therefore consider standardizing:
The name of every Python evaluation script, as evaluate.py.
The argument structure and short forms used by these scripts, e.g. -g, -m/-p, -r.
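
A shared argument structure could be sketched roughly as below. This is only an illustration, not an agreed interface: the issue names the short forms -g, -m/-p and -r but not their meanings, so the long names (--gold, --predictions, --results), the reading of -m/-p as two aliases for one argument, and the file names in the demo call are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser that every task's evaluate.py could share."""
    parser = argparse.ArgumentParser(
        prog="evaluate.py",
        description="Sketch of a common evaluation interface.",
    )
    # ASSUMPTION: -g is the gold/reference data.
    parser.add_argument("-g", "--gold", required=True,
                        help="path to the gold/reference file (assumed meaning)")
    # ASSUMPTION: -m and -p are interchangeable aliases for the same input.
    parser.add_argument("-m", "-p", "--predictions", required=True,
                        help="path to the model output (assumed meaning)")
    # ASSUMPTION: -r is an optional location for the computed results.
    parser.add_argument("-r", "--results", default=None,
                        help="where to write the results (assumed meaning)")
    return parser


# Demo with hypothetical file names; parsing does not touch the filesystem.
args = build_parser().parse_args(
    ["-g", "gold.txt", "-p", "pred.txt", "-r", "scores.json"]
)
print(args.gold, args.predictions, args.results)
```

Keeping the parser in one function would let each task import it, or copy it unchanged, so that an automated runner can call every evaluate.py the same way.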
Because
As part of moving towards an automated process, it is generally easier if the evaluation process in each task is as similar as possible. Therefore, we should consider standardizing the script name (evaluate.py) and the argument short forms (-g, -m/-p, -r).
Done when
No response
Additional context
No response