dres-dev / DRES

Distributed Retrieval Evaluation Server
MIT License

Support for Perpetual Tasks #467

Open sauterl opened 3 months ago

sauterl commented 3 months ago

Based on the experience with #466 and its presumed intention, we should support perpetual tasks (i.e. tasks which are manually started and manually ended).

One solution could be to make the task duration optional: if no duration is specified, the task is perpetual. This approach might cause a regression in how the viewer calculates which hint is shown when. The viewer should, in any case, only consider the time relative to the start of the task.
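A minimal sketch of this idea, written in Python for brevity. The names `TaskTemplate`, `Hint`, `active_hint` and `has_ended` are hypothetical and not part of DRES. It assumes the duration is optional, that a missing duration marks the task as perpetual, and that the hint to show depends only on the time elapsed since the task started:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hint:
    start_offset_s: int  # seconds after task start at which the hint appears
    text: str


@dataclass
class TaskTemplate:
    name: str
    duration_s: Optional[int] = None  # None means the task is perpetual
    hints: List[Hint] = field(default_factory=list)

    @property
    def is_perpetual(self) -> bool:
        return self.duration_s is None


def active_hint(task: TaskTemplate, elapsed_s: float) -> Optional[Hint]:
    """Return the latest hint whose offset has been reached, or None."""
    reached = [h for h in task.hints if h.start_offset_s <= elapsed_s]
    return max(reached, key=lambda h: h.start_offset_s) if reached else None


def has_ended(task: TaskTemplate, elapsed_s: float) -> bool:
    """A perpetual task never ends on its own; it must be ended manually."""
    return not task.is_perpetual and elapsed_s >= task.duration_s
```

Because `active_hint` never looks at the duration, hint selection in this sketch behaves the same for timed and perpetual tasks.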

sauterl commented 1 month ago

Upon implementing, I realised that we did not (yet) fully discuss the semantics of perpetual tasks:

Interactive Synchronous Case

This is self-explanatory, as administrators have to end such perpetual tasks manually. We should still discuss the consequences for the KisTaskScorer, as its scoring is duration-based.

Interactive Asynchronous Case

I see multiple approaches:

  1. We simply do not allow this use case: if building an asynchronous run would generate a perpetual task, we throw an error.

  2. We enable participants to move onwards (i.e. to the next task in the queue) with

    • 2.1. The possibility to always move on, regardless of the duration
    • 2.2. The possibility to only move on, if there would be a perpetual task.
  3. There is an administrator move-on functionality, to stage the progression

    • 3.1. Either synchronised for all participants
    • 3.2. Or asynchronous, for each participant independently.

One specific use case for this feature could be test tasks in preparation for an upcoming interactive synchronous evaluation. Considering this use case, I ever so slightly lean towards 2.1, which would enable interactive asynchronous test runs with, e.g., test tasks per task group, in which participants could move on.

This raises the question, however, of whether moving back (i.e. selecting the previous task) is a valid operation for a participant in this case.

Noninteractive Case

We should consider this and, ultimately, I guess the semantics should be similar to the asynchronous case. However, DRES' noninteractive evaluation mechanisms should be tested and fully supported first.

@lucaro @ppanopticon Thoughts?

lucaro commented 1 month ago

Interactive Synchronous Case

Agreed, this is straightforward.

Interactive Asynchronous Case

I would argue that, in this case, there should be an option for the participants to manually stop the task. A use case would be that you have to find an answer to a task without a time limit, but the time between the start of the task and the correct submission, which would then end the task, is still relevant for scoring. Ending the task would then be an active decision to give up and move on to the next task. There should not be an option to restart a task. Navigation options to possibly see results/scores are independent and could be considered. Taking the option of moving on to the next task away from participants defeats the purpose of the whole scheme, so I'm very much against this.
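To make the proposed semantics concrete, here is a rough sketch in Python. `ParticipantRun` and its methods are hypothetical names, not DRES code. It assumes each participant has their own position in a fixed task queue, that ending a task (by a correct submission or by giving up) returns the elapsed time for scoring, and that an ended task cannot be restarted:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParticipantRun:
    """Per-participant progress through a fixed task queue (hypothetical)."""
    tasks: List[str]
    current: int = 0
    started_at: Optional[float] = None
    ended: List[str] = field(default_factory=list)

    def start(self, now: float) -> str:
        """Start the next task in the queue; earlier tasks are unreachable."""
        if self.current >= len(self.tasks):
            raise RuntimeError("no tasks left")
        if self.started_at is not None:
            raise RuntimeError("current task is already running")
        self.started_at = now
        return self.tasks[self.current]

    def end_current(self, now: float) -> float:
        """End the running task (correct submission or giving up).

        Returns the elapsed time, which stays available for scoring.
        The queue position advances, so the task cannot be started again.
        """
        if self.started_at is None:
            raise RuntimeError("no task is running")
        elapsed = now - self.started_at
        self.ended.append(self.tasks[self.current])
        self.current += 1
        self.started_at = None
        return elapsed
```

Navigation for viewing results or scores is deliberately left out; it would be a read-only view over `ended` and would not change `current`.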

Noninteractive Case

This is also reasonably straightforward since all tasks will be comparatively long-running and active in parallel anyway. One thing we might want to consider here is that rather than a task duration, one would be able to specify an explicit duration for the evaluation or even an implicit duration for the evaluation by specifying the end time. A use case would be one in which you define a challenge with a submission deadline. In the template, you could then just ignore the task duration and set the end time whenever the evaluation is started.
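As an illustration of the two options, a small Python sketch follows. The function names and parameters are hypothetical, not an existing DRES interface. It assumes an evaluation is given either an explicit duration or an end time, and is open-ended if neither is set:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def evaluation_end(started_at: datetime,
                   duration: Optional[timedelta] = None,
                   end_time: Optional[datetime] = None) -> Optional[datetime]:
    """Resolve when an evaluation ends.

    At most one of `duration` (explicit) or `end_time` (implicit duration)
    may be given; with neither, the evaluation is open-ended.
    """
    if duration is not None and end_time is not None:
        raise ValueError("specify either duration or end_time, not both")
    if duration is not None:
        return started_at + duration
    if end_time is not None:
        if end_time <= started_at:
            raise ValueError("end_time must be after the start")
        return end_time
    return None


def accepts_submission(now: datetime, end: Optional[datetime]) -> bool:
    """Submissions are accepted until the deadline, or always without one."""
    return end is None or now < end


# Usage: a challenge with a submission deadline, set when the evaluation starts.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
deadline = evaluation_end(start, end_time=datetime(2024, 2, 1, tzinfo=timezone.utc))
```

In this sketch the per-task duration from the template plays no role; only the evaluation-level deadline decides whether a submission is still accepted.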

sauterl commented 1 month ago

Thanks for your thoughts.

This more or less covers 2.1, as described above (I edited the formatting for better reading). I will implement this accordingly.

sauterl commented 1 month ago

Backend and frontend support is implemented and tested for a synchronous setting. Perpetual tasks are kept running across restarts of DRES (we might want to add a mechanism so that DRES remains recoverable in case something is wrong with such a task).

In doing so, I added a non-operational scorer, since at least the KIS scorer would break without a duration. I guess we could have an (offline) discussion on which scorers we should ship with version 2.1.

lucaro commented 1 month ago

I guess the KIS scorer would, in that case, implicitly assume that the task duration is infinite and, therefore, not apply any time penalty. It would then effectively be maxPoints - wrongSubmissionPenalty * wrongSubmissionCount.
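A hedged sketch of that behaviour in Python. This is not the actual KisTaskScorer: the linear time decay, the clamp at zero, the default values and the name `kis_like_score` are all assumptions made for illustration. Only the perpetual branch, which reduces to maxPoints - wrongSubmissionPenalty * wrongSubmissionCount, follows directly from the formula above:

```python
from typing import Optional


def kis_like_score(elapsed_s: float,
                   duration_s: Optional[float],
                   wrong_submissions: int,
                   max_points: float = 100.0,         # assumed default
                   max_points_at_end: float = 50.0,   # assumed default
                   wrong_penalty: float = 10.0) -> float:  # assumed default
    """Score a correct submission; duration_s=None means a perpetual task."""
    if duration_s is None:
        # Infinite duration: no time penalty is applied.
        time_penalty = 0.0
    else:
        if duration_s <= 0:
            raise ValueError("duration_s must be positive")
        # Assumed linear decay from max_points to max_points_at_end.
        fraction = min(max(elapsed_s / duration_s, 0.0), 1.0)
        time_penalty = (max_points - max_points_at_end) * fraction
    # Assumed clamp so that many wrong submissions cannot go below zero.
    return max(0.0, max_points - time_penalty - wrong_penalty * wrong_submissions)
```

With these assumed defaults, a perpetual task solved after two wrong submissions scores 80 points regardless of how long it took, while a timed task solved halfway through with no wrong submissions scores 75.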