Conflict with status annotation

andrewelamb commented 5 years ago

Evaluation queues use the "status" annotation to determine which submissions count toward the submission limit. If the status is "INVALID", the submission doesn't count toward this limit. In the past, this status indicated whether the submission passed the validation process. With the workflow hook, it now indicates whether the workflow run by the hook completed, whereas the current workflow challenge template uses the prediction file status to track whether the file is invalid.
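To make the conflict concrete, here is a minimal sketch of a limit check that skips INVALID submissions. The dictionary layout, the `prediction_file_status` key, and the function name `count_toward_limit` are assumptions for illustration, not the Synapse implementation.

```python
def count_toward_limit(submissions):
    """Count submissions whose status is anything other than INVALID."""
    return sum(1 for sub in submissions if sub.get("status") != "INVALID")


# Hypothetical submissions, represented as plain dictionaries.
submissions = [
    {"id": "1", "status": "ACCEPTED"},
    {"id": "2", "status": "INVALID"},
    # The workflow completed, so the status is ACCEPTED even though the
    # prediction file itself failed validation.
    {"id": "3", "status": "ACCEPTED", "prediction_file_status": "INVALID"},
]
print(count_toward_limit(submissions))
```

Submission 3 still counts toward the limit here, which is the behavior this issue describes.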

Possible solutions:

Challenge workflows error out during the validation step if the submission is not valid, making the status "INVALID".
The workflow hook uses a different annotation than status, allowing the workflow to manage the state of the status annotation.
The evaluation queue changes so that it uses a different annotation (or more than one) to manage queue limits.
Implement a process outside the workflow hook that checks for submissions that completed the workflow but are invalid, and sets their status to "INVALID".

thomasyu888 commented 5 years ago

+1

brucehoff commented 5 years ago

> Challenge workflows error out during the validation step, if the submission is not valid making the status "INVALID".

This is the option supported by the current hook. Your workflow would finish doing whatever it needs to (e.g., sending notifications, setting annotations to be displayed in the leaderboard) and then, finally, raise an exception. The workflow hook would register the exception, set the submission status to INVALID, and skip counting it towards the user's submission quota. Does this meet your needs?
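As a rough sketch of that ordering, assuming the final workflow step is a Python function: the reporting side effects run first, and the exception is raised last. The names `InvalidSubmissionError` and `finish_workflow`, the `notify` and `annotate` callables, and the annotation values are all hypothetical and not part of the hook.

```python
class InvalidSubmissionError(Exception):
    """Raised as the last step so the failure is registered by the caller."""


def finish_workflow(validation_errors, notify, annotate):
    """Run the reporting steps, then fail if validation found problems."""
    is_valid = not validation_errors
    # Side effects happen first so they are not lost when the step fails.
    annotate({"prediction_file_status": "VALIDATED" if is_valid else "INVALID"})
    notify(validation_errors)
    if not is_valid:
        raise InvalidSubmissionError("; ".join(validation_errors))


# Record the calls in a list instead of contacting a real service.
sent = []
try:
    finish_workflow(["missing column 'prediction'"], sent.append, sent.append)
except InvalidSubmissionError as err:
    print(f"workflow failed after {len(sent)} reporting steps: {err}")
```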

thomasyu888 commented 5 years ago

The issue with this approach is that the logging from CWL and Toil is not granular enough. For instance, what we really care about is that participants obtain logs from ONLY the validation and scoring steps. Unfortunately, I don't think CWL offers this right now. We would have to write a tool to parse the logs, which is more complex than the solution below.

Our current workaround is a tool that looks for submissions with an ACCEPTED submission status but an invalid prediction file, and changes their status from ACCEPTED to INVALID: https://github.com/Sage-Bionetworks/challengeutils/pull/51
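The idea behind the workaround can be sketched as follows, using plain dictionaries in place of Synapse objects. This is not the code from the pull request; the `prediction_file_status` key and the function name are assumptions for illustration.

```python
def invalidate_failed_submissions(submissions):
    """Set status to INVALID where the workflow finished but the file is invalid.

    Returns the ids of the submissions that were changed.
    """
    changed = []
    for sub in submissions:
        if (sub.get("status") == "ACCEPTED"
                and sub.get("prediction_file_status") == "INVALID"):
            sub["status"] = "INVALID"
            changed.append(sub["id"])
    return changed


# Hypothetical queue contents.
queue = [
    {"id": "1", "status": "ACCEPTED", "prediction_file_status": "VALIDATED"},
    {"id": "2", "status": "ACCEPTED", "prediction_file_status": "INVALID"},
]
print(invalidate_failed_submissions(queue))
```

A real tool would also need to write the changed status back to Synapse, which is left out here.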

The real question is: if Synapse itself relies on the submission status for specific purposes, should we really make the hook depend on it as well?

brucehoff commented 5 years ago

> theres no granular enough of logging from CWL and TOIL. For instance, what we really care about is for participants to obtain logs from ONLY the validation and scoring step.

If the participant should not see the entire Toil log then it seems to me that you should create your own, custom "log" (detailed output of validation and of scoring) to return to the participant.

> We would have to write a tool to parse through the logs which is more complex than the solution below.

It seems to me that the best approach is not to add your output to Toil's log and then parse it out again, but rather to keep that content separate (e.g., write it to a file) and then send it in a notification, or upload it to Synapse and send a link to the file in a notification.
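A minimal sketch of keeping that content separate, assuming the validation and scoring steps can pass their results to a small helper. The file name `participant_log.json`, the report fields, and the function name are hypothetical; uploading or notifying is left out.

```python
import json
import tempfile
from pathlib import Path


def write_participant_log(validation_errors, score, directory):
    """Write only validation and scoring details to a standalone file."""
    report = {
        "validation_errors": validation_errors,
        "score": score,
    }
    path = Path(directory) / "participant_log.json"
    path.write_text(json.dumps(report, indent=2))
    return path


with tempfile.TemporaryDirectory() as tmp:
    log_path = write_participant_log(["missing column 'prediction'"], None, tmp)
    # This file, not the full Toil log, is what would go to the participant.
    print(log_path.read_text())
```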

> Our current workaround is ...

Why not raise an exception at the end of your workflow? You would do this once you've done everything else you need to do (e.g., notify participants that their submission failed validation and send them the details of the failure).

> if there are specific synapse usages to the submission status

@thomasyu888 reminds me that Synapse's enforcement of quotas/submission eligibility is based on the status.

Sage-Bionetworks / SynapseWorkflowHook

Conflict with status annotation #44