Consolidating validation and scoring

philerooski commented 5 years ago

This idea comes from @kdaily but it makes a lot of sense to me.

Rather than trying to emulate the old challenge pipeline where the validation and scoring steps are two distinct scripts, there would be a few advantages to combining them into a single step:

With respect to modularity, it's best if our scoring script doesn't assume that all input it receives is valid. If someone wants to run the scoring script as a stand-alone program, the script should check for valid input and return an appropriate result if the input is invalid, rather than throwing an error.
Because of the previous point, writing a scoring script using best standards (checking for input validity) means repeating logic between validation and scoring scripts.
If we have a non-trivial scoring step, it will be containerized. We'll want to invoke this container directly. but we don't want to score if the submission is invalid, and CWL doesn't support conditionals (yet), so it's necessary to either wrap the container in another script which handles the conditional, or pass the status to the container. The first case is inconvenient because the workflow engine won't automatically handle the I/O of the container for us and the second case is awkward when using the scoring script as a stand-alone program.

One disadvantage of combining the steps is that we lose the independence between validation and scoring. If two challenge (or two subchallenges) take and validate the same input, but then generate a score in different ways, we need to repeat the validation logic between the scripts. But the severity of this disadvantage is relative to how often such a scenario arises in challenges. From my understanding, this occurs rarely or never.

thomasyu888 commented 5 years ago

@kdaily and @philerooski

The beauty of workflows is that you can definitely combine the two if you would like to for your challenge. I definitely have had challenges that combined those two, or only had a validation step due to simplicity of scoring and so on... However, view my comments below for why I don't think it is a good idea to officially do this.

Often times, the scoring function is not made public to the participants but the validation script is to allow participants to validate their prediction file prior to upload. To respond to your point 2, the way that I would recommend is actually to just source the validation script in your scoring code and have a 'skip_validation' flag so that validation only occurs once during the challenge, but when you're writing the scoring code, you are calling the validation script.
With regard to your point 3, I do pass along the conditional check and there are challenges that have complex scoring which would mean that participants would have to wait until scoring is completely done to obtain any information, unless you decide to code in 'validation', 'updating status', 'notify participants to validation status', 'score' all in one tool. Im not sure that is the way to go.
There are a couple challenges I can think of that in fact use the same validation script across subchallenges, but have different scoring functions.

All in all, I don't think there is necessarily a 'right' or 'wrong' way. I do think that once conditionals are allowed in CWL, it would make life a lot easier. That being said, feel free to consolidate validation and scoring for your challenge if it makes sense for your use case.

kdaily commented 5 years ago

Validation logic should occur where it's required and before long running computation (fail fast philosophy). Scoring functions (or, any functions) should perform the necessary checks to make sure the input if valid so that they can be run alone without a workflow and be reusable and extendable in other contexts.

In the context of multi-step workflows, it makes sense to separate validation logic for multi-step processes that depend separately on initial inputs: e.g. if you pass in file1 and file2, step A uses file1 and step B uses the output of step A and file2. You would want to validate both file1 and file2 prior to running anything - allowing step A to run when file2 is invalid would be imprudent, UNLESS we have a graceful, user-friendly strategy for retrying specific steps.

If those errors need to be reported back to a user through another system other than the one provided directly by the program (e.g., through a Synapse-generated email) then the workflow (as a wrapper) should handle that.

We should come to a consensus on "how" errors are reported, and determine best practices for "when" they should be reported.

kdaily commented 5 years ago

To respond to a few specifics:

Often times, the scoring function is not made public to the participants but the validation script is to allow participants to validate their prediction file prior to upload.

The implementation of the scoring function is often not made public, but the execution is. That's business logic that should be handled through security (e.g., private repositories, private containers) not implementation of the code itself.

To respond to your point 2, the way that I would recommend is actually to just source the validation script in your scoring code and have a 'skip_validation' flag so that validation only occurs once during the challenge

This is mixing business logic (how to construct a workflow) with the code and makes it difficult to reuse. I have to strongly disagree with this strategy. Wrapping the code as needed with the workflow is the correct approach.

With regard to your point 3, I do pass along the conditional check and there are challenges that have complex scoring which would mean that participants would have to wait until scoring is completely done to obtain any information, unless you decide to code in 'validation', 'updating status', 'notify participants to validation status', 'score' all in one tool. Im not sure that is the way to go.

Agree in principle (as noted in response above) but all of this again is business logic related to workflows, not to the running of the 'scoring' scripts themselves.

All in all, I don't think there is necessarily a 'right' or 'wrong' way.

Disagree. We need to build consensus around the ways to do it, and decide which is the way we will use, regardless if there are multiple ways to do it. If there are legitimate needs for separate ways, then they should be enumerated as possible.

philerooski commented 5 years ago

To respond to your point 2, the way that I would recommend is actually to just source the validation script in your scoring code and have a 'skip_validation' flag so that validation only occurs once during the challenge

This is mixing business logic (how to construct a workflow) with the code and makes it difficult to reuse. I have to strongly disagree with this strategy. Wrapping the code as needed with the workflow is the correct approach.

I'm actually okay with this under certain circumstances -- but with a modification. If the standard model is to source the validation script within the scoring script, let's not implement the --skip_validation flag and just run validation twice within the workflow. There will be redundant processing, but I'm more concerned with redundant development work. Though, if the validation step is time consuming, then this model could pose some challenges.

The nice thing about doing things this way is that we can have our cake and eat it, too. We can choose whether we want to use the current workflow model validate -> annotate -> email -> score -> annotate -> email, or the more elegant model validate/score -> annotate -> email

kdaily commented 5 years ago

just run validation twice within the workflow

Could get behind that too!

Sage-Bionetworks / ChallengeWorkflowTemplates

Consolidating validation and scoring #17