I think I understand where you are coming from. Are you suggesting we do some sort of regression test between control and dev (when the new feature is turned off in the dev branch) and a scientific comparison between control and dev (when the new feature is turned on in the dev branch)?
As for enabling the new feature in the dev branch, I guess initially both control and dev branches include the default science configurations in their respective namelist files, and then we add some patch to the namelist file for the dev branch that turns the new feature on?
For the control branch, we could either:
- re-run the set of standard tasks but use the same science config identifiers as for the 3rd set of tasks.
- copy the output to the set of science config identifiers for the 3rd set of tasks
I think copying the output is ideal (since the tasks should be identical).
This also makes me wonder whether we should have two modes of running `benchcab`: a regression mode and a scientific evaluation mode.
After discussion, we decided to do the following:

- A `regression` flag in the config file: it's a True/False flag. If True, the `dev` branch is run with the default value for the new feature, i.e. it is not given in the namelist file. It is assumed the default value means the feature is turned off.
- If `regression` is False and a new namelist option (or a new value for an existing option) is given in the config file, we run a scientific evaluation run:
  - `control` with the standard experiments
  - `dev` with the standard experiments, with the new option turned on in all the experiments.
- We want to identify these scientific evaluation outputs: put a new part in the file names (`_NewScience`?) of the `dev` branch outputs.
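A minimal sketch of that decision, assuming a hypothetical config layout (the `regression` and `patch` keys, the function names, and the file-name handling below are illustrative only, not benchcab's actual schema or implementation):

```python
# Minimal sketch only -- hypothetical config keys, not benchcab's real schema.
def build_dev_patch(config: dict) -> dict:
    """Return the namelist patch to apply to the dev branch tasks."""
    if config.get("regression", True):
        # Regression mode: leave the new option out of the namelist entirely,
        # so the model uses its default (assumed to mean "feature off").
        return {}
    # Scientific evaluation mode: apply the user-supplied option-value pair
    # to every standard experiment on the dev branch.
    return config.get("patch", {})  # e.g. {"cable": {"new_option": True}}


def output_filename(task_name: str, config: dict) -> str:
    """Tag scientific evaluation outputs so they are easy to identify."""
    suffix = "" if config.get("regression", True) else "_NewScience"
    return f"{task_name}{suffix}.nc"


# Hypothetical usage:
# build_dev_patch({"regression": False, "patch": {"cable": {"new_option": True}}})
# output_filename("dev_site0_sci0", {"regression": False})  # -> "dev_site0_sci0_NewScience.nc"
```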
Just as an afterthought, I think it would be better if we do both a regression test and a scientific evaluation every time (regardless of whether a new feature is turned on or off). For example, after we have our model output we do the following:

    do regression test
    if regression test failed:
        do scientific evaluation
These are my reasons:
I am not sure I follow your idea.
> For example, after we have our model output we do the following:
>
>     do regression test
>     if regression test failed:
>         do scientific evaluation
I don't get why the scientific eval is done only if the regression test fails.
Are you saying we would have the outputs for everything: regression and scientific eval? Then, the analysis script should first try to run a regression test and, if it fails, use that data for a scientific evaluation? For the moment, the analysis script is the same for all the outputs. So the "regression" output is evaluated scientifically. That's probably not ideal, but it should clearly show that both outputs are identical.
(It makes me think, I will add an issue to add a bitwise comparison analysis in benchcab to run on the output for a quick regression test. I think that's something people will want to have.)
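For illustration, a quick bitwise check of two outputs could be as simple as the sketch below. This is only a byte-for-byte comparison, not the analysis benchcab ships; a data-aware comparison of the netCDF variables may be preferable in practice, and the file paths shown are made up.

```python
import filecmp
from pathlib import Path


def outputs_bitwise_identical(control_file: Path, dev_file: Path) -> bool:
    """Return True if the two output files are byte-for-byte identical."""
    # shallow=False forces an actual content comparison, not just an os.stat() check.
    return filecmp.cmp(control_file, dev_file, shallow=False)


# Hypothetical usage:
# outputs_bitwise_identical(Path("control/site0_sci0.nc"), Path("dev/site0_sci0.nc"))
```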
A developer might introduce a new feature in the code that is triggered by a new namelist option or a new value for an existing option. We need to be able to use `benchcab` to run the analysis. The problem is the control branch does not have this new feature.

One solution could be to allow the user to specify one namelist option-value pair in the config file. Then, `benchcab` could create the following sets of tasks:

1. the control branch with the standard science configurations
2. the dev branch with the standard science configurations
3. the dev branch with the standard science configurations plus the new option-value pair

In the third case, we would have to ensure the new option-value pair is added if it's a new option, but replaces the current option-value pair if it gives a new value to an existing option. A possible downside is that if the standard tests already run several configurations with different values of that option, all these configurations will then run with the same value and be redundant.
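A rough sketch of that add-or-replace step, assuming the f90nml package and invented group/option names (this is not benchcab's actual implementation):

```python
import f90nml


def apply_option(nml_path: str, out_path: str, group: str, option: str, value) -> None:
    """Add the option to an existing namelist group, or overwrite it if already set."""
    nml = f90nml.read(nml_path)      # parse the Fortran namelist file
    nml[group][option] = value       # dict-style assignment adds or replaces the key
    nml.write(out_path, force=True)  # write out the patched namelist


# Hypothetical usage:
# apply_option("cable.nml", "cable_patched.nml", "cable", "new_option", True)
```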
For the control branch, we could either:
- re-run the set of standard tasks but use the same science config identifiers as for the 3rd set of tasks, or
- copy the output to the set of science config identifiers for the 3rd set of tasks.
Example
Let's assume the initial set of science configurations is:
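As a purely hypothetical illustration, sketched here as Python dicts with invented option names and values:

```python
# Hypothetical initial science configurations (all names/values are made up).
initial_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
}
```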
Example with a new feature that is a new namelist option
(The changes don't have to be in the config file for the science configs; it's just used here as a representation.) For the dev branch, we will want to run the following science configurations, for example:
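Again purely hypothetically (invented names and values):

```python
# Hypothetical dev-branch configurations: the standard ones plus copies
# that also switch on the new namelist option (all names are made up).
dev_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "value_a", "new_option": True}},
    "sci3": {"cable": {"existing_option": "value_b", "new_option": True}},
}
```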
For the control branch, we have:
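Hypothetically (same invented names as above):

```python
# Hypothetical control-branch configurations: the new option does not exist
# on this branch, so sci2 and sci3 are plain copies of sci0 and sci1.
control_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "value_a"}},  # copy of sci0
    "sci3": {"cable": {"existing_option": "value_b"}},  # copy of sci1
}
```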
So `sci2` and `sci3` are copies of `sci0` and `sci1` respectively.

Example with a new feature that is a new value for an existing option
For the dev branch, we will want to run the following science configurations for example:
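Purely hypothetically (invented names; the user-supplied value is written as "new_value"):

```python
# Hypothetical dev-branch configurations: the new value replaces the existing
# value in every configuration, so sci2 and sci3 end up identical.
dev_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "new_value"}},  # sci0 with the new value
    "sci3": {"cable": {"existing_option": "new_value"}},  # sci1 with the new value
}
```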
In this case, `sci2` and `sci3` are identical.

For the control branch, we have:
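Hypothetically again (invented names and values):

```python
# Hypothetical control-branch configurations: again straight copies of the
# initial sci0 and sci1.
control_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "value_a"}},  # copy of sci0
    "sci3": {"cable": {"existing_option": "value_b"}},  # copy of sci1
}
```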
Again, that's a straight copy of the initial tests.