I think I understand where you are coming from. Are you suggesting we do some sort of regression test between control and dev (when the new feature is turned off in the dev branch) and a scientific comparison between control and dev (when the new feature is turned on in the dev branch)?
As for enabling the new feature in the dev branch, I guess initially both control and dev branches include the default science configurations in their respective namelist files, and then we add some patch to the namelist file for the dev branch that turns the new feature on?
For the control branch, we could either:
- re-run the set of standard tasks but use the same science config identifiers as for the 3rd set of tasks.
- copy the output to the set of science config identifiers for the 3rd set of tasks
I think copying the output is ideal (since the tasks should be identical).
This also makes me wonder whether we should have two modes of running `benchcab`: a regression mode and a scientific evaluation mode.
After discussion, we decided to do the following:

- A `regression` flag in the config file: it's a True/False flag. If True, the `dev` branch is run with the default value for the new feature, i.e. it is not given in the namelist file. It is assumed the default value means the feature is turned off.
- If `regression` is False and a new namelist option (or a new value for an existing option) is given in the config file, we run a scientific evaluation run:
  - `control` with the standard experiments
  - `dev` with the standard experiments, with the new option turned on in all the experiments.
- We want to identify these scientific evaluation outputs: put a new part in the file names (`_NewScience`?) of the `dev` branch outputs.
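A minimal sketch of that decision, assuming a hypothetical config layout (the `regression` and `patch` keys, the function names, and the file-name handling below are illustrative only, not benchcab's actual schema or implementation):

```python
# Minimal sketch only -- hypothetical config keys, not benchcab's real schema.
def build_dev_patch(config: dict) -> dict:
    """Return the namelist patch to apply to the dev branch tasks."""
    if config.get("regression", True):
        # Regression mode: leave the new option out of the namelist entirely,
        # so the model uses its default (assumed to mean "feature off").
        return {}
    # Scientific evaluation mode: apply the user-supplied option-value pair
    # to every standard experiment on the dev branch.
    return config.get("patch", {})  # e.g. {"cable": {"new_option": True}}


def output_filename(task_name: str, config: dict) -> str:
    """Tag scientific evaluation outputs so they are easy to identify."""
    suffix = "" if config.get("regression", True) else "_NewScience"
    return f"{task_name}{suffix}.nc"


# Hypothetical usage:
# build_dev_patch({"regression": False, "patch": {"cable": {"new_option": True}}})
# output_filename("dev_site0_sci0", {"regression": False})  # -> "dev_site0_sci0_NewScience.nc"
```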
Just as an afterthought, I think it would be better if we do both a regression test and a scientific evaluation every time (regardless of whether a new feature is turned on or off). For example, after we have our model output we do the following:

    do regression test
    if regression test failed:
        do scientific evaluation
These are my reasons:
I am not sure I follow your idea.
> For example, after we have our model output we do the following:
>
>     do regression test
>     if regression test failed:
>         do scientific evaluation
I don't get why the scientific eval is done only if the regression test fails.
Are you saying we would have the outputs for everything: regression and scientific eval? Then, the analysis script should first try to run a regression test and, if it fails, use that data for a scientific evaluation? For the moment, the analysis script is the same for all the outputs. So the "regression" output is evaluated scientifically. That's probably not ideal, but it should clearly show that both outputs are identical.
(It makes me think, I will add an issue to add a bitwise comparison analysis in benchcab to run on the output for a quick regression test. I think that's something people will want to have.)
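For illustration, a quick bitwise check of two outputs could be as simple as the sketch below. This is only a byte-for-byte comparison, not the analysis benchcab ships; a data-aware comparison of the netCDF variables may be preferable in practice, and the file paths shown are made up.

```python
import filecmp
from pathlib import Path


def outputs_bitwise_identical(control_file: Path, dev_file: Path) -> bool:
    """Return True if the two output files are byte-for-byte identical."""
    # shallow=False forces an actual content comparison, not just an os.stat() check.
    return filecmp.cmp(control_file, dev_file, shallow=False)


# Hypothetical usage:
# outputs_bitwise_identical(Path("control/site0_sci0.nc"), Path("dev/site0_sci0.nc"))
```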
A developer might introduce a new feature in the code that is triggered by a new namelist option or a new value for an existing option. We need to be able to use `benchcab` to run the analysis. The problem is the control branch does not have this new feature.

One solution could be to allow the user to specify one namelist option-value pair in the config file. Then, `benchcab` could create the following sets of tasks:

1. the control branch with the standard science configurations
2. the dev branch with the standard science configurations
3. the dev branch with the standard science configurations plus the new option-value pair

In the third case, we would have to ensure the new option-value pair is added if it's a new option, but replaces the current option-value pair if it gives a new value to an existing option. A possible downside is that if the standard tests already run several configurations with different values of that option, all these configurations will then run with the same value and be redundant.
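A rough sketch of that add-or-replace step, assuming the f90nml package and invented group/option names (this is not benchcab's actual implementation):

```python
import f90nml


def apply_option(nml_path: str, out_path: str, group: str, option: str, value) -> None:
    """Add the option to an existing namelist group, or overwrite it if already set."""
    nml = f90nml.read(nml_path)      # parse the Fortran namelist file
    nml[group][option] = value       # dict-style assignment adds or replaces the key
    nml.write(out_path, force=True)  # write out the patched namelist


# Hypothetical usage:
# apply_option("cable.nml", "cable_patched.nml", "cable", "new_option", True)
```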
For the control branch, we could either:
- re-run the set of standard tasks but use the same science config identifiers as for the 3rd set of tasks, or
- copy the output to the set of science config identifiers for the 3rd set of tasks.
Example
Let's assume the initial set of science configurations is:
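As a purely hypothetical illustration, sketched here as Python dicts with invented option names and values:

```python
# Hypothetical initial science configurations (all names/values are made up).
initial_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
}
```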
Example with a new feature that is a new namelist option
(The changes don't have to be in the config file for the science configs; it's just used here as a representation.) For the dev branch, we will want to run the following science configurations, for example:
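Again purely hypothetically (invented names and values):

```python
# Hypothetical dev-branch configurations: the standard ones plus copies
# that also switch on the new namelist option (all names are made up).
dev_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "value_a", "new_option": True}},
    "sci3": {"cable": {"existing_option": "value_b", "new_option": True}},
}
```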
For the control branch, we have:
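Hypothetically (same invented names as above):

```python
# Hypothetical control-branch configurations: the new option does not exist
# on this branch, so sci2 and sci3 are plain copies of sci0 and sci1.
control_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "value_a"}},  # copy of sci0
    "sci3": {"cable": {"existing_option": "value_b"}},  # copy of sci1
}
```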
So `sci2` and `sci3` are copies of `sci0` and `sci1` respectively.

Example with a new feature that is a new value for an existing option
For the dev branch, we will want to run the following science configurations for example:
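Purely hypothetically (invented names; the user-supplied value is written as "new_value"):

```python
# Hypothetical dev-branch configurations: the new value replaces the existing
# value in every configuration, so sci2 and sci3 end up identical.
dev_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "new_value"}},  # sci0 with the new value
    "sci3": {"cable": {"existing_option": "new_value"}},  # sci1 with the new value
}
```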
In this case, `sci2` and `sci3` are identical.

For the control branch, we have:
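Hypothetically again (invented names and values):

```python
# Hypothetical control-branch configurations: again straight copies of the
# initial sci0 and sci1.
control_science_configs = {
    "sci0": {"cable": {"existing_option": "value_a"}},
    "sci1": {"cable": {"existing_option": "value_b"}},
    "sci2": {"cable": {"existing_option": "value_a"}},  # copy of sci0
    "sci3": {"cable": {"existing_option": "value_b"}},  # copy of sci1
}
```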
Again, that's a straight copy of the initial tests.