Stages "input" and "forecast" include perturbations

kdraeder commented 1 year ago

Describe the bug

List the steps someone needs to take to reproduce the bug.

Make a copy of the initial ensemble for reference.

Run an assimilation with

perturb_from_single_instance = .true.
stages_to_write = 'input' andor 'forecast'

Compare the stage file with the reference file and note that they differ by whatever perturbations were requested in filter.

What was the expected outcome? Writing stage input andor forecast should archive the ensemble of states that resulted from the model hindcast. https://docs.dart.ucar.edu/en/latest/guide/controlling-files-output.html describes it as "the ensemble forecast" with no mention of perturbations. The 'input' stage is not described there for the case where single_file_out = .false.
What actually happened?
The model states have been perturbed by perturb_from_single_instance = .true. and any perturbation code in model_mod.

Error Message

None

Which model(s) are you working with?

Any

Version of DART

v10.8.2

Have you modified the DART code?

No

Build information

Any build

Comments,

Filter perturbs the read-in state in create_ensemble_from_single_file, which immediately follows read_state. This precedes the writes of "input" before the time loop and "forecast" in the time loop. This may have been an intentional design choice, or there might not have been better names for those stages, but we should at least note this in the documention and tell people to turn off the perturbations if they want to see the state(s) read into filter. It can be useful to see the state with the perturbations, so maybe the 'forecast' stage should continue to fill that role, but 'input' seems unsuitable for that.

Does the ensemble need to be created before filter_generate_copy_meta_data? Sections after that are about the observations to use and the time window, which don't require a complete or perturbed ensemble. Then compute_copy_mean_sd is called, and stage "input" is written. Is it possible to put the call to create_ensemble_from_single_file after the write of "input"? How about after the write of "forecast"?

nancycollins commented 1 year ago

the perturb option generates an ensemble of states from a single state. i don't know if it makes sense to write out an input or forecast file that is only a single member and then the rest of the files are a full ensemble. i'd have to think about that one for a while. it might be that the right thing to do is point out that when generating an ensemble by using perturbations, that the input and forecast outputs include those.

nancycollins commented 1 year ago

if you run 'perturb_single_instance' first and then run filter, i would think you should see the same files and same values in the 'forecast' output as you'd do if you had filter do the perturb.

kdraeder commented 1 year ago

Ideally, filter should be able to write out the ensemble:

exactly as it's given to filter (currently not available from inside filter)
after perturbations have been added (currently available as stages 'input' and 'forecast')
after prior inflation has been applied
(and then the post-assimilation states).

'input' is a good name for 1. 'forecast' is a misleading name for 2. (I'm looking for a better one) 'preassim' is a good name for 3.

@nancycollins raised an issue (offline) in the case where filter is asked to perturb a single state to make an ensemble with some spread; if the the write of 'input' came before the perturbations were added, then 'input' would be an ensemble of identical members, which seems wasteful to write. I see several reasons to do it this way:

it can confirm that filter is starting from the desired ensemble, either a single state or a hindcast ensemble.
it makes it easier to see the perturbations applied by filter, by differencing the 'input' and the next stage (currently 'forecast')
it looks redundant or wasteful only for the first cycle, and 'input' can be removed from the namelist for the first or later cycles.
It more accurately reflects the name of the stage.

kdraeder commented 1 year ago

if you run 'perturb_single_instance' first and then run filter, i would think you should see the same files and same values in the 'forecast' output as you'd do if you had filter do the perturb.

I'm picturing running perturb_single_instance before running the ensemble hindcast, so that the initial ensemble has spread and the hindcast evolves those differences for N hours. Then 'forecast' would contain a very different ensemble than if perturb_single_instance were run immediately before filter, so that the 'forecast' members differ only by the applied perturbations.

nancycollins commented 1 year ago

hi kevin,

i'm a little confused. you are picturing running perturb_single_instance which generates N files. you have the perturbed data in those files. but if you run a "hindcast" (the model advance?), now you have evolved, advanced-in-time files. you have the initial files that went into your model, and you have the output from your model. you are no longer perturbing and then running filter immediately.

i'm talking about perturbing a single state and immediately running filter, vs running filter and perturbing inside it. those should product the same forecast file.

am i misunderstanding your case?

i think what is happening right now in the code is both the most useful for people and the least surprising. so i'm trying to understand exactly what problem you're trying to solve here. this code also has to do reasonable things when putting all ensemble members, mean, sd, etc into a single DART-defined-format netcdf file. it all works fine right now. i see no need to change things other than making the documentation more explicit here.

n.

kdraeder commented 1 year ago

hi kevin,

i'm a little confused. you are picturing running perturb_single_instance which generates N files. you have the perturbed data in those files. but if you run a "hindcast" (the model advance?), now you have evolved, advanced-in-time files. you have the initial files that went into your model, and you have the output from your model. you are no longer perturbing and then running filter immediately.

KR; correct.

i'm talking about perturbing a single state and immediately running filter, vs running filter and perturbing inside it. those should product the same forecast file.

KR In the case with no model advance I agree that running perturb_single_instance and then filter (perturb_from_single_instance = .false.) yields the same contents of the 'forecast' stage as running only filter with perturb_from_single_instance = .true.

am i misunderstanding your case?

KR In a CAM multi-cycling context I always do a model advance first. So I need an ensemble of ICs for the models. If I can't find one, I make one. It will be convenient to make an ensemble using perturb_single_instance. I have not done that in the past. Then I will have an ensemble with some (small) spread in it, which I will use to start the model advance. The model states will diverge further from each other, so that at hour 6 the ensemble has a larger spread. I don't need to perturb those in order to assimilate, so I'll run filter with perturb_from_single_instance = .false. for that first cycle and for all the cycles that follow. The 'forecast' stage will be unperturbed. It will look exactly like the state written out by the model advance. Again, this is not what I have done in the past.

i think what is happening right now in the code is both the most useful for people and the least surprising. so i'm trying to understand exactly what problem you're trying to solve here.

KR If I use perturb_single_instance before the first model advance, then I will not have a problem. People who do not do that will add the perturbations at the assimilation time and will see 'input' and 'forecast' model states that are the {state_read_in + perturbations}. I was surprised by this, and others may be, if they bother to look. It was not useful for me because I wanted to see exactly what was read into filter, and also easily generate pictures of the perturbations. I also wanted to confirm that the 'forecast' mean matched the truth run from pmo. That's more difficult when the 'forecast' has perturbations in it. My perts are small and random, but other people may use perts that have a less distinctive pattern. The stage names and documentation led me to expect different contents than I found.

this code also has to do reasonable things when putting all ensemble members, mean, sd, etc into a single DART-defined-format netcdf file. it all works fine right now. i see no need to change things other than making the documentation more explicit here.

KR I don't see how it could make any functional difference whether we write to a single file;

a complete 'input' stage, with no perturbations added to the read-in ensemble, ,
or the same with perturbations added on.

It's the same data volume and structure, just different values. Same for a 'forecast', which has perturbations added to the read-in ensemble.

I'd say that it works fine in the sense that no one before me has brought it up. They may have noticed, decided that it didn't matter for their work, and didn't bother to report it. Or they may not have noticed, just used what came out, and never saw anything suspicious in the results they were focused on.

NCAR / DART