make async2 work again

nancycollins commented 2 years ago

Use case

When the I/O code was reworked to add the 'all ensemble member data written to a single netcdf file' option, we lost the ability to loop over multiple assimilation windows inside filter unless the model was subroutine callable (async = 0). So async=2 and async=4 were broken at that point. Most larger models which might have used async=4 now run with a separate run script: the script calls filter for a single assimilation window and exits, then the script advances the ensemble of model states however it needs to, and then it repeats. This seems fine, so async=4 may not be needed anymore. But medium-sized models which used to use async=2 (too big to be subroutine callable, but very quick to run compared to starting and stopping filter) could use the option to loop inside filter. This option writes a list of ensemble member filenames to a control file, and then uses the system() call to start a script to advance the model states. When the script is done, filter continues.
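For readers unfamiliar with the async=2 pattern described above, here is a rough Python sketch of the mechanism: write the member filenames to a control file, hand that file to an external advance command, and block until it exits. This is not DART code. The function names, the one-filename-per-line control file layout, and the convention of passing the control file as the last argument are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def write_control_file(control_path, member_files):
    """Write one ensemble member filename per line (assumed layout)."""
    control_path = Path(control_path)
    control_path.write_text("".join(f"{name}\n" for name in member_files))
    return control_path


def advance_members(command, control_path):
    """Run the advance command and wait for it, much like system().

    `command` is a list such as ["./advance_model.csh"]; the control
    file path is appended as the final argument (an assumed convention).
    """
    result = subprocess.run([*command, str(control_path)])
    if result.returncode != 0:
        raise RuntimeError(
            f"model advance failed with status {result.returncode}"
        )
    return result.returncode
```

A caller would build the list of per-member filenames, call write_control_file(), then call advance_members() once per assimilation window and resume the assimilation when it returns.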

Describe your preferred solution

In assimilation_code/modules/assimilation/obs_model_mod.f90 is the 'advance_state()' subroutine. The current version in the git main branch does call the netcdf-file-format write and read routines, but they need different options than the initial filter input files and final filter output files: they always need to be a separate file per member, and they need different filenames. Right now, the write routine uses the input.nml options, so it might put all member data into a single netcdf variable name and it writes something like filter_output.nc or whatever is in the input.nml namelist. It also needs to write all members even if the namelist says not to write the output files.
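To make the requirement concrete, here is a small Python sketch contrasting the two behaviours: final output that follows the namelist options, and advance files that ignore them. It is only an illustration of the intent. OutputOptions, its fields, and the "advance_member" filename stem are hypothetical and do not correspond to actual DART namelist entries or routines.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OutputOptions:
    """Hypothetical stand-in for the output settings read from input.nml."""
    single_file: bool = True
    write_members: bool = False
    filename: str = "filter_output.nc"


def final_output_files(ens_size, opts):
    """Final filter output: follows the namelist-style options."""
    if not opts.write_members:
        return []
    if opts.single_file:
        return [opts.filename]
    stem = Path(opts.filename).stem
    return [f"{stem}_{i:04d}.nc" for i in range(1, ens_size + 1)]


def advance_files(ens_size, stem="advance_member"):
    """Model-advance files: always one file per member, with their own
    names, regardless of what the namelist options say."""
    return [f"{stem}_{i:04d}.nc" for i in range(1, ens_size + 1)]
```

With the default options above, final_output_files(3, OutputOptions()) is empty because member output is switched off, while advance_files(3) still yields three separate per-member files.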

hkershaw-brown commented 2 years ago

note from Nancy: the crux here is the output filename and structure. Check state_structure and io.

hkershaw-brown commented 2 years ago

hot take: I don't think starting and stopping filter takes that long. And it is better to use Multiple Program Multiple Data batch scripts than async2 (or 4).

nancycollins commented 1 year ago

i agree with helen that filter starting and stopping time isn't long. i think the issue is more that some simpler models take almost zero time to run to advance the model, and may use only a single "combined" file for their data. the best solution for them may be to make the model subroutine callable and compile it with dart and use the async=0 mode (model advances are a subroutine call to code in the model_mod). but that may be cumbersome for some models. i'm exchanging notes with toni and maybe this will be a good use case to see how best to proceed here.

NCAR / DART

make async2 work again #267