NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0

State IO #359

Open hkershaw-brown opened 2 years ago

hkershaw-brown commented 2 years ago

Notes for state IO

hkershaw-brown commented 2 years ago

dart_time_io_mod.f90 uses its own local has_unlimited rather than the state_structure_mod domain%has_unlimited

https://github.com/NCAR/DART/blob/cbe0d41e9ce190366b3cc5cc1433becf2ebc421c/assimilation_code/modules/io/dart_time_io_mod.f90#L218-L222

hkershaw-brown commented 2 years ago

In create_and_open_state_output, only 'time' (lower case) can be the unlimited dimension.

https://github.com/NCAR/DART/blob/cbe0d41e9ce190366b3cc5cc1433becf2ebc421c/assimilation_code/modules/io/direct_netcdf_mod.f90#L1643-L1656

If you have an unlimited dimension in the state_structure_mod with any other name (e.g. WRF has 'Time'), it gets created as a limited dimension.

This means that your created netcdf files have a different netcdf dimension structure than your model files.

has_unlimited is a property of the state_structure%domain, but it can differ between netcdf files because of create_and_open_state_output. I think this is why dart_time_io_mod is querying the netcdf file rather than the state_structure (see comment above).

There is also domain%variable(ivar)%var_has_unlim which is per variable, set but never used.
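
For reference, asking the netcdf file itself whether it has an unlimited dimension (instead of trusting the state structure or the dimension name) can be sketched roughly as below. This is a standalone illustration using the netCDF Fortran 90 interface, not the code in dart_time_io_mod; the program name and the file name 'filter_output.nc' are placeholders.

```fortran
program query_unlimited
! Sketch: ask a netCDF file whether it has an unlimited dimension,
! without assuming the dimension is called 'time'.
use netcdf
implicit none

character(len=*), parameter :: fname = 'filter_output.nc'  ! placeholder file name
character(len=NF90_MAX_NAME) :: dim_name
integer :: ncid, unlim_id, dim_len
logical :: has_unlimited

call check(nf90_open(fname, NF90_NOWRITE, ncid))

! unlimitedDimId comes back as -1 when the file has no unlimited dimension
call check(nf90_inquire(ncid, unlimitedDimId=unlim_id))
has_unlimited = (unlim_id /= -1)

if (has_unlimited) then
   call check(nf90_inquire_dimension(ncid, unlim_id, name=dim_name, len=dim_len))
   print *, 'unlimited dimension: ', trim(dim_name), ' length ', dim_len
else
   print *, 'no unlimited dimension in ', fname
end if

call check(nf90_close(ncid))

contains

subroutine check(status)
   ! Stop with the library's error message on any netCDF failure
   integer, intent(in) :: status
   if (status /= NF90_NOERR) then
      print *, trim(nf90_strerror(status))
      stop 1
   end if
end subroutine check

end program query_unlimited
```

Run against a WRF file and a DART-created output file, this would make the mismatch visible: the first should report 'Time' as unlimited, while the second reports no unlimited dimension.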

hkershaw-brown commented 2 years ago

The linked code uses a hard-coded 0, even though there is an integer, parameter :: SINGLE_IO_TASK_ID = 0 in the module.

https://github.com/NCAR/DART/blob/cc1da67976feb3b092ac9eae7388f6181ef33154/assimilation_code/modules/io/direct_netcdf_mod.f90#L431-L434

hkershaw-brown commented 2 years ago

perfect_model_obs has namelist options:

single_file_in
single_file_out

but the number of copies is hardcoded at 1 when initializing the filenames https://github.com/NCAR/DART/blob/ba00b2c5c89ead8e27cd6436276c90eb00ddd48b/assimilation_code/programs/perfect_model_obs/perfect_model_obs.f90#L287

and the ens_size is fixed at 1: https://github.com/NCAR/DART/blob/ba00b2c5c89ead8e27cd6436276c90eb00ddd48b/assimilation_code/programs/perfect_model_obs/perfect_model_obs.f90#L166

What are single_file_in and single_file_out for in perfect_model_obs?

nancycollins commented 2 years ago

we need to rename the file format that puts all the ensemble members plus inflation copies, mean, sd, etc in a single netcdf file (that dart dictates the format of).

we called it "single file" but as you point out that's confusing for the situation where there is only 1 member involved. the code is going to expect to read a netcdf file with specific dimension names and variable names.

other suggestions for this format? combination file, combined file, dart format file, ???

hkershaw-brown commented 2 years ago

I think my question is even simpler than that: do we need single_file_in as a namelist option for perfect_model_obs? I think it only ever runs with 1 copy, but am I missing something?

nancycollins commented 2 years ago

yes, because even though there is always only 1 input and output file, this item toggles on and off the dart netcdf format file vs a model netcdf file.

the namelist variable name is confusing because we called dart format files "single file" no matter how many members are in it.

hkershaw-brown commented 2 years ago

yeah I get it, just thinking about refactoring. Maybe it would be better to have the file describe itself as 'dart format'

nancycollins commented 2 years ago

if the dart file had a global attribute to indicate it was "dart format" that would be good -- make it more self-describing. then maybe a namelist item wouldn't be needed. that would be nice.

but we'd have to think about how the code could use it. it might need to open the file, look for the attribute and then decide whether to use the state structure setup info from the model's static_init_model or the dart defined format to read the file. i'm not clear if there is an order of things that works however.

it would be nice if the i/o code could use the state structure for reading a dart format file but with all the members in a single netcdf variable i don't know if it can.
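
A minimal sketch of the "open the file, look for the attribute" step, assuming a global attribute is used as the marker. The attribute name 'dart_format' and the file name 'input.nc' are hypothetical; DART does not define such an attribute as far as this thread says, and the sketch does not address the ordering question with static_init_model.

```fortran
program check_dart_format
! Sketch: decide whether a netCDF file describes itself as "dart format"
! by testing for a global attribute.
use netcdf
implicit none

character(len=*), parameter :: fname    = 'input.nc'     ! placeholder file name
character(len=*), parameter :: att_name = 'dart_format'  ! hypothetical attribute name
integer :: ncid, status
logical :: is_dart_format

status = nf90_open(fname, NF90_NOWRITE, ncid)
if (status /= NF90_NOERR) then
   print *, trim(nf90_strerror(status))
   stop 1
end if

! nf90_inquire_attribute returns an error status when the attribute is absent,
! so the status alone tells us whether the marker is present
status = nf90_inquire_attribute(ncid, NF90_GLOBAL, att_name)
is_dart_format = (status == NF90_NOERR)

if (is_dart_format) then
   print *, trim(fname), ': dart format file, read using the dart defined layout'
else
   print *, trim(fname), ': model file, read using the state structure'
end if

status = nf90_close(ncid)

end program check_dart_format
```

Only the presence of the attribute is tested here; a real check might also read its value with nf90_get_att.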

hkershaw-brown commented 9 months ago

Diagnostic structure, not used: https://github.com/NCAR/DART/blob/4dadee739df650ab5eca62fa7c29add3d4c52bed/assimilation_code/modules/io/state_structure_mod.f90#L131-L135

hkershaw-brown commented 9 months ago

assert_restart_names_initialized is printing the error message from assert_file_info_initialized

https://github.com/NCAR/DART/blob/4dadee739df650ab5eca62fa7c29add3d4c52bed/assimilation_code/modules/io/io_filenames_mod.f90#L201-L233

hkershaw-brown commented 8 months ago

add_domain_blank is not blank (i.e. just a state vector): it has 3 dimensions: location, member, time. These are appropriate for lorenz_X style models, but not necessarily appropriate for anything else.

https://github.com/NCAR/DART/blob/77bb8c2d64ffb5a86ded071520476b6dc4dccc4c/assimilation_code/modules/io/state_structure_mod.f90#L412-L448

state%domain(dom_id)%unique_dim_names(1)  = 'location'
state%domain(dom_id)%unique_dim_names(2)  = 'member'
state%domain(dom_id)%unique_dim_names(3)  = 'time'

The 'member' dimension is there for the single file IO, which again blends state structure and IO (they are not the same thing, but currently live in the same state_structure).

hkershaw-brown commented 2 months ago

missing_in_state is for the case where some ensemble members are missing; dry land (all ensemble members missing, e.g. POP) skates through.

[Screenshot, 2024-09-05 4:52:28 PM]

Note: the check for missing values in the state in assim_tools is very expensive (30% of runtime).
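
To make the two cases concrete, here is a small standalone sketch that tells "some members missing" apart from "all members missing" for each state element. It is not the assim_tools code; the array layout, the sizes, and the MISSING_R8 value are assumptions made for illustration.

```fortran
program missing_cases
! Sketch: classify each state element by how many ensemble members are missing.
implicit none

integer,  parameter :: r8 = selected_real_kind(12)
real(r8), parameter :: MISSING_R8 = -888888.0_r8   ! assumed missing-value flag
integer,  parameter :: ens_size = 3, n_elem = 3    ! illustrative sizes
real(r8) :: ens(ens_size, n_elem)
integer  :: i, n_missing

ens(:, 1) = [1.0_r8, 2.0_r8, 3.0_r8]       ! no members missing
ens(:, 2) = [1.0_r8, MISSING_R8, 3.0_r8]   ! some members missing
ens(:, 3) = MISSING_R8                     ! all members missing (e.g. dry land)

do i = 1, n_elem
   ! Count the members holding the missing flag for this element
   n_missing = count(ens(:, i) == MISSING_R8)
   if (n_missing == 0) then
      print *, 'element', i, ': complete'
   else if (n_missing == ens_size) then
      print *, 'element', i, ': all members missing'
   else
      print *, 'element', i, ': partially missing'
   end if
end do

end program missing_cases
```

A count like this touches every member of every element, which hints at why a check repeated inside the assimilation loop can become a large share of the runtime.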