NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0
189 stars 142 forks source link

note on inflation, allow_missining_in_clm, and groups #277

Open hkershaw-brown opened 3 years ago

hkershaw-brown commented 3 years ago

This is note about missing values. allow_missing_in_clm is in the inflation handle.

filter:

1649 do j = 1, ens_handle%my_num_vars
1650      call inflate_ens(inflate, ens_handle%copies(grp_bot:grp_top, j), &
1651      ens_handle%copies(ENS_MEAN_COPY, j), ens_handle%copies(inflate_copy, j), 0.0_r8, &
1652       ens_handle%copies(SPARE_PRIOR_SPREAD, j), ens_handle%copies(ENS_SD_COPY, j)) 
1653 end do 

inflate_ens:

532 ! it's possible to have MISSING_R8s in the state vector now.  
533 ! so we need to be able to avoid changing MISSING_R8 values by inflation here.
534 if (inflate_handle%allow_missing_in_clm) then
535   if (any(ens == MISSING_R8)) return
endif

So if some groups have missing values, but some don't, you will apply inflation to some groups but not others.

nancycollins commented 3 years ago

i think the intent was this would skip computing inflation on an item-by-item basis if any ensemble members had missing values for that item. if this is not skipping the entire group then i think something's not right.

edit: if any members have missing_r8 and group size is > 1, then i'd think all groups for this item should be skipped? ??? now i'm more confused.

nancycollins commented 3 years ago

i looked at this code again (and i updated my last comment). i'd forgotten how groups work. this code is trying to compute inflation on a per-item basis, but if your group size is > 1 then it computes the values on subsets of the ensemble members, right?

i have to think in concrete numbers sometimes. if your ensemble size is 30 and your group size is 2, then it computes inflation for ensemble members 1-15, and then 16-30 as if you had two separate ensembles of 15 member each. is that right?

if so, then how does adaptive inflation work? because you only have a single inflation value per item written out at the end of the assimilation.

hkershaw-brown commented 3 years ago

Yeah that is the way I was thinking about allow_missing_in_clm. If one of ensemble member 1-15 has a missing_r8, but 16-30 don't. I think you would inflate 16-30 but not 1-15. Who cares? I don't know.

I think it is worth making a note of the places where it is not clear how groups interact with other options, mostly to make a plan for refactoring.

For example, this loop https://github.com/NCAR/DART/blob/75d677c23f5602e0cade10ae86cb021c6ff70ef2/assimilation_code/modules/assimilation/filter_mod.f90#L1635

I think the reset of RTPS on line 1646 (edit) means you only get the inflation from the last group in the output file.

nancycollins commented 3 years ago

in looking at the inflation code i believe for adaptive inflation kinds 2 and 5 that the update_inflation() routine is called for each group, so the adaptive inflation values are updated N times if you have N groups, which may be the intended algorithm. i haven't figured out yet if the RTPS implementation took into account num_groups > 1.

hkershaw-brown commented 3 years ago

note on missing_8 footprint in quad_utils_mod #249

hkershaw-brown commented 2 years ago

models calling set_missing_ok_status as of Jan 6th 2022:

../../../models/noah/model_mod.f90:252:call set_missing_ok_status(.true.) ../../../models/clm/model_mod.f90:484:call set_missing_ok_status(.true.)

call set_missing_ok_status is called in static_init_model and then overwritten by the call in filter.

filter_mod.f90:
367 call filter_initialize_modules_used() ! static_init_model called in here

369 ! Read the namelist entry
     call find_namelist_in_file("input.nml", "filter_nml", iunit)
     read(iunit, nml = filter_nml, iostat = io)
     call check_namelist_read(iunit, io, "filter_nml")

...

399 call set_missing_ok_status(allow_missing_clm)  ! this call to set_missing_ok_status overwrites whatever was set in static_init_model
400 allow_missing = get_missing_ok_status()

This is not so good.

nancycollins commented 2 years ago

eons ago i think we decided that instead of filter setting the missing status, model_mod should have the control. that accomplishes two good things. 1) any executable that uses the model_mod will have a consistent missing status set, and 2) since supporting missing data values seems to be model-dependent it's the more logical place to have it.

but i don't think the main branch ever moved where missing was being controlled.

hkershaw-brown commented 2 years ago

eons ago i think we decided that instead of filter setting the missing status, model_mod should have the control. that accomplishes two good things. 1) any executable that uses the model_mod will have a consistent missing status set, and 2) since supporting missing data values seems to be model-dependent it's the more logical place to have it.

I'm not sure I buy this.

nancycollins commented 2 years ago

does PMO use the same init code as filter? also dart_to_model and model_to_dart - all of those main modules will have to have the same flag or they won't treat the missing values consistently. it's not only filter, is it?

hkershaw-brown commented 2 years ago

maybe maybe, let me think about this and pick your brains.

nancycollins commented 2 years ago

this is an attempt at a brain dump on "invalid data in the state" issues. i'm not sure where to put this comment but i'll add it here and you can cut and paste it other places if that's better.

Missing values missing values in the state have always been supported indirectly, even before CLM. however before CLM there were other ways to determine if the data was invalid other than by putting MISSING_R8 in the state. this is because it was hard to prevent either the assimilation and the inflation from changing the data values. here's how they were handled in the past:

model_mod::interpolate() there are several model_mods which recognized invalid data, like POP with land values or data below sea floor surface. these models had other ways to determine if a data point was valid other than the actual value in the state. for example, POP has a land/sea mask, and also an array that tells how many vertical cells are water before hitting the sea floor. for interpolate if any of the needed values were invalid the interpolate call would return an error code. inside the module code it used MISSING_R8 as a temporary marker to indicate a data value was missing, but didn't assume the state contained MISSING_R8.

quad utilities code the quad utilities module has interpolation code that helps simplify model_mod interpolation for fully rectangular or logically rectangular grids. it computes the indices of the needed corners, and then the model_mod supplies those values. if one or more of the corners are invalid (as determined by the model_mod) it passes MISSING_R8 as the value and the interpolation code recognizes that as an invalid corner and the interp fails. but again, this is a transient data value.

model_mod::get_close() the models which had invalid data in their arrays could exclude them from the get_close() lists so they wouldn't be updated by the assimilation code. they would still be updated by inflation, however, so again, before CLM the model code did not depend on the data value in the state to determine valid/invalid. this was mostly an efficiency issue to avoid doing unnecessary work.

model_to_dart and dart_to_model before the netcdf i/o, when all models needed translation programs, the dart_to_model code would be the logical place to handle writing data values back to the model files. if the model didn't care about data values in invalid locations it could simply write the state data back without translation. if the model required specific data values for invalid locations, the translation program would consult the aux array that indicated what locations were invalid and put the needed values there.

invalid data in state without an aux array if there is no indication of invalid data other than the actual data value being MISSING_R8, then the inflation code and the assimilation code had to be modified to not alter this value. that's when code was needed in the main filter routines to avoid changing MISSING_R8 in the state.

if MISSING_R8 is not allowed to be in the state the assimilation and inflation tests can be removed, but there still needs to be a way to determine if a location contains invalid data, both for the forward operator interpolation code and for writing the updated state back to the model data files.

one solution is to add/compute an aux array in the model_mod code to flag invalid state locations. the other is to construct data arrays that do not have invalid points (e.g. constructing sums of valid values) and recreating the needed model data with a separate dart_to_model program. but it seems that the goal should be to avoid depending on special values in the state data.

hkershaw-brown commented 2 years ago

Some works of art to illustrate junk state vs. incomplete ensemble

Junk state, same across each ensemble member: Consequence of NetCDF being an X,Y,Z box.


CLM missing values are at different locations in each ensemble member: Different number of levels in each ensemble member.

nancycollins commented 2 years ago

great point about differentiating between all ensemble members having invalid data at a location vs some members having valid data and others not. here are two other thoughts about missing values just for completeness.

bookkeeping issues: to avoid special values in the state the bookkeeping could become prohibitively large or slow.

if all members have the same invalid locations, an aux array of model grid size is sufficient. however if each ensemble member could have different invalid locations, an array (ensemble size x model grid size) or worse (ensemble size x model grid size x number of variables which can have missing values) would be required to track them. or a list of (location, member number, variable) would have to be kept and searched. the former could double the size of the state, the latter could be slow to search.

forward operators where some members fail, some succeed: it has always been possible for the interpolate routine to succeed for some members and fail for others. for example, interpolating the forward operator value for an observation with a pressure vertical location. if pressure is in the model state, each member computes the same horizontal location but different vertical locations relative to the model grid. it's possible for some members to be able to compute a forward operator value if the observation is completely inside the model grid, where other members may compute the location to be above or below the grid and return errors. so the interpolation code has always checked to see how many members returned good interp values vs errors which are marked with both a positive RC from the call and MISSING_R8 as the interp value.