Feature request: model_mod_check needs more MPI awareness

nancycollins commented 2 years ago

Use case

moha is trying to debug an MITgcm_ocean model with a state vector larger than an i4. he ran into various problems when trying to use model_mod_check to debug things. the sizes related to i8 are being taken care of in another pull request. model_mod_check is MPI capable, but some of the tests are running on all tasks when that doesn't make sense. one test is allocating a model_size array on a single task which doesn't fit for really large state vectors. some of the tests simply need to restrict the actual test part on a single task, others need redesign and rethinking.

Is your feature request related to a problem?

yes. bad things happen when state vector size is > i4

Describe your preferred solution

look at each test and decide how best to behave when running with a large number of tasks.

Describe any alternatives you have considered

use smaller model states! :)

nancycollins commented 2 years ago

i've made a 'model_mod_check' branch to try to fix some of these issues.

mgharamti commented 2 years ago

i've made a 'model_mod_check' branch to try to fix some of these issues.

Thank you Nancy. I appreciate your help.

nancycollins commented 2 years ago

the model_mod_check branch status is: it should work with large state vectors and mpi. but i don't think it's "finished". i haven't documented the new tests. also we should probably have a discussion about how to deal with adding tests and changing the meaning of the test numbers. (they could be strings or something.) there are complications with dependencies of the tests - like you have to read in a state vector for most of the other tests to work, etc.

here are the 18 required public model_mod entry points for a model_mod:

public :: get_model_size, &
          get_state_meta_data,  &
          model_interpolate, &
          shortest_time_between_assimilations, &
          static_init_model, &
          init_conditions,    &
          adv_1step, &
          nc_write_model_atts, &
          pert_model_copies, &
          nc_write_model_vars, &
          init_time, &
          get_close_obs, &
          get_close_state, &
          end_model, &
          convert_vertical_obs, &
          convert_vertical_state, &
          read_model_time, &
          write_model_time

several of these have default routines so there aren't any models that have every one of the 18. still we should have tests for them. right now the version on the model_mod_check branch only tests:

public :: get_model_size, &
          get_state_meta_data,  &
          model_interpolate, &
          static_init_model, &
          convert_vertical_state

so there are lots of additional tests that should be added to help people either develop or debug their model_mod code.

nancycollins commented 2 years ago

the original tutorial materials had a progression of working through model_mod routines that was nice. it started with identity obs (so interpolation wasn't needed) and worked up to more functionality. simple models can supply init_time() and init_conditions() which lets you start without a restart file, etc. there is a logic to how to order these and i'd like to write a simple design document first before coding this up. (in a week after i get back from vacation.) if anyone has suggestions for mmc please add them here?

hkershaw-brown commented 2 years ago

Adding this note here, since you could hit this when adding a new model: default/model_mod.f90 has the wrong arguments for get_state_meta_data. Should not have state_handle.

https://github.com/NCAR/DART/blob/f3f100e5f17797e1a6a44b8b11d947c6a64cd902/models/utilities/default_model_mod.f90#L215

nancycollins commented 2 years ago

historical note: the original wrf model_mod.f90 :: get_state_meta_data() did the conversion to the requested vertical localization coordinate. so the first RMA implementation had to include an ensemble_handle() to allow the code to access other parts of the state vector to do that. but that made it difficult to use get_state_meta_data() in any other context if an ensemble handle wasn't available, and there may have been performance concerns. so the vert convert code was removed from the wrf get_state_meta_data() and the ensemble_handle argument was dropped. 2 vertical conversion entry points were added to the model_mod required list, so filter or any other main program could do the conversion at the time of their choosing.

hkershaw-brown commented 2 years ago

https://github.com/NCAR/DART/tree/model_mod_check

NCAR / DART