Single restart file output for multi-catchment simulations

joshsturtevant commented 1 month ago

During multi-catchment simulations, all state variables for each catchment are saved to a single restart file.

Current behavior

When outputting state files (i.e., write_states = 1 in the namelist file) in a multi-catchment Snow-17 model run within NextGen, all states for each timestep are output into a single restart file of size [n_basins x n_timesteps, state_variables]. These outputs all end up in the statefile of the last catchment.

It also seems there may be a typo in the restart file header, which I believe should have a space between cs8 and cs9. It is not immediately obvious to me how the cs* state variable naming convention maps to the NWS conventions described on p. 42 of this doc.

Expected behavior

Each catchment's state variables should be output to individual text files corresponding to one catchment each. Initialization of these files behaves as expected, suggesting that the issue is likely with how the write_snow17_statefile submodule is called. Additionally, it appears that the expected behavior is to only output the states for the last time step; the current behavior saves out all time steps of the simulation period.

Steps to replicate behavior (include URLs)

Run multi-catchment ngen snow17 simulation with writes_states =1

Screenshots

andywood commented 1 month ago

Quick comment, the code skips the header when reading the state files ... the typo should be corrected though.

I wonder if a framework developer can comment on how the framework is managing snow17's i/o routines? I don't know myself.

snow17 enables sub-catchment hrus, and the i/o expects each hru to have its own forcing, output, and state file. So a catchment could have multiple state, forcing and output files. Also, the state files contain all of the timesteps from a state-file generation run, which is different from the way some models use state files (ie each state file has a single date). This snow17 approach is similar to the way the NWSRFS system (pre-CHPS) model state files were set up, if I remember correctly.

more generally, it may be worth refactoring the snow17 i/o codes (states, forcings, output) so that the standalone run mode is completely aligned with the way ngen manages the i/o. For instance, we could go to single-timestep statefiles which would facilitate all HRUs being written into a single one for each catchment. this will also be true of sac if the hru capability was preserved. or ideally use netcdf that can easily extend dimensionality if we want to have multi-catchment input/state/output files.

andywood commented 1 month ago

This could have to do with how the runInfoType routine that sets up the file units starts with the same number for every catchment, but allows for a range across the hrus. But if the run is over multiple catchments (each with just 1 hru), it will keep using the same fileunit to write into. Since those files aren't just opened and closed once to write a single time, they stay open until the simulation is done. This code (pre-BMI) had the time/space loops swapped, so it run all times in one catchment before going to the next, so this would not have been an issue. This file-unit specification logic (and/or the keep-open statefile approach) will need updating, I think. I would do both.

! assign input forcing and output fileunit numbers
do nh = 1, n_hrus
  this%forcing_fileunits(nh)    = 250 + nh     
  this%output_fileunits(nh)     = 500 + nh       ! allows for ~250 non-overlapping fileunits
  this%state_fileunits(nh)        = 750 + nh       ! allows for 250 non-overlapping fileunits
end do
this%output_fileunits(n_hrus+1) = 500 + n_hrus + 1   ! add one more unit for combined outputs

SnowHydrology commented 1 month ago

Hi @joshsturtevant and @andywood, I believe I am in need of some background info to better understand the issue.

When outputting state files (i.e., write_states = 1 in the namelist file) in a multi-catchment Snow-17 model run within NextGen

When you say "multi-catchment" are you referring to running multiple NextGen catchments or multiple Snow-17 HRUs within a single NextGen catchment?

Each catchment's state variables should be output to individual text files corresponding to one catchment each.

If catchment = NextGen catchment, then I'm curious to how this would work without a corresponding framework utility. To explain, we run models in NextGen using a different paradigm than most other modeling frameworks. The models are not functions/modules/routines being called by a driver program; rather, they are shared libraries. This is how we can use one TOPMODEL library to run tens, hundreds, thousands, etc. instances of TOPMODEL. Importantly, these instances are ignorant of their spatial position in the hydrofabric. They do not know their catchment IDs. That is why we deactivate a model's native I/O routines and rely on NextGen's. In this paradigm, we would not expect a model's native restart-file-writing routine to work correctly. There is likely only one file regardless of the number of NextGen catchments because only one file is ever created and written to by the shared library.

joshsturtevant commented 1 month ago

Correct, here I am referring to model runs with >1 NextGen catchments (and Snow-17 HRUs = 1, in this instance).

If a single restart file is indeed the expected and designed behavior, then I would think the restart file would need either:

multi-indexing (e.g., indexed by time and catchment_id)
a new dimension [time x catchment_id x state_variable], which would require use of netCDF, or
to break with the snow-17 convention and only output the last timestep, in which case the dims would be [catchment_id x state_variable].

Also, the state files contain all of the timesteps from a state-file generation run, which is different from the way some models use state files (ie each state file has a single date). This snow17 approach is similar to the way the NWSRFS system (pre-CHPS) model state files were set up

andywood commented 1 month ago

re (3), it's useful to give the state file generation some options (ie last timestep, regular timesteps -- eg 1st of month, last of month, specific timestep) -- but the resulting file or files that are dropped just have a single timestep. If you generate a monthly or daily series, you get multiple state files rather than one combined one. the way nextgen runs now (and the way FEWS/CHPS runs) seems to favor a single state file per catchment and per component (eg sac, snow17). those could still just be text files.

SnowHydrology commented 1 month ago

@joshsturtevant, thanks for clarifying. As of now, there is no way to pass the catchment ID or other information of that nature to the models and modules running in NextGen. I.e., I believe it will be difficult to force the square peg of "Snow-17 writes its own restart files" through the round hole of NextGen.

I suggest you open an issue, if you haven't done so already, on the ngen repo to see the possibility of writing restart files per catchment in a way that's consistent with other IO functionalities. Caveat: we've been down this road a bit and it's much easier for some models (e.g., Snow-17) than others.

andywood commented 1 month ago

@keith, good points. there's definitely no interest in have snow17 do its own i/o (forcing, state, output, params) when run in nextgen. using snow17 state file read/write right now may be just a temporary phase (that's causing confusion) pending nextgen rolling out state file capabilities, as we try to run forecasts for ciroh research. in the past, I've suggested nextgen have a forcing engine, an output engine and statefile engine -- the statefile engine could just borrow/adapt code from forcing & output engines (I know the latter doesn't exist as such) so it might not be a heavy lift.

andywood commented 1 month ago

I wonder if we should close this issue? I think there's a pressing action item related to nextgen (with issues like https://github.com/NOAA-OWP/ngen/issues/530). a snow17 state file (standalone) action would be conditional on that (and low priority). I'd say once the nextgen state file capability for models including snow17 is implemented, there could be a future issue to align standalone-mode snow17 state file handing with that functionality (to facilitate offline experimentation). in the meantime we can craft something in a branch that works for our purposes.

joshsturtevant commented 1 month ago

I'm in favor of closing this for now @andywood, given what @SnowHydrology has shared about how this very much relates to larger ngen structural limitations, at least at the time being.

thanks all for the discussion!

NOAA-OWP / snow17