[Open] drakest123 opened this issue 1 month ago
Good discussion of the issue, Steve. One thing to note is that this mainly refers to the standalone use case (i.e., not run through nextgen). The file opening/reading is skipped by a compiler directive when run in nextgen. Even so, I think it's worth making a quick fix to the file unit-number ranges to allow opening many more files -- I'm not sure what a reasonable standalone limit is, versus the nextgen application case. If this is an edge case, usage-wise, the priority to go beyond that fix might be low.
@drakest123 To confirm, we're talking here about forcing and output files when running Snow-17 in standalone mode? I believe you also mentioned an issue caused in NextGen by the parameter file staying open. The former can likely be "fixed" by adding the check you note in option 1 above. The latter should be fixed by closing the parameter file after reading it in.
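The parameter-file part of the fix amounts to reading everything up front and releasing the handle immediately, instead of leaving the file open for the life of the run. A minimal sketch of that pattern, in Python for brevity (the actual Snow-17 code is Fortran, and `read_params` and the key=value layout here are assumptions, not the real parameter format):

```python
def read_params(path):
    """Read all parameters up front, then release the file handle.

    Holding files open for the whole run is what exhausts the
    unit/descriptor limit; reading eagerly and closing avoids that.
    """
    params = {}
    with open(path) as f:  # file is closed as soon as the block exits
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            params[key.strip()] = value.strip()
    return params
```

The same shape applies in Fortran: `open`, read into module variables, then `close` before `initialize` returns.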
One option is to change
#define MXUNIT 100
in the Fortran compiler's runtime sources and rebuild it to increase the number of files that can be open at a time. However, there is also an OS limitation. On this Mac M1 the limit is 256:
% ulimit -n
256
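For reference, the per-process descriptor limit can be inspected and, up to the hard limit, raised from the shell before a standalone run. A sketch; the target of 1024 is purely illustrative:

```shell
# Show the current soft limit (256 by default here) and the hard limit:
ulimit -n
ulimit -Hn
# Raise the soft limit for this shell session, capped at the hard limit;
# 1024 is an arbitrary illustrative target.
target=1024
hard=$(ulimit -Hn)
if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
  target=$hard
fi
ulimit -n "$target"
```

Raising the soft limit beyond the hard limit requires elevated privileges, so this only postpones the problem; the unit-numbering fix is still needed for large runs.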
When the BMI-enabled version of Snow-17 is initialized with many catchments (e.g., more than 44), a unit-number conflict occurs when opening an input file.
Current behavior
The program crashes.
Expected behavior
The subject of this post.
Open forum
The issue is that files remain open during a Snow-17 run, which limits the number of catchments that can be processed in a given run. Background for this issue is from an email from Andy Wood:
"My original code setup for those models (pre-BMI) looped through each zone (eg U, L) and opened/closed the files before moving to the next zone, since as you say, they don't interact. When I refactored it to use BMI, I changed it to open all the files in the initialize step and close them all in the finalize step, and I didn't think about the upper limits. I/we could easily change the numbering scheme for the files to enable it to keep many 1000s of files open, which most machines would support, and which would enable a reasonably large (but not infinite) standalone run case. And when run in nextgen, snow17 should be getting forcings from the framework and not opening its files. The current numbering scheme envisions running a basin at a time, and the basin might have some number of elevation zones but probably never more than 20 (in RFC world the max is about 3).
An alternative might be tricky, given the way the update() function works. It would be inefficient to have that function re-open and close all the files just to read the forcing for a single timestep. If all the forcings are in one netcdf file, instead of individual csvs, that could simplify the problem. Is the reason that noah-om doesn't have this issue because it doesn't try to run sub-catchment level zones (or basin sub-catchments)? I think all the noah-om dev. work ran standalone over single catchments (ie one forcing file per catchment)."
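The numbering-scheme change Andy describes could be sketched as follows. This is a hypothetical illustration in Python (the actual model is Fortran), and `MAX_UNITS`, `BASE_UNIT`, and `unit_number` are assumptions, not names from the Snow-17 code:

```python
# Hypothetical sketch: give each (catchment, zone, file-kind) triple a
# unique Fortran unit number under a configurable ceiling, so a run can
# hold thousands of files open at once instead of ~100.

MAX_UNITS = 10000  # assumed ceiling; the real bound comes from compiler/OS
BASE_UNIT = 100    # stay clear of preconnected units (stdin/stdout etc.)
N_ZONES   = 20     # "probably never more than 20" zones per basin
N_KINDS   = 2      # e.g. one forcing file + one output file per zone

def unit_number(catchment: int, zone: int, kind: int) -> int:
    """Mixed-radix mapping: each triple gets a distinct unit number."""
    unit = BASE_UNIT + (catchment * N_ZONES + zone) * N_KINDS + kind
    if unit >= MAX_UNITS:
        raise RuntimeError(f"unit {unit} exceeds ceiling {MAX_UNITS}")
    return unit
```

With the old ceiling of roughly 100 units, a few dozen catchments with a couple of files each already collide; a scheme like this supports a few hundred catchments before hitting the (assumed) 10000-unit ceiling, at which point the OS descriptor limit becomes the binding constraint anyway.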
Possible alternatives:
Alternative discussion:
I don't know the answer to your question about whether noah-om was tested using single catchments, but that does seem to be a critical point. If, like noah-om, Snow-17 should be run a single catchment at a time, then the file I/O issues become less consequential.