drakest123 opened this issue 4 months ago
Good discussion of the issue, Steve. One thing to note is that this mainly refers to the standalone use case (i.e., not run through NextGen). The file opening/reading is skipped by a compiler directive when run in NextGen. Even so, I think it's worth making a quick fix to the file unit number ranges to allow opening many more files -- I'm not sure what a reasonable standalone limit is versus the NextGen application case. If this is an edge case usage-wise, the priority for going beyond that fix might be low.
@drakest123 To confirm, we're talking here about forcing and output files when running Snow-17 in standalone mode? I believe you also mentioned an issue caused in NextGen by the parameter file staying open. The former can likely be "fixed" by adding the check you note in option 1 above. The latter should be fixed by closing the parameter file after reading it in.
One option is to change

```
#define MXUNIT 100
```

in the compiler source and recompile the Fortran compiler to increase the number of files that can be opened at a time. However, there is also an OS limitation. On this Mac M1 the limit is 256:

```
% ulimit -n
256
```
To add some clarifications here:
The Fortran specs indicate that one can open as few as 100 files at a time (file units 0-99), and some of those units are dedicated to standard I/O (e.g., units 0, 5, and 6, which are commonly preconnected to stderr, stdin, and stdout). The number of files one can open at a time is implementation specific.
This was true of Fortran 77. In modern compilers, the `UNIT` is an `INTEGER` type and can represent a significantly larger number of open files (though the "reserved" units are still generally respected by default).
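As an illustration of the modern approach (a hedged sketch, not taken from the Snow-17 code; the file name is hypothetical): Fortran 2008's `NEWUNIT=` specifier asks the runtime to pick an unused (negative) unit number, which sidesteps manual unit bookkeeping and unit-number conflicts entirely.

```fortran
program newunit_demo
  implicit none
  integer :: iunit   ! runtime-chosen unit; no manual numbering needed
  ! NEWUNIT= (Fortran 2008) returns a unique, unused unit number,
  ! guaranteed not to collide with preconnected or already-open units.
  open(newunit=iunit, file='forcing_zone1.csv', status='replace', action='write')
  write(iunit, '(a)') 'tair,precip'
  close(iunit)
end program newunit_demo
```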
```
ulimit -n
```

only shows the currently configured limit; you can actually change this per shell/session:

```
ulimit -n 1024
ulimit -n
```

The second command should now report 1024 for that shell.
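To make the per-shell behavior concrete (a sketch; running the change inside a subshell keeps it from leaking into your login shell, and raising the limit works the same way up to the hard limit shown by `ulimit -Hn`):

```shell
# Lower the soft limit inside a subshell; the parent shell is unaffected.
( ulimit -n 256; ulimit -n )

ulimit -n   # parent shell still reports its original limit
```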
Given the recent correction of statements in the ioModules that were being skipped when run in NextGen (and had to be relocated to the main code), I'm now thinking this is the reverse problem -- that some statements related to file unit assignment that should be skipped when run in NextGen are not -- and they will need to be similarly relocated (or an #ifndef exception added around them). Incidentally, all the units assigned should be much higher than the reserved ones, if that is still relevant.
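The guard described above could look like the following (a hypothetical sketch, assuming the build defines a preprocessor symbol such as `NGEN_ACTIVE` for NextGen builds; the symbol and subroutine names are assumptions, not taken from the Snow-17 source):

```fortran
subroutine initialize_io()
#ifndef NGEN_ACTIVE
  ! Standalone only: open per-zone forcing/output files here.
  ! Skipped entirely in NextGen builds, where the framework
  ! supplies forcings through BMI and no files are opened.
  call open_forcing_files()
  call open_output_files()
#endif
end subroutine initialize_io
```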
I can check this tomorrow if it's not already solved.
@drakest123 and @andywood Can you two please confirm if this issue is for Snow-17 running in NextGen or in standalone mode (i.e., it handles its own forcing and output writing)? If it is the latter, I will close this issue as we are not concerned about someone using this version of Snow-17 outside of the NextGen framework to run multiple catchments.
When the BMI-enabled version of Snow-17 is initialized with many catchments (e.g. more than 44) there is a unit number conflict when opening an input file.
Current behavior
The program crashes.
Expected behavior
The subject of this post.
Open forum
The issue is that files remain open during a Snow-17 run, which limits the number of catchments that can be processed in a given run. Background for this issue is from an email from Andy Wood:
"My original code setup for those models (pre-BMI) looped through each zone (e.g., U, L) and opened/closed the files before moving to the next zone, since, as you say, they don't interact. When I refactored it to use BMI, I changed it to open all the files in the initialize step and close them all in the finalize step, and I didn't think about the upper limits. I/we could easily change the numbering scheme for the files to enable it to keep many 1000s of files open, which most machines would support, and which would enable a reasonably large (but not infinite) standalone run case. And when run in NextGen, Snow-17 should be getting forcings from the framework and not opening its own files. The current numbering scheme envisions running a basin at a time, and the basin might have some number of elevation zones, but probably never more than 20 (in the RFC world the max is about 3).
An alternative might be tricky, given the way the update() function works: it would be inefficient to have that function re-open and close all the files just to read the forcing for a single timestep. If all the forcings are in one NetCDF file, instead of individual CSVs, that could simplify the problem. Is the reason noah-om doesn't have this issue that it doesn't try to run sub-catchment-level zones (or basin sub-catchments)? I think all the noah-om dev work ran standalone over single catchments (i.e., one forcing file per catchment)."
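A minimal sketch of the numbering-scheme idea in the quote above (the base offset and loop bounds are hypothetical, chosen only to illustrate the arithmetic): give each (catchment, zone) pair a unit well above the reserved range, so many thousands of files can stay open for the whole run.

```fortran
program unit_scheme_demo
  implicit none
  ! Hypothetical scheme: unit = base + catchment offset + zone offset,
  ! kept well clear of the reserved/preconnected units 0-99.
  integer, parameter :: base_unit = 1000
  integer, parameter :: max_zones = 20   ! per the quote, rarely above ~3
  integer :: icatch, izone, iunit
  do icatch = 1, 3
    do izone = 1, 2
      iunit = base_unit + (icatch - 1) * max_zones + (izone - 1)
      print '(a,i0,a,i0,a,i0)', 'catchment ', icatch, ' zone ', izone, &
            ' -> unit ', iunit
    end do
  end do
end program unit_scheme_demo
```

Each pair maps to a distinct unit, so files opened in initialize() never collide regardless of how many catchments are configured, up to the OS open-file limit.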
Possible alternatives:
Alternative discussion:
I don’t know the answer to your question about whether noah-om was tested using single catchments, but that does seem to be a critical point. If it is advisable to run Snow-17 one catchment at a time in the noah-om context, then the file I/O issues become less consequential.