Closed JanStreffing closed 2 months ago
I would find this a very important and useful feature, also for the IFS-FESOM workflow with RAPS.
I discussed this with some colleagues over lunch. Not only does this behavior force us to copy the restarts instead of linking them, it also makes the restart files larger by a factor of timesteps_in_the_file. For example, a single-timestep DART restart folder is 31 GB; one with 12 timesteps is thus 372 GB, costing us extra space and time.
I think we store the time step in the netcdf restart file and check it in the model so that a restart can be arbitrary, not just based on a full year, month or day. Especially for debugging, it is pretty convenient if you can make a restart at a specific model time step, e.g. just before a blowup occurs! If you want to cover all these possibilities, you will need a pretty long folder name. In that case you would need the full timestamp, YYYY-MM-DD-HH-MM-SS. Or you could use the timestamp number (seconds within the year) directly in the folder name, something like YYYY-timestampnumber (e.g. YYYY-31535400).
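To make the two naming options concrete, here is a minimal Python sketch that builds both stamps from a model time. The function names `full_stamp` and `seconds_stamp` are hypothetical and not part of FESOM2, and the sketch assumes Python's Gregorian calendar, which may differ from the calendar the model is configured with.

```python
from datetime import datetime

def full_stamp(t: datetime) -> str:
    """Return the full timestamp form YYYY-MM-DD-HH-MM-SS."""
    return (f"{t.year:04d}-{t.month:02d}-{t.day:02d}-"
            f"{t.hour:02d}-{t.minute:02d}-{t.second:02d}")

def seconds_stamp(t: datetime) -> str:
    """Return YYYY-<seconds elapsed since the start of that year>."""
    start_of_year = datetime(t.year, 1, 1)
    seconds = int((t - start_of_year).total_seconds())
    return f"{t.year:04d}-{seconds}"

# Illustrative model time: 10 minutes before the end of a non-leap year.
t = datetime(1849, 12, 31, 23, 50, 0)
print(full_stamp(t))
print(seconds_stamp(t))
```

The seconds-in-year form is shorter, but the full timestamp can be read without a calculator, which may matter when picking a restart by hand.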
IMO a longer folder name is an okay price for the ability to link it from pool_dir safely.
This is the third issue for the same problem. See also: https://github.com/FESOM/fesom2/issues/279 and #617. Closing here; let's use the oldest issue.
The issue: Currently when FESOM2 finds preexisting restart files in a restart folder for the given year, e.g.
fesom.1849.oce.restart/ssh.nc
and we ask it to write monthly restarts during year 1849 (e.g. during the first year of spinup from PHC3), it will keep adding restart timesteps to the netcdf files therein.
Upon restarting from such a folder/files, FESOM2 will use the latest step found and double check that it matches the fesom.clock. If the timesteps don't match, the model exits.
I find this behavior unsafe. Especially when one accidentally links instead of copying a restart from a pool_dir to a work folder, FESOM2 will try to modify the original restart files in the pool_dir and start adding timesteps there. In the worst case, the user has write permissions on the pool_dir, and the restart file will actually be modified. This recently happened to @mzapponi.
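The restart check described above can be sketched roughly as follows. This is not FESOM2's actual code: `pick_restart_step`, its arguments, and the tolerance are assumptions for illustration, with the times in the file and the clock given as plain numbers.

```python
import sys

def pick_restart_step(file_times, clock_time, tol=1e-6):
    """Return the index of the last record if it matches the clock, else exit."""
    if not file_times:
        sys.exit("restart file contains no timesteps")
    last = len(file_times) - 1
    # Only the latest record is considered; earlier ones are ignored.
    if abs(file_times[last] - clock_time) > tol:
        sys.exit(f"restart time {file_times[last]} does not match clock {clock_time}")
    return last
```

With twelve monthly records in one file, only the twelfth can be restarted from, which is why the earlier records mostly cost space.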
Proposed solution: I think a better solution would be to put a more detailed timestamp in the restart folder name, e.g. YYYY-MM-DD-HH-MM-SS, or at least YYYY-MM-DD. Instead of checking whether the folder/file exists and, if it does, adding a timestep, we can check whether it exists and, if so, exit the model. This way we never accidentally modify an existing restart file.
Unless I hear a strong no to the suggestion, I will create a draft for such a change soon.
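As a rough illustration of the proposed "exit if it already exists" behavior, here is a small Python sketch. The real change would live in the FESOM2 Fortran source; `create_restart_dir` and the example folder name are hypothetical.

```python
import os
import sys

def create_restart_dir(path):
    """Create a fresh restart folder; abort instead of touching an existing one."""
    try:
        # exist_ok=False raises FileExistsError if the path already exists,
        # including when it is a symlink to an existing pool_dir folder.
        os.makedirs(path, exist_ok=False)
    except FileExistsError:
        sys.exit(f"restart folder {path} already exists; refusing to modify it")
    return path
```

Calling `create_restart_dir("fesom.1849-01-31.oce.restart")` twice would succeed the first time and exit the second time, so a linked restart from a pool_dir is never appended to.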
@patrickscholz @hegish @dsidoren