DeltaRCM / pyDeltaRCM

Delta model with a reduced-complexity approach
https://deltarcm.org/pyDeltaRCM/
MIT License

Overwrite 'n save #186

Closed · elbeejay closed this 3 years ago

elbeejay commented 3 years ago

PR to modify overwriting and checkpoint saving behavior

PR does 2 things (later edited down to just 1; see the notes on each item below).

1. Adds clobber_netcdf as a yaml parameter with the default set to False. This parameter controls whether an existing pyDeltaRCM output file will be overwritten if a new model is initialized with the same out_dir. Currently we overwrite the old file and spit out a warning. The proposed default behavior raises an error before overwriting an existing file, but the existing behavior can still be had (maybe useful for prototyping or testing, when the same model is being initialized repeatedly) by setting clobber_netcdf to True. Closes #184. (A sketch of the proposed checks follows this list.)

Edited to just make clobbering an optional parameter. Didn't want to over-complicate saving logic.

2. Current checkpointing behavior requires save_checkpoint to be explicitly set to True in order to save any checkpointing information. My opinion is that if one bothers specifying checkpoint_dt as a number in the yaml, then save_checkpoint should automatically flip to True if it wasn't already. That is just my opinion, so feel free to disagree... this would close #185. Edited: changed this to cover the two scenarios that are really in "conflict".

The first is when save_checkpoint: False and checkpoint_dt is set to a number. This scenario may arise when the user actually intends for a checkpoint to be saved at the interval checkpoint_dt but did not specify save_checkpoint, so it took on its default value of False. The proposed change does not change the value of either parameter, but warns the user that a checkpoint_dt value has been specified while checkpointing is turned off.

The second is when save_checkpoint: True and checkpoint_dt is not specified. In that case the user wishes to save a checkpoint but has not specified the frequency at which to save it. When this happens, we keep the previous behavior and set checkpoint_dt = save_dt, but now a warning tells the user that this is happening. Previously we did this without notifying the user or recording it anywhere.
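For concreteness, here is a minimal sketch of the checks described in items 1 and 2. The function name, the params dict, and the netcdf_path argument are illustrative only and are not the actual pyDeltaRCM internals; only the parameter names (clobber_netcdf, out_dir, save_checkpoint, checkpoint_dt, save_dt) come from this PR.

```python
import os
import warnings


def check_output_config(params, netcdf_path):
    """Hypothetical sketch of the init-time checks proposed in this PR.

    `params` is a dict of parsed yaml parameters; `netcdf_path` is the
    output netCDF file that would be created inside `out_dir`.
    """
    # 1) clobber check: refuse to overwrite an existing output file unless
    #    the user explicitly opted in with clobber_netcdf: True
    if os.path.exists(netcdf_path):
        if params.get('clobber_netcdf', False):
            warnings.warn('Replacing existing netCDF file: %s' % netcdf_path)
            os.remove(netcdf_path)
        else:
            raise FileExistsError(
                'A netCDF output file already exists at %s; set '
                'clobber_netcdf: True to overwrite it.' % netcdf_path)

    # 2a) checkpoint_dt given but checkpointing is off: warn, change nothing
    if not params.get('save_checkpoint', False) and params.get('checkpoint_dt') is not None:
        warnings.warn('checkpoint_dt is specified but save_checkpoint is '
                      'False; no checkpoint will be saved.')

    # 2b) checkpointing is on but no interval given: fall back to save_dt,
    #     and warn so the user knows where the value came from
    if params.get('save_checkpoint', False) and params.get('checkpoint_dt') is None:
        params['checkpoint_dt'] = params['save_dt']
        warnings.warn('save_checkpoint is True but checkpoint_dt was not '
                      'given; using checkpoint_dt = save_dt.')
```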

codecov[bot] commented 3 years ago

Codecov Report

Merging #186 (e1aa8b7) into develop (620c7a8) will increase coverage by 0.29%. The diff coverage is 95.23%.

:exclamation: Current head e1aa8b7 differs from pull request most recent head e8ce3d1. Consider uploading reports for the commit e8ce3d1 to get more accurate results.

@@             Coverage Diff             @@
##           develop     #186      +/-   ##
===========================================
+ Coverage    78.06%   78.36%   +0.29%     
===========================================
  Files           12       12              
  Lines         2462     2477      +15     
===========================================
+ Hits          1922     1941      +19     
+ Misses         540      536       -4     
Impacted Files                   Coverage Δ
pyDeltaRCM/model.py              89.98% <93.33%> (+0.20%) :arrow_up:
pyDeltaRCM/init_tools.py         99.12% <100.00%> (+1.17%) :arrow_up:
pyDeltaRCM/iteration_tools.py    95.48% <100.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 620c7a8...e8ce3d1.

amoodie commented 3 years ago

I think 1) makes sense. No comments there :+1:

I think 2) makes sense, mostly.

My only thought is that we have a few of these "one setting overrides another" situations now, and I think that's fine, but maybe we would feel better about the choices if we also throw a warning? At least this way, the behavior is surfaced both in the documentation and at runtime.

Also, importantly, what happens in the case where I explicitly set both checkpoint_dt: 100, save_checkpoint: False? Maybe I ran the model to a point where I am happy with the checkpointed state, and I want to experiment with a few different conditions following that state, but not overwrite the checkpoint file (e.g., a spin-up run). Do I need to also remove the checkpoint_dt? This is an actual scenario I am working on now, where I use the same "base yaml" to run a spin-up and a matrix of experiments following that resumed state. I think we need to be sure to honor the save_checkpoint: False if it is stated explicitly.

elbeejay commented 3 years ago

> My only thought is that we have a few of these "one setting overrides another" situations now, and I think that's fine, but maybe we would feel better about the choices if we also throw a warning? At least this way, the behavior is surfaced both in the documentation and at runtime.

Sure, I can agree with throwing a warning. What other situations are you thinking of where one parameter supersedes another? The closest I am coming up with now is clobber_netcdf, which doesn't really supersede another parameter per se, and we don't do any funny business with the subsidence settings...

> Also, importantly, what happens in the case where I explicitly set both checkpoint_dt: 100, save_checkpoint: False? Maybe I ran the model to a point where I am happy with the checkpointed state, and I want to experiment with a few different conditions following that state, but not overwrite the checkpoint file (e.g., a spin-up run). Do I need to also remove the checkpoint_dt?

Yeah, this is a good point, I agree. Maybe the solution is to not change the behavior or modify save_checkpoint in any way, but instead just throw a warning when checkpoint_dt is explicitly specified but checkpoints will not actually get saved because save_checkpoint is False, and vice versa. That would at least let the user know upon initialization that save_checkpoint and checkpoint_dt are somewhat in conflict, without modifying what the model does.
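To make that concrete, here is a short sketch of the warn-only approach (hypothetical names again, not pyDeltaRCM internals): nothing gets modified, the user is just told about the mismatch in either direction.

```python
import warnings


def warn_on_checkpoint_conflict(save_checkpoint, checkpoint_dt_specified):
    """Hypothetical warn-only check: flag conflicting checkpoint settings
    without changing any parameter values."""
    if checkpoint_dt_specified and not save_checkpoint:
        warnings.warn('checkpoint_dt was specified but save_checkpoint is '
                      'False; no checkpoint will be saved.')
    elif save_checkpoint and not checkpoint_dt_specified:
        warnings.warn('save_checkpoint is True but checkpoint_dt was not '
                      'specified; the checkpoint interval will fall back '
                      'to save_dt.')
```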

> This is an actual scenario I am working on now, where I use the same "base yaml" to run a spin-up and a matrix of experiments following that resumed state. I think we need to be sure to honor the save_checkpoint: False if it is stated explicitly.

This is interesting and I'd never considered this use-case. Might ask you later about how you're doing that without running into some sort of out_dir / output file issue. I'd naively thought that to do that sort of experimentation you would want to copy the netCDF file and checkpoint data to experimental folders (which is why I probably didn't think this change would pose a problem).

amoodie commented 3 years ago

Sorry I kind of lost where we are on this one. I checked it out on an example that I think verifies it will work for whatever I need, so I think it's good.

If you're good, merge when ready :+1:

elbeejay commented 3 years ago

> Sorry I kind of lost where we are on this one. I checked it out on an example that I think verifies it will work for whatever I need, so I think it's good.

> If you're good, merge when ready

Cool, I guess I'm still thinking about whether or not the added complexity to the checkpointing yaml parameters is necessary. The netCDF clobber parameter seems like a reasonable thing to implement, but I am trying to decide if I should just unwind the checkpointing changes.

elbeejay commented 3 years ago

Rewound this to just implement the check and toggle for netCDF clobbering.

We can revisit the saving parameters/methods later if we find there is a need to do so.
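As a usage note (a sketch, not part of the merged PR text): with the clobber toggle in place, re-running into an out_dir that already holds an output netCDF file would require clobber_netcdf: True in the yaml to keep the old overwrite-and-warn behavior. This assumes the usual DeltaModel(input_file=...) entry point from the pyDeltaRCM docs.

```python
import pyDeltaRCM

# minimal yaml re-using an out_dir that already contains an output netCDF;
# clobber_netcdf: True opts in to overwriting it instead of raising an error
config = (
    "out_dir: 'prototype_run'\n"
    "clobber_netcdf: True\n"
)
with open('prototype.yaml', 'w') as f:
    f.write(config)

delta = pyDeltaRCM.DeltaModel(input_file='prototype.yaml')
delta.update()
delta.finalize()
```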