New restart does not work for timestep 0

timfelle commented 9 months ago

It would seem the very first checkpoint is not saved correctly.

Setting up a very small example that saves a checkpoint at every time step and then restarting from the very first checkpoint causes the solver to diverge.

Loading the checkpoint "fluid00000.chkp" fails, whereas loading "fluid00001.chkp" succeeds.

TGV Case File Failing

```
{
  "version": 1.0,
  "case": {
    "mesh_file": "512.nmsh",
    "output_boundary": true,
    "output_checkpoints": true,
    "checkpoint_control": "tsteps",
    "checkpoint_value": 1,
    "restart_file": "fluid00000.chkp",
    "output_at_end": true,
    "load_balance": false,
    "job_timelimit": "00:00:00",
    "end_time": 0.1,
    "timestep": 5e-3,
    "numerics": {
      "time_order": 3,
      "polynomial_order": 7,
      "dealias": true
    },
    "fluid": {
      "scheme": "pnpn",
      "Re": 360,
      "initial_condition": {
        "type": "user"
      },
      "velocity_solver": {
        "type": "cg",
        "preconditioner": "jacobi",
        "projection_space_size": 0,
        "absolute_tolerance": 1e-7,
        "max_iterations": 800
      },
      "pressure_solver": {
        "type": "gmres",
        "preconditioner": "hsmg",
        "projection_space_size": 20,
        "absolute_tolerance": 1e-7,
        "max_iterations": 800
      },
      "output_control": "simulationtime",
      "output_value": 0.05
    },
    "simulation_components": [
      {
        "type": "vorticity",
        "compute_control": "tsteps",
        "compute_value": 50
      }
    ]
  }
}
```

TGV Case File Succeeding

```
{
  "version": 1.0,
  "case": {
    "mesh_file": "512.nmsh",
    "output_boundary": true,
    "output_checkpoints": true,
    "checkpoint_control": "tsteps",
    "checkpoint_value": 1,
    "restart_file": "fluid00001.chkp",
    "output_at_end": true,
    "load_balance": false,
    "job_timelimit": "00:00:00",
    "end_time": 0.1,
    "timestep": 5e-3,
    "numerics": {
      "time_order": 3,
      "polynomial_order": 7,
      "dealias": true
    },
    "fluid": {
      "scheme": "pnpn",
      "Re": 360,
      "initial_condition": {
        "type": "user"
      },
      "velocity_solver": {
        "type": "cg",
        "preconditioner": "jacobi",
        "projection_space_size": 0,
        "absolute_tolerance": 1e-7,
        "max_iterations": 800
      },
      "pressure_solver": {
        "type": "gmres",
        "preconditioner": "hsmg",
        "projection_space_size": 20,
        "absolute_tolerance": 1e-7,
        "max_iterations": 800
      },
      "output_control": "simulationtime",
      "output_value": 0.05
    },
    "simulation_components": [
      {
        "type": "vorticity",
        "compute_control": "tsteps",
        "compute_value": 50
      }
    ]
  }
}
```

njansson commented 6 months ago

Is this still an issue after all the recent restart work?

MartinKarp commented 6 months ago

Does it make sense to restart when starting the simulation? I guess I don't see where one would do this for the first time step. Currently, I think the restart kind of assumes that the lagged arrays are valid, so if you have not progressed more than 2 time steps this is not well defined.

timfelle commented 6 months ago

> Does it make sense to restart when starting the simulation? I guess I don't see where one would do this for the first time step. Currently, I think the restart kind of assumes that the lagged arrays are valid, so if you have not progressed more than 2 time steps this is not well defined.

Well, if that is the case, then I would suggest checking the timestep before exporting the checkpoint and simply not writing a file that is broken.

MartinKarp commented 6 months ago

Yeah, I think the timestep number is something that is not passed down to the outputs currently. Maybe adding this would be good though.

However I think the time step number is reset upon restart, so if someone would want to output a restart directly after a restart this would not be possible if we added such a check.

I guess the only way to control it would be to check the number of lagged arrays and pass this information down into the sampler.

Or I guess the most reasonable, update the restart file again to not assume how many lagged arrays are in it.
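To make that last option concrete, here is a rough Python sketch (not Neko code) of a checkpoint that records how many lagged arrays it actually holds, so the reader does not have to assume a fixed count. The names `Checkpoint`, `u_lag`, and `restart_order` are hypothetical, and the rule "usable order = valid lagged arrays + 1" is an assumption about how a multistep scheme could ramp up after a restart, not a description of what the solver does today.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Checkpoint:
    """Hypothetical checkpoint layout: the lagged arrays are stored as a
    variable-length list, so their count is part of the file contents."""
    tstep: int
    u: List[float]                                           # current field
    u_lag: List[List[float]] = field(default_factory=list)   # older fields

    @property
    def n_lag(self) -> int:
        # Number of valid lagged arrays actually present in this checkpoint.
        return len(self.u_lag)


def restart_order(chkp: Checkpoint, time_order: int) -> int:
    """Assumed rule: an order-k multistep scheme needs the current field
    plus k-1 lagged fields, so cap the order by what the file provides."""
    return min(time_order, chkp.n_lag + 1)


# A checkpoint written at step 0 has no history; a later one has two arrays.
first = Checkpoint(tstep=0, u=[1.0])
later = Checkpoint(tstep=5, u=[1.0], u_lag=[[0.9], [0.8]])
```

With `time_order = 3` as in the case file, `restart_order(first, 3)` would fall back to first order, while `restart_order(later, 3)` keeps the full order.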

timfelle commented 6 months ago

> Yeah, I think the timestep number is something that is not passed down to the outputs currently. Maybe adding this would be good though.
>
> However I think the time step number is reset upon restart, so if someone would want to output a restart directly after a restart this would not be possible if we added such a check.
>
> I guess the only way to control it would be to check the number of lagged arrays and pass this information down into the sampler.
>
> Or I guess the most reasonable, update the restart file again to not assume how many lagged arrays are in it.

What I meant was simply this: when saving the checkpoint, make sure the time is not 0. That would avoid the confusion of a checkpoint file being present that cannot be used.
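Roughly something like the following, written as a Python sketch rather than actual Neko code. The function `should_write_checkpoint` and the `min_tstep` parameter are made up for illustration, and the assumption that the file index equals the time step only holds for a setup like the one above, where `checkpoint_value` is 1 and the run is not itself a restart.

```python
def should_write_checkpoint(tstep: int, min_tstep: int = 1) -> bool:
    """Hypothetical guard: skip checkpoints taken before any step has
    been completed, since they cannot be restarted from."""
    return tstep >= min_tstep


# Assumed naming: one checkpoint per step, index equal to the time step.
written = [
    f"fluid{tstep:05d}.chkp"
    for tstep in range(4)
    if should_write_checkpoint(tstep)
]
```

Here `written` would no longer contain `fluid00000.chkp`, so there is no unusable file left on disk. As noted above, the step counter is reset upon restart, so a real check would probably have to look at the simulation time or the number of valid lagged arrays instead.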

ExtremeFLOW / neko

New restart does not work for timestep 0 #1063