dynamicslab / hydrogym

An RL-Gym for Challenge Problems in Data-Driven Modeling and Control of Fluid Dynamics.
https://hydrogym.readthedocs.io
MIT License

Snapshot handling #31

Open jcallaham opened 2 years ago

jcallaham commented 2 years ago

I found some slightly confusing behavior when saving and loading functions on meshes across different scripts. Basically, I think that if you load checkpoint files written by two scripts that were run with different numbers of processors, Firedrake considers them to have come from different meshes, even if both were originally created from the same gmsh file.

The behavior is something like this (just a cartoon though):

Script A (serial):

# ... Do some analysis
save_checkpoint('chkA.h5', qA)

Script B (parallel):

qA = load_checkpoint('chkA.h5')
qB = qA.copy()
save_checkpoint('chkB.h5', qB)

Script C (serial):

qA = load_checkpoint('chkA.h5')
qB = load_checkpoint('chkB.h5')

# Now these are incompatible
inner(qA, qB)

One fix is to re-save qA from Script B so that both checkpoint files are created by the same script. But this would be a pain if you were doing some complicated analysis, say comparing projections onto global modes and POD modes with snapshots from two different simulations.

I think the better way to handle it would be to distinguish between "restart" checkpoints and "snapshot" checkpoints. The default behavior would then be to use numpy binaries as the intermediate format for working with snapshots, and CheckpointFile for restart files. The only catch is working in parallel... the CheckpointFile can be saved without a bottleneck, but converting to numpy arrays (currently in utils.snapshots_to_array) has to be done in serial. Currently (for POD, for instance), I'm using an intermediate to_arrays.py script to do this, but obviously this is pretty confusing and not ideal.
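To make the "numpy binaries as the intermediate" idea concrete, here is a minimal sketch using plain NumPy only. The helper names (`save_snapshots`, `load_snapshots`, `pod_modes`) and file names are hypothetical and not part of hydrogym, and the random vectors stand in for flattened degrees of freedom from two simulations. It assumes both arrays share the same DOF ordering.

```python
import os
import tempfile

import numpy as np


def save_snapshots(path, snapshots):
    """Stack 1-D snapshot vectors as columns and write one .npy file."""
    X = np.column_stack([np.asarray(q, dtype=float) for q in snapshots])
    np.save(path, X)
    return X.shape


def load_snapshots(path):
    """Load the (n_dofs, n_snapshots) array written by save_snapshots."""
    return np.load(path)


def pod_modes(X, rank):
    """Leading POD modes of the mean-subtracted snapshots via a thin SVD."""
    X_fluct = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X_fluct, full_matrices=False)
    return U[:, :rank], s[:rank]


# Synthetic data standing in for the flattened DOFs of two simulations
rng = np.random.default_rng(0)
snaps_a = [rng.standard_normal(50) for _ in range(8)]
snaps_b = [rng.standard_normal(50) for _ in range(8)]

# Hypothetical file names in a scratch directory
tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "snapshots_a.npy")
path_b = os.path.join(tmpdir, "snapshots_b.npy")
save_snapshots(path_a, snaps_a)
save_snapshots(path_b, snaps_b)

# Any later script can reload both arrays regardless of how they were produced
XA = load_snapshots(path_a)
XB = load_snapshots(path_b)
modes, sing_vals = pod_modes(XA, rank=3)
coeffs = modes.T @ XB  # project simulation B onto the POD modes of A
```

Note that the plain dot product used here is the Euclidean one. It ignores the mass matrix, so it is not the same as the L2 inner product `inner(qA, qB)` integrated over the mesh.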

I think a better way would be to have the SnapshotCallback spawn a subprocess from rank zero that does the conversion as a postprocessing step, and then to retool some of the other analysis features so that saving and loading numpy binaries is more easily supported.
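A rough sketch of the rank-zero subprocess idea, using only the standard library. The function name `convert_on_rank_zero` is hypothetical, the rank is passed in explicitly rather than read from an MPI communicator, and the demo command is a stand-in for whatever would actually invoke the conversion script (the command-line interface of to_arrays.py is not specified here, so none is assumed).

```python
import subprocess
import sys


def convert_on_rank_zero(rank, command):
    """Run `command` as a subprocess on rank 0 only; other ranks do nothing.

    Returns the CompletedProcess on rank 0 and None on every other rank.
    """
    if rank != 0:
        return None
    # check=True raises CalledProcessError if the conversion script fails
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Stand-in command; a real callback would launch the serial conversion script
demo_command = [sys.executable, "-c", "print('converted')"]

result_root = convert_on_rank_zero(0, demo_command)   # runs the command
result_other = convert_on_rank_zero(1, demo_command)  # skipped on this rank
```

In an actual parallel run the rank would presumably come from the communicator (for example `fd.COMM_WORLD.rank`), and the other ranks would likely need to wait at a barrier until the conversion finishes. Whether launching a subprocess from inside an MPI job works reliably can depend on the MPI implementation and the cluster setup, so this should be treated as something to verify.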

It would still be ideal to be able to gather the PETSc.Vec to rank zero, but still no luck on that front...

jcallaham commented 2 years ago

This should also be documented somewhere before closing.

jcallaham commented 2 years ago

Looks like one workaround is to use fd.project to convert between meshes:


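import firedrake as fd

# `gym` (the hydrogym package), `restart`, and `filename` are defined earlier in the script
flow = gym.flow.Cylinder(h5_file=restart)
with fd.CheckpointFile(filename, 'r') as file:
    mesh = file.load_mesh('mesh')
    q = file.load_function(mesh, 'q')  # Function on a different mesh
    u, p = q.split()
    # Project each component onto the function spaces of the current flow's mesh
    flow.u.assign(fd.project(u, flow.velocity_space))
    flow.p.assign(fd.project(p, flow.pressure_space))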