ForestClaw / forestclaw

Quadtree/octree adaptive PDE solver based on p4est.
http://www.forestclaw.org
BSD 2-Clause "Simplified" License

Feature restart #336

Closed scottaiton closed 2 weeks ago

scottaiton commented 3 weeks ago

This adds restart functionality to ForestClaw.

Options

There are four new fclaw options:

When writing, the checkpoint and partition files are named fort_frame_####.checkpoint and fort_frame_####.partition. I chose dimension-independent file extensions mainly because of MAGIC and GEMINI: both have dimension as a user option, so I think this will ultimately help avoid confusion.
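A minimal sketch of the naming scheme, assuming `####` stands for the frame number zero-padded to four digits. The helper `make_frame_filename` is hypothetical and not part of ForestClaw:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of ForestClaw: builds
 * "fort_frame_NNNN.<ext>", assuming #### means the frame number
 * zero-padded to at least four digits.
 * Returns 0 on success, -1 if the name does not fit in buf. */
int make_frame_filename(char *buf, size_t buflen, int iframe, const char *ext)
{
    assert(buf != NULL && ext != NULL && iframe >= 0);
    int n = snprintf(buf, buflen, "fort_frame_%04d.%s", iframe, ext);
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

Calling it once with `"checkpoint"` and once with `"partition"` for the same frame number would give the matching pair of file names.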

Changes needed in user applications

To enable the use of this feature, two changes need to be made to user applications.

In run_program,

    fclaw_initialize(glob);

needs to change to

    fclaw_options_t *fclaw_opt = fclaw_get_options(glob);
    if(fclaw_opt->restart)
    {
        fclaw_restart(glob);
    }
    else
    {
        fclaw_initialize(glob);
    }

Also, the glob constructor needs to change from

        fclaw_global_t *glob = fclaw_global_new_comm (mpicomm, size, rank);

to

        fclaw_global_t *glob = fclaw_global_new(app);

Implementation Details

Dimension Independent fclaw_file interface

This adds a dimension-independent file interface in fclaw_file.h. It ended up being fairly lightweight: just if statements branching into the 2d and 3d code, plus some helper functions to handle return values.
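The branching pattern described above might look roughly like the following. Every name here (`demo_file_t`, `demo_write_2d`, `demo_write_3d`, `demo_check`, `demo_file_write`) is a hypothetical stand-in and not the actual fclaw_file.h API:

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the fclaw_file.h API. */

typedef struct demo_file
{
    int dim;               /* 2 or 3, fixed when the file is opened */
    size_t bytes_written;  /* running total, for illustration only */
} demo_file_t;

/* Stand-ins for the dimension-specific back ends.
 * Each returns 0 on success and a nonzero code on failure. */
static int demo_write_2d(demo_file_t *f, const void *data, size_t n)
{
    if (data == NULL) return 1;
    f->bytes_written += n;
    return 0;
}

static int demo_write_3d(demo_file_t *f, const void *data, size_t n)
{
    if (data == NULL) return 1;
    f->bytes_written += n;
    return 0;
}

/* Helper that maps a back-end return value onto one shared convention:
 * 0 for success, -1 for any failure. */
static int demo_check(int retval)
{
    return retval == 0 ? 0 : -1;
}

/* Dimension-independent entry point: a plain if/else on dim. */
int demo_file_write(demo_file_t *f, const void *data, size_t n)
{
    assert(f != NULL && (f->dim == 2 || f->dim == 3));
    if (f->dim == 2)
    {
        return demo_check(demo_write_2d(f, data, n));
    }
    else
    {
        return demo_check(demo_write_3d(f, data, n));
    }
}
```

Callers only ever see `demo_file_write`, so the 2d/3d choice stays in one place.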

Options

For options checking on restart, the user has to start the run with the correct options. When the restart file is read, those options are compared to the set of options stored in the checkpoint file.

To implement this, I scrapped my first idea of writing routines to pack/unpack options data in a buffer and instead rely on the ini file produced by the sc_options structure. An ini file is written to fclaw_options.ini.checkpoint, and the contents of that file are then saved to the checkpoint file. On restart, the ini file is extracted from the checkpoint file and compared with the options the run was started with.
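How the comparison is actually performed is not described here. As one possible sketch, assuming both sets of options are available as ini text in memory, a line-by-line comparison could report the first mismatch. The function `ini_first_mismatch` is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of ForestClaw: compares two ini texts
 * line by line. Returns 0 if they match, otherwise the 1-based number
 * of the first line that differs. Differences that consist only of
 * trailing newlines are ignored. */
int ini_first_mismatch(const char *saved, const char *current)
{
    assert(saved != NULL && current != NULL);
    int line = 1;
    while (*saved != '\0' || *current != '\0')
    {
        /* Length of the current line in each text, newline excluded. */
        size_t ls = strcspn(saved, "\n");
        size_t lc = strcspn(current, "\n");
        if (ls != lc || strncmp(saved, current, ls) != 0)
        {
            return line;
        }
        saved += ls;
        current += lc;
        /* Step over the newline, if there is one. */
        if (*saved == '\n') saved++;
        if (*current == '\n') current++;
        line++;
    }
    return 0;
}
```

A nonzero return could be used to abort the restart with a message naming the offending line.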

Changes to fclaw_global

An attributes feature has been added to fclaw_global. Attributes can be packed into the checkpoint file. When adding an attribute, there is an argument where a packing vtable can be specified. If no vtable is specified, then that attribute is not packed into the checkpoint file.
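The "no vtable means not packed" rule can be illustrated with a small self-contained sketch. The types and functions below (`demo_packing_vtable_t`, `demo_attribute_t`, `demo_pack_attributes`, `demo_double_vt`) are hypothetical and do not reflect the real fclaw_global signatures:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names are hypothetical; this is not the fclaw_global API. */

typedef struct demo_packing_vtable
{
    size_t (*packsize)(const void *value);            /* bytes needed */
    size_t (*pack)(const void *value, char *buffer);  /* bytes written */
} demo_packing_vtable_t;

typedef struct demo_attribute
{
    const char *key;
    void *value;
    const demo_packing_vtable_t *vt;  /* NULL: skipped when checkpointing */
} demo_attribute_t;

/* Packs every attribute that has a vtable into buffer, in order.
 * Returns the total number of bytes written. */
size_t demo_pack_attributes(const demo_attribute_t *attrs, size_t n,
                            char *buffer, size_t buflen)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (attrs[i].vt == NULL)
        {
            continue;  /* no vtable, so this attribute is not packed */
        }
        size_t need = attrs[i].vt->packsize(attrs[i].value);
        assert(used + need <= buflen);
        used += attrs[i].vt->pack(attrs[i].value, buffer + used);
    }
    return used;
}

/* Example vtable for an attribute holding a single double. */
static size_t double_packsize(const void *value)
{
    (void) value;
    return sizeof(double);
}

static size_t double_pack(const void *value, char *buffer)
{
    memcpy(buffer, value, sizeof(double));
    return sizeof(double);
}

const demo_packing_vtable_t demo_double_vt = { double_packsize, double_pack };
```

An attribute registered with `&demo_double_vt` ends up in the buffer, while one registered with `NULL` is left out.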

Changes to fclaw_run

Some modifications were needed to get fclaw_run to be compatible with restarting.

First of all, some variables need to be saved. A new fclaw_context class has been added. These contexts get stored as attributes in glob that get packed into the restart file. See fclaw_context.h and fclaw_run.c for documentation and usage examples.

Secondly, fclaw_output_checkpoint(glob, iframe); needs to be called. Currently a checkpoint is written whenever visualization data is written. We may want to change this in the future, but this works for now.

Patch vtable

In order to virtualize the writing of patch data into the checkpoint file, the following functions were added to the patch vtable:

/**
 * @brief Get the number of pointers to store in the checkpoint
 * 
 * @param glob the global context
 * @return int the number of restart pointers
 */
int fclaw_patch_checkpoint_num_pointers(struct fclaw_global* glob);

/**
 * @brief Get the sizes of the checkpoint data
 * 
 * @param[in]  glob the global context
 * @param[out] restart_sizes an array of length ::fclaw_patch_checkpoint_num_pointers
 *                           with the sizes of the restart data
 */
void fclaw_patch_checkpoint_pointer_sizes(struct fclaw_global* glob, size_t restart_sizes[]);

/**
 * @brief Get the names of the checkpoint data.
 * 
 * @param[in]  glob the global context
 * @param[out] names an array to be filled with the names of the checkpoint data
 */
void fclaw_patch_checkpoint_names(struct fclaw_global* glob, const char *names[]);

/**
 * @brief Get a specific pointer for the checkpoint
 * 
 * @param glob the global context
 * @param this_patch the patch context
 * @param blockno the block number
 * @param patchno the patch number
 * @param pointerno the pointer number
 * @return void* the pointer
 */
void *fclaw_patch_checkpoint_get_pointer(struct fclaw_global* glob,
                                         struct fclaw_patch* this_patch,
                                         int blockno,
                                         int patchno,
                                         int pointerno);

Currently only q is written to the checkpoint, but these functions should allow that to be extended in the future.
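To show how a writer might drive these four functions, here is a sketch that compiles on its own. The struct definitions and function bodies are stand-ins invented for illustration (the real ForestClaw structs and implementations differ), and `demo_pack_patch` is a hypothetical caller. Only the four function signatures are taken from the declarations above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types so the sketch compiles alone; the real structs differ. */
struct fclaw_global { int meqn; int cells_per_patch; };
struct fclaw_patch  { double *q; };

/* Stub bodies exposing only q, mirroring the current behaviour. */
int fclaw_patch_checkpoint_num_pointers(struct fclaw_global* glob)
{
    (void) glob;
    return 1;
}

void fclaw_patch_checkpoint_pointer_sizes(struct fclaw_global* glob,
                                          size_t restart_sizes[])
{
    restart_sizes[0] = sizeof(double) * (size_t) glob->meqn
                                      * (size_t) glob->cells_per_patch;
}

void fclaw_patch_checkpoint_names(struct fclaw_global* glob,
                                  const char *names[])
{
    (void) glob;
    names[0] = "q";
}

void *fclaw_patch_checkpoint_get_pointer(struct fclaw_global* glob,
                                         struct fclaw_patch* this_patch,
                                         int blockno,
                                         int patchno,
                                         int pointerno)
{
    (void) glob; (void) blockno; (void) patchno;
    return pointerno == 0 ? (void *) this_patch->q : NULL;
}

/* Hypothetical writer: copies each checkpoint field of one patch into
 * buffer, back to back, and returns the number of bytes copied. */
size_t demo_pack_patch(struct fclaw_global *glob, struct fclaw_patch *patch,
                       int blockno, int patchno, char *buffer, size_t buflen)
{
    enum { MAX_POINTERS = 8 };  /* arbitrary limit for this sketch */
    size_t sizes[MAX_POINTERS];
    int n = fclaw_patch_checkpoint_num_pointers(glob);
    assert(n >= 0 && n <= MAX_POINTERS);
    fclaw_patch_checkpoint_pointer_sizes(glob, sizes);

    size_t used = 0;
    for (int i = 0; i < n; i++)
    {
        void *src = fclaw_patch_checkpoint_get_pointer(glob, patch,
                                                       blockno, patchno, i);
        assert(src != NULL && used + sizes[i] <= buflen);
        memcpy(buffer + used, src, sizes[i]);
        used += sizes[i];
    }
    return used;
}
```

Supporting an additional field would then mean returning a larger count and handling one more `pointerno` in each function, without touching the writer loop.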