Currently, in the new experimental MCMC module, we store "current_point", "current_target_eval" etc. as separate attributes, often as CUQIarray even when they are scalars. This is not good design and will likely lead to bugs. As suggested in #356, let us refactor how the sampler "state" (i.e. all variables that define the current state) is handled and design a good implementation.
On a related note, we also have the sampler history, e.g. "samples", "acceptance rate" etc.
Both state and history need a good approach for loading and saving. This is currently implemented as "checkpoint" for state and "batch" for history, but it needs a proper redesign.
Here are some points to address for this refactor as discussed in #390.
What exactly constitutes "state" (what does it represent conceptually, and what is the minimal representation)?
Do different samplers require different information in state? If so, which aspects are shared among all samplers?
Can a state from one sampler be used for another sampler instance, either of the same type or of a completely different type?
Considerations around choice of data structure to represent state - why the need for an internal data structure and a different one facing users? Why not a new class to represent state?
How can a user know which keys/attributes, and their type, to equip a newly created state instance with to meaningfully create a working sampler instance from it?
For example the docstring of get_state says "For example, the state of a "MH" sampler could be:" then lists a state which ends with "..." suggesting that more elements could be present. Should a particular sampler type like MH not have a well-defined set of properties that makes it a complete and non-redundant state?
Why the separation into metadata and state? It seems a bit clunky to have state inside state.
What is the difference between state and checkpoint? The comment in the demo explains "Checkpoint uses the set_state and get_state methods to save and load state and then pickles the state." This makes me wonder whether the distinction is needed: why not "save_state" instead of "save_checkpoint"?
Similarly the requirements, usage and naming around "batch sampling", also referred to as "chunks" in the comments, should be clarified. How does one operate this in common use cases, for example reloading chunks and combining into one samples object for later analysis?
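To make some of the questions above concrete, here is a rough sketch of what a dedicated state class with a well-defined set of fields could look like, together with saving, loading and recombining batches. This is only an illustration for discussion, not the current CUQIpy API: the names MHState, save_state, load_state and combine_batches and the chosen fields (including "scale") are hypothetical, and plain Python lists stand in for what would likely be NumPy arrays or CUQIarray in practice.

```python
import pickle
import tempfile
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class MHState:
    """Hypothetical complete, non-redundant state of an MH-type sampler."""
    current_point: list          # current position of the chain
    current_target_eval: float   # target log-density at current_point (plain scalar)
    scale: float                 # proposal step size (assumed field)


def save_state(state, path):
    # Store the sampler-state type next to the plain field values
    payload = {"state_type": type(state).__name__, "state": asdict(state)}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_state(cls, path):
    with open(path, "rb") as f:
        payload = pickle.load(f)
    # Refuse a state that was saved for a different sampler type
    if payload["state_type"] != cls.__name__:
        raise TypeError(f"Expected {cls.__name__}, got {payload['state_type']}")
    # Refuse a state with missing or extra fields
    expected = {f.name for f in fields(cls)}
    if set(payload["state"]) != expected:
        raise ValueError(f"State must have exactly the fields {sorted(expected)}")
    return cls(**payload["state"])


def combine_batches(paths):
    # Concatenate pickled batches of samples in the order the paths are given
    samples = []
    for path in paths:
        with open(path, "rb") as f:
            samples.extend(pickle.load(f))
    return samples


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Round-trip a state through disk
    state = MHState(current_point=[0.1, -0.3], current_target_eval=-1.25, scale=0.5)
    save_state(state, tmp / "state.pkl")
    restored = load_state(MHState, tmp / "state.pkl")

    # Write two batches of history, then reload them as one list of samples
    batches = [[[0.0, 0.0], [0.1, -0.3]], [[0.2, -0.1]]]
    batch_paths = []
    for i, batch in enumerate(batches):
        batch_path = tmp / f"batch_{i}.pkl"
        with open(batch_path, "wb") as f:
            pickle.dump(batch, f)
        batch_paths.append(batch_path)
    all_samples = combine_batches(batch_paths)
```

With a fixed set of typed fields, the question of which keys a user must supply is answered by the class definition itself, and the check in load_state makes the "state from a different sampler type" case an explicit error rather than a silent mismatch. Whether some fields should be shared across all samplers (e.g. via a common base class) is left open here.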
Allow Save State and Save History in HybridGibbs