Closed nabriis closed 7 months ago
Hi @nabriis, thank you for this, it is looking promising. Like @amal-ghamdi I am wondering about the choice of data structure, whether a standard type like python set could be used.
More generally, I was trying to work out how users should be able to interact with the state of samplers. I think they should be able to easily view the current state of a sampler. Load a state from disk, view it, perhaps modify, then set up a sampler with that state. Perhaps even create a state from scratch (with easily seeing the expected entry types) and create a sampler. I couldn't really see whether this was possible. Is there a relevant demo outlining intended usage possibilities, to look at or can one be created?
It may be helpful to write out design/requirements to help see what the best data structure would be.
I took your comment into account. The underlying datastructure to store the state internally in the sampler was updated to a set. The user is NOT exposed to this.
What the user is exposed to the what they can get from "get_state" and "set_state". These are dictionaries, and there is now more documentation on them.
The dictionary datastructure is primarily useful because it is easy to extend what it can store (what keys it contains) and is a well-known datastructure in python.
I also updated the demo36 file to talk about state and connect it with checkpointing.
Thank you for the update @nabriis incl more documentation of get_state and set_state and for adding example usage in the demo. As you pointed out in #379 we want to make sure at the time of this refactoring to design a good implementation as the current one needs a proper re-design. We may be there with this PR but I can't quite see if that is the case based on the code provided.
I would therefore like to return to my recommendation of writing out the design and requirements of state, checkpointing, batch saving etc. To move forward, and to avoid introducing technical debt, I would like you to prepare a document describing the requirements in more detail along with the suggested design and underlying data structures.
Points that are a somewhat unclear to me at this point could be clarified there (for sure not an exhaustive list):
I think that writing this out will help us make sure we get more quickly to the desired design. This document can later be turned into documentation and help us explain to our users what state etc. is and how to operate these tools.
Thanks! Please, for reference, add explanation as comment here on the PR on:
Thanks! Please, for reference, add explanation as comment here on the PR on:
- what you have done now in response to the latest comments and our discussions and decision around it,
- what the latest commits seek to do, and
- create an issue with explanation and the comments raised that have not been addressed by this PR and refer to it from here.
Hello @jakobsj.
Thanks for requesting a clarification for future reference. I updated the issue #402 with a description of the sought change and updated this PR description with more details on the aim of this PR. Additionally, I included your comments into #379.
For reference this PR seeks to iterate on the current design by moving the get_state and set_state methods to the base class, not refactor the design in its entirety. Let us discuss that refactor in #379.
Finally the latest commits come from a rebase on main (i.e. they are a replay of the preivous commits in this PR, on top of the new main) after the LinearRTO and RegularizedLineaRTO merge. After the rebase I added 3 small commits to get these samplers into to the test framework.
As described in #402 this PR seeks to implement get_state, set_state in the base class ´SamplerNew`.
This is achieved by adding a class variable
_STATE_KEYS
that contain a set of keys representing the state variables the sampler will work with when calling the already implementedget_state
andset_state
methods. This reduces the amount of repeated code in each subclass.Subclassing samplers must now optionally update
_STATE_KEYS
if they expand on the set of variables the represent the state. For example theProposalBasedSampler
sampler can expand this by including thescale
key, as this sampler maintains ascale
attribute that is updated when running warmup sampling.A testing framework is added get_state, set_state and checkpointing.
Additionally the
_HISTORY_KEYS
class variable is also added following the same design, but containing variable that representing the sampling history like the samples and acceptance criteria. This specific change introduces public methods get_history and set_history working similarly to how get_state and set_state already works before this PR.Closes #402
The following related issue #379 is about refactoring the general behavior of get_state and set_state. This current implemented changes introduce 2 class variables that can relatively easily be replaced with another design once we agree on something else for #379.