bruno-f-cruz opened this issue 4 months ago
Just to follow up. This is great! Thanks @bruno-f-cruz
At this moment, my understanding of a policy, in the case of foraging, would be "context".
For example, we try to enforce a different overall state in the animal, e.g. by moving from a "Forest" to a "Desert", where the local rules are slightly different but use the same logic functions (i.e., we change parameters but not the functions that use those parameters).
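The "context" idea above can be sketched in plain Python: one shared logic function, with only the parameter values differing between contexts. All names and numbers here are illustrative, not from the library.

```python
# Hypothetical sketch: "Forest" and "Desert" share the same logic function;
# only their parameter values differ.
CONTEXTS = {
    "Forest": {"reward_probability": 0.8, "patch_decay": 0.1},
    "Desert": {"reward_probability": 0.4, "patch_decay": 0.3},
}

def expected_reward(params: dict, visits: int) -> float:
    """Shared logic function: same formula in every context, parameters differ."""
    return params["reward_probability"] * (1 - params["patch_decay"]) ** visits

forest = expected_reward(CONTEXTS["Forest"], visits=3)
desert = expected_reward(CONTEXTS["Desert"], visits=3)
```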
When I go through the example: https://github.com/AllenNeuralDynamics/aind-behavior-curriculum/blob/bb28161608b4544e5f71e23c2190a6682f17dcaf/examples/example_project/curriculum.py#L179
Implementing a Policy seems a bit counter-intuitive. I would have thought you would do this by adding a layer on top of the curriculum or somehow connecting multiple curricula.
This is because, in my mind, the simpler case is not to have any context and live in a homogeneous world.
In the current logic, it seems like both stage transitions and policy transitions use almost exactly the same functions for creating the objects. So it feels like implementing several policies is still a bit complicated: you have to specify how each sub-thing changes one by one, while the purpose is to change a bunch of them together. Do you see what I mean?
> Implementing a Policy seems a bit counter-intuitive. I would have thought you would do this by adding a layer on top of the curriculum or somehow connecting multiple curricula.
This is a bit of a non-starter, as Policies and PolicyTransitions depend on Metrics, which are tied to the Task. That is, different Tasks have different Metrics. Having policies above the curriculum would not work unless you force users to use common interfaces in Metrics, which I think would be a pain...
> This is because in my mind the simpler case is not to have any context and live in a homogeneous world.

Isn't this just a strict subset of what we have?
Sounds like a Curriculum with a single Stage, with your policies defining a "homogeneous world" in that they define how settings are updated between sessions. Maybe I am missing some detail? Would you mind adding an explicit example of what you are trying to achieve?
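The "single Stage + policies" reading could look something like this minimal sketch. The class and field names are made up for illustration; they are not the library's API.

```python
# Hypothetical sketch: one Stage, and a single between-session update rule
# that plays the role of a policy in a "homogeneous world".
from dataclasses import dataclass, field

@dataclass
class Settings:
    reward_volume_ul: float = 5.0

@dataclass
class Stage:
    name: str
    settings: Settings = field(default_factory=Settings)

def update_between_sessions(stage: Stage, water_drank_ul: float) -> Stage:
    # Same rule applied after every session; no stage change ever happens.
    stage.settings.reward_volume_ul = max(
        2.0, stage.settings.reward_volume_ul - water_drank_ul / 1000
    )
    return stage

stage = Stage("only-stage")
stage = update_between_sessions(stage, water_drank_ul=1000.0)
```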
> In the current logic, it seems like both stage transitions and policy transitions use almost exactly the same functions for creating the objects. So it feels like implementing several policies is still a bit complicated: you have to specify how each sub-thing changes one by one, while the purpose is to change a bunch of them together. Do you see what I mean?
Can you clarify what you mean by "change a bunch together"? The syntax is similar because the implementation is similar (`BehaviorGraph`). They afford different functionality, however: Stages let you define a set of different active update policies, but not vice-versa. If you just want a single set of active policies, simply use a single Stage?
I'm trying to explain things in the opposite direction here: from policy to stage.
In my mind, the two most important motivations for having co-activated policies over discrete stages (as in my v1.0 implementation, and perhaps also mTrain?) are:
Here is an example for dynamic foraging, showing a special (but maybe the most important) use case where each family of policies controls one task dimension.
Naturally, we should be able to change the policy itself, i.e., policy transitions. With policies and their transitions, we can effectively mix together 1. continuous parameter changes, 2. discrete parameter changes, and 3. discrete changes of the rules that govern 1 and 2.
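The three kinds of change listed above can be mixed in one loop, sketched here with plain functions (all names are illustrative, not the library's API): a policy doing continuous updates, a policy doing a discrete jump, and a transition rule that swaps which policy is active.

```python
# Hypothetical sketch of mixing: 1. continuous parameter changes,
# 2. discrete parameter changes, 3. discrete changes of the governing rule.
def warmup(params: dict, metrics: dict) -> dict:
    # 1. continuous: shrink reward volume a little every session
    return {**params, "reward_ul": params["reward_ul"] * 0.9}

def full_task(params: dict, metrics: dict) -> dict:
    # 2. discrete: jump to the final parameter set in one step
    return {**params, "reward_ul": 2.0, "block_length": 40}

def should_advance(metrics: dict) -> bool:
    # 3. the policy-transition rule itself
    return metrics["rewards"] >= 100

policy = warmup
params = {"reward_ul": 5.0, "block_length": 20}
for rewards in (40, 80, 120):
    metrics = {"rewards": rewards}
    if policy is warmup and should_advance(metrics):
        policy = full_task  # policy transition: swap the active rule
    params = policy(params, metrics)
```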
This is an example of a "Train Track" Curriculum in the docs.
Why do we wrap another layer of `Stage` on top of `Policy`?
As mentioned above, `Policy` effectively decouples the parameters of a task, but in some cases we still want them to be coupled ("change a bunch together"), especially when we have a discrete change of the subtask (Bruno's "Curriculum within the same Task" example) or task (Bruno's "Curriculum across Tasks" example). Note that changing a `Task` could be much more dramatic, since different `Task`s may have totally different parameter and metric spaces. In other words, we definitely need a new `Stage`.
In my v1.0 implementation, I only have discrete "stages" where all parameters are coupled together:
This can be seen as a special case of the new system, but with different interpretations, two of which correspond to Jerome's "two worlds".
(All policies in this example are degenerate in the sense that they just set all parameters to fixed values without any actual updates.)
They all make sense, depending on how you define subtask or task. But technically speaking, my example "Uncoupled Baiting" fits Interpretation C best, as the subtask changes from "Coupled Baiting" to "Uncoupled Baiting" between my old "STAGE_2" and "STAGE_3".
Adding Sue here since she will be one of the first users @ZhixiaoSu
@jeromelecoq @ZhixiaoSu After using the curriculum, it would be amazing if you could make a PR clarifying some of these points according to your experience. I would be happy to review it!
Hey, so I've begun reimplementing the dynamic routing task and opened an issue outlining my current user experience: https://github.com/AllenNeuralDynamics/aind-behavior-curriculum/issues/34. In my initial experience, policies seemed like an unnecessary addition, but they are almost fully "opt-in" and I've implemented most of my task without using them. Now that I've made a regimen without them, I can see how they're very useful for making my circular stage transitions cleaner, and they would probably allow us to go from 8 defined stages to 3 or so.
We should get rid of that "almost" then :p. The goal was to not ask users to worry about policies if they don't want to use them, as I sketched above. Glad it's close!
https://github.com/AllenNeuralDynamics/aind-behavior-curriculum/blob/bb28161608b4544e5f71e23c2190a6682f17dcaf/src/aind_behavior_curriculum/curriculum.py#L588 I'm just starting to use this and maybe I'm doing things wrong, but maybe `create_empty_stage` should be the default then?
I think we should come up with a default constructor for Stage that automatically takes care of the policy dependencies in the background. Alternatively, this could also be solved at the level of the trainer, by special-casing the situation where no policies are provided.
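One possible shape for that default constructor, sketched with made-up names (this is not the library's code): if the caller passes no policies, the stage gets an empty policy list, so a trainer could skip policy evaluation entirely.

```python
# Hypothetical sketch of a Stage factory that defaults to "no policies",
# keeping policies fully opt-in.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Stage:
    name: str
    task: object
    policies: list = field(default_factory=list)  # empty unless opted in

def make_stage(name: str, task: object, policies: Optional[list] = None) -> Stage:
    # Behaves like a default create_empty_stage when policies are omitted.
    return Stage(name=name, task=task, policies=policies or [])

s = make_stage("stage-1", task="DynamicRouting")
```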
Let's try to collect a bit more feedback and have a quick meeting in a few weeks to discuss some of these points!
`Policy` and `Stage` remain among the most subtle concepts in the present library. We should think about a better way to distinguish them. Here's how I am currently seeing these two concepts/implementations; feel free to use this thread to discuss and hopefully spin up a pull request to the docs.

### On the curriculum

As stated in the current docs:
In other words, a `Curriculum`, in its simplest form, can be seen as a container of `Stages` plus any logic associated with the transitions between them (`StageTransitions`). A `Stage` is, in turn, a "container" of a `Task` instance. The concept of an "instance" is critical here. While the `Task` object defines a class, `Stage` works by wrapping an instance. What this means is that two distinct Stages, A and B, could actually implement the same `Task` type BUT different instances of the same `Task` (think of two different training stages of the same task implementation).

Let's step back a bit here and think about what affordances we have at this point.
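The class-vs-instance distinction above can be made concrete with a small sketch (hypothetical classes, not the library's): two Stages wrap two different instances of the same Task type.

```python
# Hypothetical sketch: Stage wraps a Task *instance*; two Stages can share
# the Task *type* while holding different parameterizations.
from dataclasses import dataclass

@dataclass
class VrForaging:              # the Task class
    reward_ul: float
    trials_per_session: int

@dataclass
class Stage:                   # a Stage wraps one Task instance
    name: str
    task: VrForaging

stage_a = Stage("early-training", VrForaging(reward_ul=5.0, trials_per_session=100))
stage_b = Stage("late-training", VrForaging(reward_ul=2.0, trials_per_session=400))

same_type = type(stage_a.task) is type(stage_b.task)  # same Task type...
```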
1 - In each `Stage` we can define a set of parameters for a specific `Task`
2 - We can express transition logic between `Stages`
3 - We can have `Stages` of different `Task`s or the same `Task`

Let's look at two ways this could be used (I am going to use pseudo-code for brevity).
### Curriculum within the same Task

Let's consider a Task named VrForaging. The experimenter has 2 distinct and discrete operation modes that the animal transitions through during training. Let's call them `ForagingForApples` and `ForageForCheese`. The animal starts training on `ForagingForApples` and, after it fulfills a transition criterion (say, collecting 100 rewards in a behavior session), graduates to `ForageForCheese`.
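That Apples-to-Cheese transition can be sketched with plain functions (names mirror the text; the criterion and structure are illustrative, not the library's API):

```python
# Hypothetical sketch of a stage transition driven by a metric.
def graduate_to_cheese(metrics: dict) -> bool:
    # transition criterion from the text: 100 rewards in a session
    return metrics["rewards_collected"] >= 100

stage = "ForagingForApples"
if graduate_to_cheese({"rewards_collected": 120}):
    stage = "ForageForCheese"
```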
### Curriculum across Tasks

Let's consider a curriculum where the animal is first trained in the previous task, VrForaging, and Stage `ForageApples`. Now, instead of graduating to `ForageForCheese`, we want to teach the mouse a completely new task, say `DynamicForaging`. This new task also has stages, say `BaitedStage`.

As you can see, a Stage is always required. It is the materialization of a Task class/type.
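A cross-task curriculum can be sketched the same way (hypothetical classes and fields, not the library's): the stage sequence chains a VrForaging stage into a DynamicForaging stage, and each Stage still materializes exactly one Task instance.

```python
# Hypothetical sketch: two different Task types chained in one curriculum.
from dataclasses import dataclass

@dataclass
class VrForaging:
    reward_ul: float

@dataclass
class DynamicForaging:
    bait_probability: float

stages = [
    ("ForageApples", VrForaging(reward_ul=5.0)),
    ("BaitedStage", DynamicForaging(bait_probability=0.8)),  # new Task type
]

task_types = [type(task).__name__ for _, task in stages]
```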
### But what about policies?
Consider the previous scenario where the animal is in Task `VrForaging` and Stage `ForagingForApples`. Additionally, consider a variable inside the task that controls the amount of reward the animal receives. Say the experimenter wants to automatically update this value based on the amount of water the animal drank in the previous session.

We can express this logic using `Stages` by considering:

While possible, this is unnecessarily annoying, as we would need to code a large number of stage transitions between all the stages.
To meet this need, we can instead use a `Policy`. A Policy can be thought of as a set of functions that run on top of the session outcome. For instance, the above scenario could instead be coded in the continuous domain by:

`ForagingForApples(x), where x = WaterDrank / 2000`
In this example, only one policy is active. However, multiple policies can be active simultaneously, each updating a different set of parameters.
In other words, a Stage, on top of being a simple container for a `Task` instance, also defines a set of policies. As a result, two Stages can define the same Task but differ solely in the underlying policies. (For instance, one Stage could update WaterDrank by a factor of 1/2000 whereas the other may use a different factor of 1/4000.)

### Why Policy transitions?
The previous architecture already affords vast flexibility. However, one thing that becomes very difficult is independently controlling active policies. Imagine that you have two concurrently active policies (UpdateWater and UpdateDistance). At some point, you want to change UpdateWater to UpdateWaterByAlot. This could be done by coding an extra stage, as mentioned before. Unfortunately, we need to account for UpdateDistance too! Couldn't we just add this Policy to the new stage? You could, but what happens if UpdateDistance has a corresponding UpdateDistanceByAlot too? Now we suddenly need to expand the number of stages to account for all possible pair-wise combinations of active policies.
To solve this, each Stage can also define its own set of PolicyTransitions, which specify how different policies transition into others. This allows different policies to be concurrently active within a Stage while simultaneously allowing each to be updated independently.