Clarify distinction between `Stage` and `Policy`

bruno-f-cruz commented 4 months ago

Policy and Stage remain two of the most subtle concepts in the present library. We should think about a better way to distinguish them. Here's how I am currently seeing these two concepts/implementations; feel free to use this thread to discuss and hopefully spin up a pull request to the docs.

On the curriculum

As stated in the current docs:

A Curriculum is structured as a graph of training Stage. Each Stage is associated with a Task, which defines a set of configuration parameters via TaskParameters. Stages are connected by StageTransition, which are directed edges associated with a trigger condition.

In other words, a Curriculum, in its simplest form, can be seen as a container of Stages + any logic associated to the transition between them (StageTransitions).

Stage is, in turn, a "container" of a Task instance. The concept of an "instance" is critical here. While the Task object defines a class, Stage works by wrapping an instance. This means that two distinct Stages, A and B, could wrap the same Task type BUT different instances of it (think of two different training stages of the same task implementation).
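A minimal Python sketch of the class/instance distinction, using plain dataclasses as stand-ins. The field `reward_ul` and the class shapes are illustrative assumptions, not the library's actual API:

```python
from dataclasses import dataclass


# Hypothetical stand-ins, NOT the library's actual classes.
@dataclass(frozen=True)
class VrForaging:
    """A Task type; its fields play the role of TaskParameters."""
    reward_ul: float  # assumed parameter, for illustration only


@dataclass(frozen=True)
class Stage:
    """A Stage wraps one Task *instance*, not the Task class itself."""
    name: str
    task: VrForaging


stage_a = Stage(name="A", task=VrForaging(reward_ul=5.0))
stage_b = Stage(name="B", task=VrForaging(reward_ul=8.0))

# Same Task type, but different instances (and so different parameters).
same_type = type(stage_a.task) is type(stage_b.task)
same_instance = stage_a.task == stage_b.task
```

Here `same_type` holds while `same_instance` does not, which is the sense in which two Stages can share a Task implementation yet differ.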

Let's step back a bit here and think about what affordances we have at this point.

1 - In each Stage we can define a set of parameters for a specific Task
2 - We can express logic that transitions between Stages
3 - We can have Stages of different Tasks or of the same Task

Let's look at two ways this could be used (I am going to use pseudo-code for brevity).

Curriculum within the same Task

Let's consider a Task named VrForaging. The experimenter has two distinct, discrete operation modes that the animal transitions through during training. Let's call them ForagingForApples and ForageForCheese. The animal starts training on ForagingForApples and, after it fulfills a transition criterion (say, it collects 100 rewards in a behavior session), graduates to ForageForCheese.
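This scenario could be sketched roughly as follows in plain Python. `Metrics`, `Stage`, `StageTransition` and `next_stage` here are simplified illustrative stand-ins (assumptions, not the library's actual signatures), and "collects 100 rewards" is read as `>= 100`:

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Metrics:
    """Outcome of one behavior session (hypothetical field)."""
    rewards_collected: int


@dataclass(frozen=True)
class Stage:
    """Illustrative stand-in: a named Stage of a given Task type."""
    task: str
    name: str


@dataclass(frozen=True)
class StageTransition:
    """Directed edge between two Stages, guarded by a trigger condition."""
    source: Stage
    target: Stage
    rule: Callable[[Metrics], bool]


def next_stage(
    current: Stage, transitions: Sequence[StageTransition], metrics: Metrics
) -> Stage:
    """Follow the first outgoing transition whose rule fires; otherwise stay."""
    for transition in transitions:
        if transition.source == current and transition.rule(metrics):
            return transition.target
    return current


# Two Stages of the same Task.
apples = Stage(task="VrForaging", name="ForagingForApples")
cheese = Stage(task="VrForaging", name="ForageForCheese")

transitions = [
    StageTransition(apples, cheese, rule=lambda m: m.rewards_collected >= 100),
]
```

With this, `next_stage(apples, transitions, Metrics(rewards_collected=120))` graduates the animal to `cheese`, while a session with 40 rewards leaves it on `apples`.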

Curriculum across Tasks

Let's consider a curriculum where the animal is first trained in the previous Task, VrForaging, in Stage ForagingForApples. Now, instead of graduating to ForageForCheese, we want to teach the mouse a completely new task, say DynamicForaging. This new task also has stages, say BaitedStage.
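A rough sketch of the cross-Task case, again with plain dataclasses. The parameter names `reward_ul` and `bait_probability` are made up for illustration, and the adjacency mapping is just one assumed way to picture the graph:

```python
from dataclasses import dataclass
from typing import Union


# Two different Task types with different (hypothetical) parameters.
@dataclass(frozen=True)
class VrForaging:
    reward_ul: float


@dataclass(frozen=True)
class DynamicForaging:
    bait_probability: float


@dataclass(frozen=True)
class Stage:
    """Illustrative stand-in: a Stage wraps an instance of *some* Task."""
    name: str
    task: Union[VrForaging, DynamicForaging]


foraging_for_apples = Stage("ForagingForApples", VrForaging(reward_ul=5.0))
baited_stage = Stage("BaitedStage", DynamicForaging(bait_probability=0.4))

# The curriculum graph as an adjacency mapping: stage name -> successor names.
stages = {s.name: s for s in (foraging_for_apples, baited_stage)}
curriculum = {"ForagingForApples": ["BaitedStage"], "BaitedStage": []}

# Following the single edge moves the animal onto a different Task type.
successor = stages[curriculum["ForagingForApples"][0]]
```

Note that the edge connects Stages, not Tasks: the Task only changes because the target Stage happens to wrap a different Task type.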

As you can see, a Stage is always required. It is the materialization of a Task class/type.

But what about policies?

Consider the previous scenario where the animal is in Task VrForaging and Stage ForagingForApples. Additionally, consider a variable inside the task that depicts the amount of reward the animal receives. Say the experimenter wants to automatically update this value based on the amount of water the animal drank in the previous session.

We can express this logic using Stages by considering:

ForagingForApples(5ul) -> ForagingForApples(6ul) if WaterDrank > 1000 & WaterDrank < 1200;
ForagingForApples(5ul) -> ForagingForApples(7ul) if WaterDrank > 800  & WaterDrank < 1000;
ForagingForApples(5ul) -> ForagingForApples(8ul) if WaterDrank > 600  & WaterDrank < 800;
ForagingForApples(5ul) -> ForagingForApples(9ul) if WaterDrank > 400  & WaterDrank < 600;

While possible, this is unnecessarily annoying, as we would need to code a large number of stage transitions between all the stages.

To meet this need, we can instead use a Policy. A Policy can be thought of as a set of functions that run on top of the session outcome. For instance, the above scenario could be instead coded in the continuous domain by:

ForagingForApples(x), where x = WaterDrank / 2000

In this example, only one policy is active. However, multiple policies can be active simultaneously, each updating a different set of parameters.
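As a sketch, a policy can be pictured as a function from the session outcome and the current parameters to updated parameters. Everything below is an assumption for illustration: the names, the `travel_distance` parameter, and the rule inside `update_distance` are invented, and only the `WaterDrank / 2000` rule comes from the example above:

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class Metrics:
    water_drank: float
    rewards_collected: int  # hypothetical second metric


@dataclass(frozen=True)
class TaskParameters:
    reward_amount: float
    travel_distance: float  # hypothetical second parameter


Policy = Callable[[Metrics, TaskParameters], TaskParameters]


def update_reward(metrics: Metrics, params: TaskParameters) -> TaskParameters:
    """The continuous rule from the example: x = WaterDrank / 2000."""
    return replace(params, reward_amount=metrics.water_drank / 2000)


def update_distance(metrics: Metrics, params: TaskParameters) -> TaskParameters:
    """Made-up rule: lengthen the travel distance after a good session."""
    if metrics.rewards_collected >= 100:
        return replace(params, travel_distance=params.travel_distance + 10.0)
    return params


def apply_policies(
    policies: Iterable[Policy], metrics: Metrics, params: TaskParameters
) -> TaskParameters:
    """Run every active policy in turn; each touches its own parameters."""
    for policy in policies:
        params = policy(metrics, params)
    return params


updated = apply_policies(
    [update_reward, update_distance],
    Metrics(water_drank=1000.0, rewards_collected=120),
    TaskParameters(reward_amount=5.0, travel_distance=50.0),
)
```

A single function replaces the whole table of discrete transitions, and the two policies stay independent because each one only rewrites its own field.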

In other words, a Stage, on top of being a simple container for a Task instance, also defines a set of policies. As a result, two stages can define the same Task but differ solely in the underlying policies. (For instance, one Stage could scale WaterDrank by a factor of 1/2000 whereas the other may use a different factor of 1/4000.)
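To make this concrete, here is a small sketch of two Stages that share the same Task parameters and differ only in their policy. The class shapes, the `divide_water` helper and the stage names are illustrative assumptions:

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class TaskParameters:
    reward_amount: float


Policy = Callable[[float, TaskParameters], TaskParameters]


def divide_water(divisor: float) -> Policy:
    """Build a policy that sets reward_amount = WaterDrank / divisor."""
    def policy(water_drank: float, params: TaskParameters) -> TaskParameters:
        return replace(params, reward_amount=water_drank / divisor)
    return policy


@dataclass(frozen=True)
class Stage:
    """Illustrative stand-in: a Task instance plus its active policies."""
    name: str
    task: TaskParameters
    policies: Tuple[Policy, ...]

    def update(self, water_drank: float) -> TaskParameters:
        params = self.task
        for policy in self.policies:
            params = policy(water_drank, params)
        return params


shared_task = TaskParameters(reward_amount=5.0)
steep = Stage("Steep", shared_task, (divide_water(2000),))
gentle = Stage("Gentle", shared_task, (divide_water(4000),))
```

Both Stages start from identical parameters, yet after the same session they suggest different reward amounts.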

Why Policy transitions?

The previous architecture already affords vast flexibility. However, one thing that becomes very difficult is to independently control active policies. Imagine that you have two concurrently active policies (UpdateWater and UpdateDistance). At some point, you want to change UpdateWater to UpdateWaterByAlot. This could be done by coding an extra stage, as mentioned before. Unfortunately, we need to account for UpdateDistance too! Couldn't we just add this Policy to the new stage? You could, but what happens if UpdateDistance has a corresponding UpdateDistanceByAlot too? Now we suddenly need to expand our number of stages to account for all possible combinations of active policies.
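The growth in the number of stages is easy to check with a few lines of Python. The policy names come from the example above; the helper `stages_vs_policies` is a hypothetical name introduced only for this count:

```python
from itertools import product

water_policies = ["UpdateWater", "UpdateWaterByAlot"]
distance_policies = ["UpdateDistance", "UpdateDistanceByAlot"]

# One Stage per combination of simultaneously active policies.
stages_needed = list(product(water_policies, distance_policies))
n_stages = len(stages_needed)


def stages_vs_policies(n_families: int, n_variants: int) -> tuple:
    """Stages grow multiplicatively, policies only additively."""
    return n_variants ** n_families, n_variants * n_families
```

With two families of two variants each this is still only four stages, but `stages_vs_policies(3, 3)` already gives 27 stages against 9 policies.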

To solve this, each Stage can also define its own set of PolicyTransitions that specify how policies transition into others. This allows different policies to be concurrently active within the Stage while allowing each of them to be swapped for a different one independently.
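A minimal sketch of independent policy transitions inside one Stage follows. Policies are represented by name only, and the `distance_run` metric and both thresholds are invented for illustration, so this is not the library's actual PolicyTransition signature:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class Metrics:
    water_drank: float
    distance_run: float  # hypothetical metric


@dataclass(frozen=True)
class PolicyTransition:
    """Swap one active policy for another when the rule fires."""
    source: str
    target: str
    rule: Callable[[Metrics], bool]


def step_policies(
    active: FrozenSet[str],
    transitions: Sequence[PolicyTransition],
    metrics: Metrics,
) -> FrozenSet[str]:
    """Evaluate each transition against the policies active before the step."""
    updated = set(active)
    for transition in transitions:
        if transition.source in active and transition.rule(metrics):
            updated.discard(transition.source)
            updated.add(transition.target)
    return frozenset(updated)


transitions = [
    PolicyTransition("UpdateWater", "UpdateWaterByAlot",
                     rule=lambda m: m.water_drank < 500),
    PolicyTransition("UpdateDistance", "UpdateDistanceByAlot",
                     rule=lambda m: m.distance_run > 1000),
]

active = frozenset({"UpdateWater", "UpdateDistance"})
after = step_policies(active, transitions, Metrics(water_drank=400, distance_run=200))
```

Only the water policy is swapped here; `UpdateDistance` stays active, and no extra Stage had to be defined for the new combination.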

jeromelecoq commented 4 months ago

Just to follow up. This is great! Thanks @bruno-f-cruz

At this moment, my understanding of policy, in the case of foraging, would be "context".

For example, we try to enforce a different overall state in the animal, say by moving from a "Forest" to a "Desert", where the local rules are slightly different but keep the same logic functions (i.e., we change parameters but not the functions that use those parameters).

When I go through the example: https://github.com/AllenNeuralDynamics/aind-behavior-curriculum/blob/bb28161608b4544e5f71e23c2190a6682f17dcaf/examples/example_project/curriculum.py#L179

Implementing a Policy seems a bit counter-intuitive. I would have thought you would do this by adding a layer on top of curriculum or somehow connecting multiple curricula.

This is because, in my mind, the simpler case is not to have any context and to live in a homogeneous world.

In the current logic, it seems like both stage transitions and policy transitions use almost exactly the same functions for creating the objects. So it feels like implementing several policies is still a bit complicated: you have to specify how each sub-thing changes one by one, while the purpose is to change a bunch together. Do you see what I mean?

bruno-f-cruz commented 4 months ago

> Implementing a Policy seems a bit counter-intuitive. I would have thought you would do this by adding a layer on top of curriculum or somehow connecting multiple curricula.

This is a bit of a non-starter, as Policies and PolicyTransitions depend on Metrics, which are tied to the Task. That is, different Tasks have different Metrics. Having policies above the curriculum would not work unless you force users to use common interfaces in Metrics, which I think would be a pain...

> This is because, in my mind, the simpler case is not to have any context and to live in a homogeneous world.

Isn't this just a strict subset of what we have?

Sounds like a Curriculum with a single Stage, with your policies defining a "homogeneous world" in that they define how settings are updated between sessions. Maybe I am missing some detail? Would you mind adding an explicit example of what you are trying to achieve?

> In the current logic, it seems like both stage transitions and policy transitions use almost exactly the same functions for creating the objects. So it feels like implementing several policies is still a bit complicated: you have to specify how each sub-thing changes one by one, while the purpose is to change a bunch together. Do you see what I mean?

Can you clarify what you mean by "change a bunch together"? The syntax is similar because the implementation is similar (BehaviorGraph). They do, however, afford different functionality: Stages let you define a set of different active update policies, but not vice versa. If you just want to have a single set of active policies, simply use a single Stage?

hanhou commented 4 months ago

I'm trying to explain things in the opposite direction here -- from policy to stage.

Co-activated policies

In my mind, the two most important motivations to have co-activated policies over discrete stages--as in my v1.0 implementation (and perhaps also mTrain?)--are:

parameters can be updated continuously (property of "policy")
task dimensions can be decoupled (property of "co-activation")

Here is an example for dynamic foraging, showing a special (but maybe the most important) use case where each family of policies controls each task dimension.

Policy transitions

Naturally, we should be able to change the policy itself, i.e., policy transitions. With policies and their transitions, we can effectively mix together 1. continuous parameter changes, 2. discrete parameter changes, and 3. discrete changes of the rules that govern 1 and 2.

This is an example of a 'Train Track' Curriculum in the doc.

Stage and stage transitions

Why do we wrap another layer of Stage on top of Policy?

As mentioned above, Policy effectively decouples the parameters of a task, but in some cases we still want them to be coupled ("change a bunch together"), especially when we have a discrete change of the subtask (Bruno's "Curriculum within the same Task" example) or of the task (Bruno's "Curriculum across Tasks" example). Note that changing a Task could be much more dramatic, since different Tasks may have totally different parameter and metric spaces. In other words, we definitely need a new Stage.

Compare with v1.0 implementation (and mTrain?): different interpretations

In my v1.0 implementation, I only have discrete "stages" where all parameters are coupled together:

This can be seen as a special case of the new system, but with different interpretations, two of which correspond to Jerome's "two worlds".

(All policies in this example are degenerate in the sense that they just set all parameters to fixed values without any actual updates.)

They all make sense, depending on how you define subtask or task. But technically speaking, my example "Uncoupled Baiting" best fits Interpretation C, as the subtask changes from "Coupled Baiting" to "Uncoupled Baiting" when going from my old "STAGE_2" to "STAGE_3".

hanhou commented 4 months ago

Add Sue here since she will be one of the first users @ZhixiaoSu

bruno-f-cruz commented 4 months ago

@jeromelecoq @ZhixiaoSu After using the curriculum, it would be amazing if you could make a PR clarifying some of these points according to your experience. I would be happy to review it!

mochic commented 4 months ago

Hey, so I've begun reimplementing the dynamic routing task and opened an issue outlining my current user experience: https://github.com/AllenNeuralDynamics/aind-behavior-curriculum/issues/34. In my initial experience, policies seemed like an unnecessary addition, but they are almost fully "opt-in" and I've implemented most of my task without using them. Now that I've made a regimen without them, I can see how they're very useful for making the circular stage transitions I have cleaner, and they would probably allow us to go from 8 defined stages to like 3 or so.

bruno-f-cruz commented 4 months ago

We should get rid of that "almost" then :p. The goal was to not ask users to worry about policies if they don't want to use them, as I sketched above. Glad it's close!

mochic commented 4 months ago

https://github.com/AllenNeuralDynamics/aind-behavior-curriculum/blob/bb28161608b4544e5f71e23c2190a6682f17dcaf/src/aind_behavior_curriculum/curriculum.py#L588 I'm just starting to use this and maybe I'm doing things wrong but maybe create_empty_stage should be the default then?

bruno-f-cruz commented 4 months ago

I think we should think of a way to have a default constructor for stage that automatically takes care of the policy dependencies in the background. Alternatively, this could also be solved at the level of the trainer, by having a special case when no policies are provided.
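One way to picture the "no policies provided" case, purely as a sketch of the idea and not of how the library or its trainer currently behaves. `Stage`, `TaskParameters` and `suggest_parameters` below are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class TaskParameters:
    reward_amount: float


Policy = Callable[[float, TaskParameters], TaskParameters]


@dataclass(frozen=True)
class Stage:
    """Hypothetical Stage whose policies default to 'none'."""
    name: str
    task: TaskParameters
    policies: Tuple[Policy, ...] = ()  # users may omit policies entirely


def suggest_parameters(stage: Stage, water_drank: float) -> TaskParameters:
    """With no policies, the Stage's parameters pass through unchanged."""
    params = stage.task
    for policy in stage.policies:
        params = policy(water_drank, params)
    return params


# A user who does not care about policies only supplies the Task instance.
plain = Stage(name="Plain", task=TaskParameters(reward_amount=5.0))
suggested = suggest_parameters(plain, water_drank=1000.0)
```

In this sketch the empty default makes policies fully opt-in: the same update loop serves both cases, so no special branch is needed for stages without policies.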

Let's try to collect a bit more feedback and have a quick meeting in a few weeks to discuss some of these points!
