Grid2op / grid2op

Grid2Op a testbed platform to model sequential decision making in power systems.

https://grid2op.readthedocs.io/

Mozilla Public License 2.0


Node-wise ActionConverter for environments with different action spaces #419

Closed EloyAnguiano closed 8 months ago

EloyAnguiano commented 1 year ago

Description

I am trying something a little challenging for a Curriculum Learning approach (learning from different environments) with these environments. I've managed to resolve the issue of different observation spaces: I mask the observation space of each environment before it enters the model, and it should work. However, my goal is to do the same with the action space. I'm trying to build a common action space for any two given grid environments.

Possible solution

Since I am now able to mask part of the connections of the NN architecture, my first approach is to define a fixed action space sized by the maximum number of nodes of the graph (let's call it MAX_NODES), where each entry is an encoding of the actions that can be performed on a node (let's say it is 5-dimensional). As far as I can tell, I should create a gym Box space of shape (5, MAX_NODES) and a new class that inherits from grid2op.ActionConverter and contains the logic of using only the first M vectors of size 5 (M being less than or equal to MAX_NODES) when the observation has M nodes in the graph. For this approach, I take as a node the same thing that the method obs.as_networkx() uses as a node, that is, each of the bus connections on a substation.

It can be seen as the output of a generative LLM, where each of the nodes is a step in the language sequence and MAX_NODES is the same as MAX_SEQ_LENGTH.

The code to use this converter should be: gym_env.action_space = BoxGymNodeWiseActSpace(grid2op_env.action_space, MAX_NODES)
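A minimal sketch of the truncation logic described above, written in plain Python so that it does not rely on any grid2op or gym API. The class name `NodeWiseActionTruncator`, the `(MAX_NODES, ACTION_DIM)` row layout and the value `MAX_NODES = 200` are assumptions made for illustration only; nothing here is an existing grid2op converter.

```python
from typing import List, Sequence

MAX_NODES = 200   # assumed upper bound on the number of graph nodes
ACTION_DIM = 5    # assumed size of the per-node action encoding


class NodeWiseActionTruncator:
    """Hypothetical helper: keeps only the rows that belong to real nodes."""

    def __init__(self, max_nodes: int, action_dim: int) -> None:
        self.max_nodes = max_nodes
        self.action_dim = action_dim

    def relevant_rows(
        self, padded_action: Sequence[Sequence[float]], n_nodes: int
    ) -> List[List[float]]:
        """Return the first `n_nodes` rows of a (max_nodes, action_dim) action."""
        if len(padded_action) != self.max_nodes:
            raise ValueError("the action must have exactly max_nodes rows")
        if not 0 <= n_nodes <= self.max_nodes:
            raise ValueError("n_nodes must be between 0 and max_nodes")
        rows = [list(row) for row in padded_action[:n_nodes]]
        for row in rows:
            if len(row) != self.action_dim:
                raise ValueError("each row must have action_dim entries")
        return rows


# Example: a grid whose current observation has 17 nodes.
truncator = NodeWiseActionTruncator(MAX_NODES, ACTION_DIM)
padded = [[float(i)] * ACTION_DIM for i in range(MAX_NODES)]
per_node_actions = truncator.relevant_rows(padded, n_nodes=17)
```

The rows returned here would still have to be translated into an actual grid2op action, which is the part this issue asks about.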

Describe alternatives you've considered

@BDonnot suggested these alternatives, but I cannot see how to make them work.

Filtering out some not desirable actions in the DiscreteActSpace (I assume you want to manipulate actions converted to integer). This may be desirable, but the same agent cannot learn from a 14 substation environment and from a 118 substation one. What I am asking for is that, if the environment chosen at this step has 14 substations (and maybe 17 nodes, as each substation can be split), the converter only applies the first 17 latent actions, one for each node.
Act only on some substations, in this case it can be related to "multi agent" (a feature that is not yet finalized but that you can test). I do not understand this one.

BDonnot commented 1 year ago

Thanks for putting this here :-)

Concerning:

Act only on some substations, in this case it can be related to "multi agent" (a feature that is not yet finalized but that you can test)

The general idea is that a "118 bus" grid can be seen as 9 "14 bus" networks connected together. You can train an agent on a 14-bus grid and then apply it "8 times" (one per area of the 118) to control the full 118.

This might not be exactly what you thought, I don't really know, but it might work. It's a concept similar to the "divide and conquer" used pretty much everywhere in computer science.

The "multi agent" feature lets you have distinct agents operating the grid at the same time.
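The "divide and conquer" idea above could be sketched as follows. The area partition, the `act_per_area` helper and the toy policy are all hypothetical; this does not use the grid2op multi-agent feature, it only illustrates applying one and the same policy to each area's slice of an observation.

```python
from typing import Callable, Dict, List, Sequence


def act_per_area(
    observation: Sequence[float],
    areas: Dict[str, List[int]],
    policy: Callable[[List[float]], int],
) -> Dict[str, int]:
    """Apply the same policy to the part of the observation seen by each area."""
    decisions: Dict[str, int] = {}
    for area_name, indices in areas.items():
        local_obs = [observation[i] for i in indices]
        decisions[area_name] = policy(local_obs)
    return decisions


def toy_policy(local_obs: List[float]) -> int:
    """Stand-in for a trained agent: returns the index of the largest entry."""
    return max(range(len(local_obs)), key=lambda i: local_obs[i])


# Toy observation split into two hypothetical areas of two entries each.
observation = [0.1, 0.9, 0.5, 0.2]
areas = {"area_0": [0, 1], "area_1": [2, 3]}
decisions = act_per_area(observation, areas, toy_policy)
```

In a real setting each area would need an observation and an action space shaped like the small grid the agent was trained on, which this sketch simply assumes.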

Concerning:

Filtering out some not desirable actions in the DiscreteActSpace (I assume you want to manipulate actions converted to integer)

This may be desirable, but the same agent cannot learn from a 14 substation environment and from a 118 substation one.

I totally support that for "standard" agents. For agents using special techniques (for example graph neural nets), learning across both might be possible (I emphasize the "might": no proof has been made in that direction yet). But for sure, a standard fully connected NN will not work on the 118 if trained on the 14.

the converter only applies the first 17 latent actions

This is where I'm lost. Can you describe a bit the process that you have in mind ?

I thought you had "something" that outputs a number, say between 1 and 1000 (for example), and you wanted to map this number to a valid action regardless of the size of the grid. Is that correct ?

If so, what is the "latent space" here ?

EloyAnguiano commented 1 year ago

I totally support that for "standard" agents. For agents using special techniques (for example graph neural nets), learning across both might be possible (I emphasize the "might": no proof has been made in that direction yet). But for sure, a standard fully connected NN will not work on the 118 if trained on the 14.

That's exactly what I'm trying to use here: a GNN using transformers, so I can mask the attention between non-existing nodes each time.

This is where I'm lost. Can you describe a bit the process that you have in mind ?

As I have said, the Transformer architecture should give me MAX_NODES outputs (as in an LLM that returns MAX_SEQ_LENGTH tokens). Therefore, the converter should be the object that translates that sequence into a valid one for each observation of the environment and applies each of the actions to each of the existing nodes.

EloyAnguiano commented 1 year ago

I am planning to write that converter myself, but first I need to know whether this is the correct way of doing it. Also, I would like to know if there is any converter that maps an N-dimensional vector to every action that can be performed on a node (or discretizes it to N actions per node). I could use that converter on each of the relevant nodes of the observation.

BDonnot commented 1 year ago

I did not dive too deep into the "attention is all you need" paper nor into the (now super common) transformer architecture, and this is probably why I'm lost.

So say you have N nodes (N = 14 for the ieee 14 and N = 118 for the ieee 118). You want to feed your neural net a k (<= N) vector. Is that correct (for the observation) ?

Then for the action part, your NN will somehow output an M-dimensional "dense" vector. Is that correct ? And you ask if there exists a way to map this M-dimensional dense vector to a specific grid2op action ? Is that correct or am I totally wrong ?

If that is your question, then no for the second part: it does not exist. For me it's similar to a trained embedding of the action space.

Also, note that I worked a bit on the "graph" representation of the grid2op environment. And the "as_networkx" method will be deprecated (renamed "as_energy_graph" - because this was the graph "seen" by the energy flowing in the powergrid). You might also be interested in this graph https://beta-grid2op.readthedocs.io/en/bd_dev/grid_graph.html#graph2-the-elements-graph for your project.

EloyAnguiano commented 1 year ago

So say you have N nodes (N = 14 for the ieee 14 and N = 118 for the ieee 118). You want to feed your neural net a k (<= N) vector. Is that correct (for the observation)?

Indeed. For the input, k will always be exactly equal to N (k = N), and N will always satisfy N <= MAX_NODES. Since N differs between grids (N = 14 for the ieee 14 and N = 118 for the ieee 118), the output of a model with MAX_NODES = 200 will be relevant only for the first 14 or 118 vectors respectively.
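To make the padding explicit, here is a small sketch of how N node vectors could be padded up to MAX_NODES together with a boolean mask marking the real nodes. The function name `pad_node_features` and the zero padding value are assumptions for illustration; the actual masking used in the model may differ.

```python
from typing import List, Sequence, Tuple


def pad_node_features(
    node_features: Sequence[Sequence[float]],
    max_nodes: int,
    pad_value: float = 0.0,
) -> Tuple[List[List[float]], List[bool]]:
    """Pad N node vectors to max_nodes rows and return (padded, mask).

    mask[i] is True for a real node and False for a padding row.
    """
    n_nodes = len(node_features)
    if n_nodes > max_nodes:
        raise ValueError("the grid has more nodes than max_nodes")
    feature_dim = len(node_features[0]) if n_nodes else 0
    padded = [list(row) for row in node_features]
    padded += [[pad_value] * feature_dim for _ in range(max_nodes - n_nodes)]
    mask = [True] * n_nodes + [False] * (max_nodes - n_nodes)
    return padded, mask


# Example: 14 nodes with 3 features each, padded to MAX_NODES = 200.
features_14 = [[1.0, 2.0, 3.0] for _ in range(14)]
padded_14, mask_14 = pad_node_features(features_14, max_nodes=200)
```

The same mask can then be reused on the output side to decide which of the MAX_NODES output vectors are relevant.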

Then for the action part, your NN will somehow output an M-dimensional "dense" vector. Is that correct ? And you ask if there exists a way to map this M-dimensional dense vector to a specific grid2op action ? Is that correct or am I totally wrong ?

Yes, the goal is to have an M-dimensional vector for each of the elements of the graph that can be modified by actions. I assumed that the elements changed by the available actions were the nodes, but they could be some of the elements listed here https://beta-grid2op.readthedocs.io/en/bd_dev/grid_graph.html#graph2-the-elements-graph

BDonnot commented 1 year ago

OK I think I understand what you want more clearly. But I'm still not there yet.

So let me explain what I thought and you can correct me where I'm wrong.

action space

You want a single action space that handles multiple grid2op action spaces.

So you would like an action space that you can customize with say "n_max_load", "n_max_gen", "n_max_line" etc

And then this "action_space to bind them all" is able to automatically translate an action expressed like:

super_action_space(description of the action).to(case14.action_space)
super_action_space(description of the action).to(case118.action_space)
etc

Question 1: And as long as you have less than "n_max_load", "n_max_gen", "n_max_line" etc the .to() method is able to make the conversion?

Question 2: Is this correct? If this is, then I assume at some point you will also want this action space to be converted to a gym.Spaces right?

Question 3: So maybe it's easier in your case to directly manipulate gym spaces. What I have in mind is this: instead of trying to have multiple grid2op spaces and then bind them all into one, you convert each action space you have to a gym space. Then you create (or check whether it already exists) a "binding" of gym spaces that transforms the "global gym action" into each of the "sub gym actions". Would that work for you?
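One possible reading of such a "binding", sketched in plain Python: the global action is a flat vector laid out in blocks of size "n_max_load", "n_max_gen", "n_max_line", and each grid only reads the first entries of each block. The helper `split_global_action`, the block ordering and the toy sizes are hypothetical and are not part of grid2op or gym.

```python
from typing import Dict, List, Sequence

# Hypothetical, fixed ordering of the blocks inside the global action vector.
ELEMENT_KINDS = ("load", "gen", "line")


def split_global_action(
    global_action: Sequence[float],
    n_max: Dict[str, int],
    n_real: Dict[str, int],
) -> Dict[str, List[float]]:
    """Extract, for one grid, the entries of each block that map to real elements."""
    expected_size = sum(n_max[kind] for kind in ELEMENT_KINDS)
    if len(global_action) != expected_size:
        raise ValueError("global action does not match the n_max layout")
    sub_action: Dict[str, List[float]] = {}
    offset = 0
    for kind in ELEMENT_KINDS:
        if n_real[kind] > n_max[kind]:
            raise ValueError(f"grid has more '{kind}' elements than n_max allows")
        sub_action[kind] = list(global_action[offset:offset + n_real[kind]])
        offset += n_max[kind]  # always skip the full block, padding included
    return sub_action


# Toy sizes, not taken from any real grid.
n_max = {"load": 3, "gen": 2, "line": 4}
small_grid = {"load": 2, "gen": 1, "line": 3}
global_action = [float(i) for i in range(9)]
small_grid_action = split_global_action(global_action, n_max, small_grid)
```

Each per-kind list would then still need to be converted into the corresponding gym or grid2op action for that grid.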

Observation space

Maybe you can convert things to a graph with any of the available methods (you have the right link to the doc for this part) and then somehow use a graph neural net to make an "encoding" / "embedding" / "latent dense representation" (whatever you want to call the process: graph -> vector) of each observation you get.

Then, from this dense vector of known size (a size that does not depend on the powergrid size, so it would be the same for the ieee14 and the ieee118 for example), you can "create some action" by using a decoder that would again create another vector, which you can use in the "solution" in question 3 for example.

EloyAnguiano commented 1 year ago

I'll answer you here:

Observation space

My current observation space is capable of ingesting a graph of any size (whichever definition of graph you want to use; this graph assumption is key to the proposal I am making) and I am able to vectorize it. This is not a problem, as I can build any graph I want with the new method you added.

Action space

At the moment, the action space is a discrete space. Indeed, starting from the latent vector of the representation of each network, I can use a translation network between this latent state and the action to be performed. However, this is not generalizable so that the same agent is able to manage different electrical networks.

This is why the idea I have in mind to solve this is not to generate that latent vector from the network, but to extract one latent state per node of the network (let's not assume any prior conception of what this means), which is possible thanks to the multi-attention architecture. Once that latent node state is extracted, it would be necessary for a network to map it to the actions achievable at a node (which will require a variable size converter, which is why I'm asking).

Question 1: And as long as you have less than "n_max_load", "n_max_gen", "n_max_line" etc the .to() method is able to make the conversion?

Yes, I don't know the implementation of the grid2op library but the environment should be able to know how many vectors to use for this action since it knows how many nodes the previous observation had.

Question 2: Is this correct? If this is, then I assume at some point you will also want this action space to be converted to a gym.Spaces right?

Exactly. My ideal gym space would be a Box of shape (M, MAX_NODES), M being the per-node action dimension, and let's assume MAX_NODES is MAX_INT (this is not technically possible, but it keeps unrelated concerns out of the discussion).

Question 3: So maybe it's easier in your case to directly manipulate gym spaces. What I have in mind is this: instead of trying to have multiple grid2op spaces and then bind them all into one, you convert each action space you have to a gym space. Then you create (or check whether it already exists) a "binding" of gym spaces that transforms the "global gym action" into each of the "sub gym actions". Would that work for you?

This approach might be possible but I think it might miss the power of multi-attention network processing.

Important

I think this approach assumes something that need not be true. As I said before, what a node of this graph means can be left open, but I need what is assumed to be a node at the input to be the same as what is assumed to be a node at the output, and for this way of translating actions a node must have a property that I don't know can be fulfilled.

This assumes that it is possible to represent the network as a graph whose nodes are the only modifiable entity to work with the network (again, a node does not have to be a substation or a bus connection in a substation, but it could be a high voltage line). So far I had assumed that everything that can be done in an electrical network could be encoded as a vector (M sized) at the level of a bus connection in a substation, but perhaps this assumption is more correct at line level.

Is it possible to encode the entire network so that the nodes of the network representing the network have this characteristic?

BDonnot commented 1 year ago

Hello

Action space

However, this is not generalizable so that the same agent is able to manage different electrical networks.

I don't know, and I would not be so assertive about it. If your NN knows which grid it operates (because you send it as input), I'm pretty sure this can work, at least for some NN architectures.

What would be required here would be to have at least some training data for all the grids.

At least I don't see what fundamentally makes this impossible.

extract one latent state per node of the network (let's not assume any prior conception of what this means),

Oh I see better. You want something that, from a "dense vector" outputs a valid topological action for this substation. Is this correct ?

If that is what you want, then no, there is no such thing in grid2op at the moment. And honestly I would not even know where to start to code such a thing.

Important

I think I have trouble understanding what you mean because there is confusion between "network" (neural net), "network" (powergrid) and "network" (graph neural network), and the same goes for "node". I think it would be clearer to keep the word "network" for "neural network" (with weights and units), "grid" for the powergrid (with lines and substations) and "graph" for graphs (with edges and nodes).

I think this approach assumes something that need not be true. As I said before, what a node of this graph means can be left open, but I need what is assumed to be a node at the input to be the same as what is assumed to be a node at the output, and for this way of translating actions a node must have a property that I don't know can be fulfilled.

Which "node" in which graph are you talking about ? The "node" in the graph representation of the powergrid (so a substation) or a node in the "graph neural network" ?

Is it possible to encode the entire network so that the nodes of the network representing the network have this characteristic?

Same here, can you clarify and make the difference between network (of neural net) and powergrid ?

I guess you mean "is it possible to encode the entire powergrid so that the nodes of the representation of this grid as a graph have this characteristic" ?

Sorry for the confusion but I think it would make things clearer for me :-)

EloyAnguiano commented 1 year ago

I agree with the nomenclature, so from now on I will follow it to restate the problem and see if it is clearer.

The idea I have is the following. The graph neural network using multi-attentional layers is able to map N vectors to N latent states (N being a number that varies between different states of the powergrid). Therefore, I need to generate a graph representing such a powergrid (you said that the nodes are the substations, but since we now have different types of graphs depending on whether it is the energy graph or not, I would not venture to say as much) whose nodes are the "actionable entities of the powergrid". That is, if it were only possible to act on the lines, the graph I would use would be one in which the powerlines were the nodes of the graph and the substations were the edges (this kind of transformation is quite common in graph theory).

Once we agree, based on your experience, on which entities can be taken as "actionable", I will build my graph with those entities as the N nodes in each state of the powergrid. They will be mapped to N latent vectors by the multi-attention network, which will result in a series of actions on each of those actionable entities according to the converter I am asking for.
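The transformation mentioned above (powerlines become the nodes, and two powerlines are linked when they share a substation) is usually called a line graph. A minimal sketch in plain Python, assuming the grid is given as a hypothetical mapping from powerline name to its two end substations:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def line_graph(powerlines: Dict[str, Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Build the edges of a graph whose nodes are the powerlines.

    powerlines maps a line name to (origin substation, extremity substation).
    Two lines are connected when they touch the same substation.
    """
    lines_at_substation: Dict[str, Set[str]] = {}
    for line_name, (sub_origin, sub_extremity) in powerlines.items():
        lines_at_substation.setdefault(sub_origin, set()).add(line_name)
        lines_at_substation.setdefault(sub_extremity, set()).add(line_name)

    edges: Set[Tuple[str, str]] = set()
    for line_names in lines_at_substation.values():
        # Every pair of lines meeting at this substation becomes an edge.
        for line_a, line_b in combinations(sorted(line_names), 2):
            edges.add((line_a, line_b))
    return edges


# Toy grid: three substations connected in a triangle by three lines.
toy_lines = {"l0": ("s0", "s1"), "l1": ("s1", "s2"), "l2": ("s2", "s0")}
toy_edges = line_graph(toy_lines)
```

With this representation the number of graph nodes equals the number of powerlines, so the N latent vectors would correspond one to one to the lines.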

EloyAnguiano commented 1 year ago

Summing up, if we assume each bus in a substation is the "Actionable Entity" (this might not be the case), now we have this: (figure: Discrete)

But I am trying to get to this: (figure: Continuous)

Of course we should arrange the graph as the translation of a power network where the nodes are those actionable entities.

BDonnot commented 1 year ago

Hello

As often, "a picture is worth a thousand words". It's now much clearer what you want to achieve.

Just to clarify a few points, as things are still not entirely clear to me: will a "node" of your graph be a substation (e.g. you want the action to be an action on the substation)?

Or will a "node" of your graph be a unary element of the grid? For example, you could specify an action for each load, generator, storage unit, shunt, origin side of a powerline and extremity side of a powerline.

If a "node" of your graph is a substation, then I think the only way to have what you want is to use a trainable function (e.g. a neural network) for the action converter. Indeed, in this case the number of actions you can do per substation varies A LOT between substations. For ieee 118 for example, you have from 0 actions (i.e. you can't change anything) to ~ 2^15 (32k) actions at a particular substation.

If a "node" of your graph is a unary element of the grid, then the "node action converter" can be pretty straightforward, as for each node you would get only 4 valid choices:

-1 to disconnect
0 to not change
1 to connect on busbar 1
2 to connect on busbar 2

In this case you could use anything to convert your latent actions to a 4-dimensional output and then take the output with the highest score (as is often done in classification for example).
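The "highest score wins" step described above can be sketched as follows. The function name `scores_to_bus_choices` is hypothetical, and the scores are assumed to come from whatever network produces the 4-dimensional output per element; only the coding -1 / 0 / 1 / 2 is taken from the list above.

```python
from typing import List, Sequence

# Coding from the list above: disconnect, no change, busbar 1, busbar 2.
BUS_CHOICES = (-1, 0, 1, 2)


def scores_to_bus_choices(
    scores_per_element: Sequence[Sequence[float]],
) -> List[int]:
    """Pick, for each element, the choice with the highest score (argmax)."""
    choices: List[int] = []
    for scores in scores_per_element:
        if len(scores) != len(BUS_CHOICES):
            raise ValueError("each element needs exactly 4 scores")
        best_index = max(range(len(BUS_CHOICES)), key=lambda i: scores[i])
        choices.append(BUS_CHOICES[best_index])
    return choices


# Three elements, each with one score per possible choice.
example_scores = [
    [0.1, 0.2, 0.6, 0.1],    # highest score on "busbar 1"
    [0.9, 0.0, 0.05, 0.05],  # highest score on "disconnect"
    [0.0, 0.0, 0.1, 0.9],    # highest score on "busbar 2"
]
example_choices = scores_to_bus_choices(example_scores)
```

The resulting list of integers, one per element, would then be handed to whatever builds the grid2op action for the current grid; that last step is left out here.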

BDonnot commented 9 months ago

Hi,

Is this issue closed or something still needs to be done ?