DennisGross / COOL-MC

The interface between Model Checking and Reinforcement Learning.
13 stars 1 forks source link

State Space Translater #6

Closed DennisGross closed 2 years ago

DennisGross commented 2 years ago

The state variables can be different in OpenAI Gym environments and PRISM environments. For example: OpenAI Gym Frozen Lake State: x=3,y=0 PRISM Frozen Lake State: done=false,y=0,x=3

Therefore, if we train an RL agent in OpenAI Gym and afterwards in PRISM, we have to automatically order the state variables and deal with additional state variables (OpenAI Gym -> PRISM).

Therefore, if we train an RL agent in a PRISM environment and afterwards in OpenAI environment, we have to automatically remove the additional state variables and order them (PRISM -> OpenAI Gym).

DennisGross commented 2 years ago

Naive solution: Name PRISM state variables in such a way that they got automatically placed in the OpenAI Gym order. Use the disable_state_variable flag to disable all the helper state variables from PRISM.

DennisGross commented 2 years ago

One Solution: Use a JSON file, which maps the prism state variables to a specific index in the state vector. This state vector is then exactly the same as the return state vector of open ai gym. Given: OpenAI Gym with (x,y) PRISM (y,x,done) Mapper: {"frozen lake" : {"x":0,"y":1,"done":-1} (-1 means disabled) After the mapping, both state vectors have the same format (x,y)

sjunges commented 2 years ago

If you disable it, how do you select which action to take?

DennisGross commented 2 years ago

"done" is a helper state variable in the PRISM environment. it is not an action. Or what do you mean exactly?

sjunges commented 2 years ago

In general, if you disable a state, there are multiple different states that are mapped to the same state. Whose action are you gonna pick?

DennisGross commented 2 years ago

The disable-functionality only happens from the RL agent point of view. It makes its decision based on the received state variables.

DennisGross commented 2 years ago

Notes: Given: (OpenAI Gym, PRISM Gym) Goal: The neural network should get the OpenAI Gym and PRISM Gym environment states always in the same format. Create a mapping file in the prism file folder. The mapping is always oriented at the OpenAI Gym state. We have to format the PRISM Gym state in the right order. OpenAI -> PRISM 0 -> 1 1 -> 0 2 -> 4 3 -> 2 4 -> 3 Disable extra state variables via disabled_features

DennisGross commented 2 years ago

Note: Action mapping is no problem at all.

DennisGross commented 2 years ago

Do we only support boolean/integer state variables? - I would say so (at least for the beginning).

sjunges commented 2 years ago

Do we only support boolean/integer state variables? - I would say so (at least for the beginning).

Yes.

DennisGross commented 2 years ago

First Solution: Goal: Map state variable order from open ai gym to prism. Steps:

  1. Create for abc.prism a json file called abc.json
  2. {"prism_state_variable_name": OPENAI_INDEX, ...}
  3. While parsing the PRISM state, we use the state transformer to bring the state variables in the same order as the OpenAI Gym state.