Closed: DennisGross closed this issue 2 years ago
Naive solution: Name the PRISM state variables in such a way that they get automatically placed in the OpenAI Gym order. Use the disable_state_variable flag to disable all the helper state variables from PRISM.
One solution: Use a JSON file that maps the PRISM state variables to specific indices in the state vector. The resulting state vector is then exactly the same as the state vector returned by OpenAI Gym. Given: OpenAI Gym with (x,y), PRISM with (y,x,done). Mapper: {"frozen lake": {"x": 0, "y": 1, "done": -1}} (-1 means disabled). After the mapping, both state vectors have the same format (x,y).
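A minimal sketch of how applying such a mapper could look (the mapper contents and function name are hypothetical, taken from the frozen lake example above):

```python
import json

# Hypothetical mapper, as it might be loaded from the JSON file:
# each PRISM variable maps to its index in the OpenAI Gym state
# vector, or to -1 if it should be disabled (dropped).
MAPPER = json.loads('{"frozen lake": {"x": 0, "y": 1, "done": -1}}')

def map_prism_state(env_name, prism_state):
    """Reorder a PRISM state (dict of variable -> value) into the
    OpenAI Gym state vector layout, dropping disabled variables."""
    mapping = MAPPER[env_name]
    kept = {var: idx for var, idx in mapping.items() if idx != -1}
    vector = [None] * len(kept)
    for var, idx in kept.items():
        vector[idx] = prism_state[var]
    return vector

# PRISM yields (y, x, done); after mapping the agent sees (x, y).
print(map_prism_state("frozen lake", {"y": 0, "x": 3, "done": False}))  # [3, 0]
```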
If you disable it, how do you select which action to take?
"done" is a helper state variable in the PRISM environment; it is not an action. Or what do you mean exactly?
In general, if you disable a state variable, multiple different states get mapped to the same observed state. Which state's action are you going to pick?
The disable functionality only applies from the RL agent's point of view: the agent makes its decision based on the state variables it receives.
Notes: Given (OpenAI Gym, PRISM Gym). Goal: the neural network should always receive the OpenAI Gym and PRISM Gym environment states in the same format. Create a mapping file in the PRISM file folder. The mapping is always oriented at the OpenAI Gym state, so we have to bring the PRISM Gym state into the right order. OpenAI -> PRISM: 0 -> 1, 1 -> 0, 2 -> 4, 3 -> 2, 4 -> 3. Disable extra state variables via disabled_features.
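The index mapping above can be sketched as a simple permutation of the PRISM state vector (the mapping values are the hypothetical example from the note):

```python
# Index map from the note: OpenAI Gym index -> PRISM index.
OPENAI_TO_PRISM = {0: 1, 1: 0, 2: 4, 3: 2, 4: 3}

def reorder_prism_state(prism_state):
    """Permute a PRISM state vector into the OpenAI Gym layout:
    position i of the result is taken from PRISM position
    OPENAI_TO_PRISM[i]."""
    return [prism_state[OPENAI_TO_PRISM[i]] for i in range(len(OPENAI_TO_PRISM))]

# Values labelled by their OpenAI Gym position for illustration.
prism_state = ["b", "a", "d", "e", "c"]
print(reorder_prism_state(prism_state))  # ['a', 'b', 'c', 'd', 'e']
```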
Note: Action mapping is no problem at all.
Do we only support boolean/integer state variables? - I would say so (at least for the beginning).
Yes.
First solution: Goal: map the state variable order from OpenAI Gym to PRISM. Steps:
The state variables can differ between OpenAI Gym environments and PRISM environments. For example: OpenAI Gym Frozen Lake state: x=3, y=0. PRISM Frozen Lake state: done=false, y=0, x=3.
Therefore, if we train an RL agent in OpenAI Gym and afterwards in PRISM, we have to automatically order the state variables and deal with additional state variables (OpenAI Gym -> PRISM).
Therefore, if we train an RL agent in a PRISM environment and afterwards in an OpenAI Gym environment, we have to automatically remove the additional state variables and order the remaining ones (PRISM -> OpenAI Gym).