chiamp / muzero-cartpole

Applying DeepMind's MuZero algorithm to the cart pole environment in gym

continuous action space #3

Open maazashraf23 opened 5 months ago

maazashraf23 commented 5 months ago

Hey Chiamp, how can I use this code to balance a single inverted pendulum environment with a continuous action space? Could you guide me a little bit? Thanks.

chiamp commented 4 months ago

I haven't worked with continuous action spaces before, but it seems there's a paper that tackles this problem (written by the authors of the original MuZero paper). I haven't read it myself yet.

I think you could also try quantizing the continuous action space into discrete actions. For example, you could convert a continuous action space of [-3, 3] into a discrete action space of n evenly spaced actions. If n=3, your action space would be {-3, 0, 3}; if n=5, it would be {-3, -1.5, 0, 1.5, 3}, and so on.
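A minimal sketch of the quantization idea, assuming evenly spaced bins that include both endpoints of the range. The helper names `quantize_action_space` and `to_continuous` are hypothetical and not part of this repo. MuZero would pick a discrete index, and that index is mapped back to a continuous value before it is passed to the environment.

```python
from typing import List


def quantize_action_space(low: float, high: float, n: int) -> List[float]:
    """Return n evenly spaced actions covering [low, high], endpoints included.

    Hypothetical helper: MuZero then treats the *indices* 0..n-1 as its
    discrete action space.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to include both endpoints")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def to_continuous(action_index: int, actions: List[float]) -> float:
    """Map a discrete action index chosen by MuZero back to a continuous value."""
    return actions[action_index]


# The two examples from the comment above
print(quantize_action_space(-3, 3, 3))  # [-3.0, 0.0, 3.0]
print(quantize_action_space(-3, 3, 5))  # [-3.0, -1.5, 0.0, 1.5, 3.0]
```

Choosing n is a trade-off: more bins allow finer control, but they also enlarge the action space that MuZero's tree search has to explore. If the environment's action space is a one-dimensional Box, the value returned by `to_continuous` would typically need to be wrapped in a one-element list or array before calling the environment's `step`.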