Standard action space for DiscreteSpace

FluxML / Gym.jl

Gym environments in Julia

MIT License


Standard action space for DiscreteSpace #23

Open darsnack opened 5 years ago

darsnack commented 5 years ago

Currently, the DiscreteSpace is defined as {1, ..., n} (as it should be), but the lines in CartPole.jl that map {1, 2} --> {-1, 1} are commented out. Additionally, the assertion is commented out. Is there a reason for this? Someone has already written the code to transfer the step! logic to a {1, ..., n} action space, so why aren't we using it?

If there is a reason, can we settle what the standard action space should be?

tejank10 commented 5 years ago

Hi @darsnack! With a discrete space the environment is not differentiable, because we extract the index and pass it to step!. Mapping {1, 2} --> {-1, 1} is just a hack we found for CartPole's action space, to turn it into a continuous one. In the long term, we would want to be able to use a discrete space and still keep it differentiable. A hack that maps {1, ..., n} --> some continuous space would also be helpful.

darsnack commented 5 years ago

I think logically, a discrete to continuous mapping would be {1, ..., n} --> [1.0, n]. Beyond that, I think it is unique to each environment. For example, in CartPole, we would have the standard mapping {1, 2} --> [1.0, 2.0], then CartPole would calculate force = 2f0 * (continuous_action - 1f0) - 1f0. Is this along the lines you are thinking?
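A minimal sketch of that two-stage mapping, assuming the action arrives as a plain number. The names `to_continuous` and `cartpole_force` are hypothetical and are not part of Gym.jl.

```julia
# Hypothetical helpers for illustration; not part of Gym.jl.

# Standard embedding of a discrete action k in {1, ..., n} into [1.0, n].
to_continuous(k::Integer) = Float32(k)

# CartPole-specific step: rescale a continuous action in [1, 2] to a force in [-1, 1].
cartpole_force(a::Real) = 2f0 * (Float32(a) - 1f0) - 1f0

cartpole_force(to_continuous(1))  # -1.0f0
cartpole_force(to_continuous(2))  #  1.0f0
cartpole_force(1.5f0)             #  0.0f0
```

Because `cartpole_force` is affine in its argument, it stays differentiable for any real-valued action between the two endpoints.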

tejank10 commented 5 years ago

Yeah, right, it depends on the environment. Ideally, I would like to keep an environment's discrete action space as it is and introduce a black box between the model and step! that takes the gradient and passes it through the index the action value came from. The hack you proposed should also work. A continuous action space runs from -inf to inf, and negative and positive values are equally likely, which is why a discrete space of size 2 maps onto it naturally. By mapping {1, 2} --> [1.0, 2.0], I assume we would shift the origin to 1.5, so that anything below it is rounded to 1 and anything above it to 2.
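The rounding described above could look roughly like this. The function name `to_discrete` is hypothetical, and sending a value exactly on a midpoint (such as 1.5) to the upper index is an assumption, since the thread does not specify it.

```julia
# Hypothetical helper for illustration; not part of Gym.jl.

# Round a continuous action to the nearest index in {1, ..., n}.
# Values exactly on a midpoint (e.g. 1.5) go to the upper index (an assumption).
# Out-of-range finite values are clamped to the nearest valid index.
to_discrete(a::Real, n::Integer) = clamp(floor(Int, a + 0.5), 1, n)

to_discrete(1.2, 2)   # 1
to_discrete(1.8, 2)   # 2
to_discrete(-5.0, 2)  # 1 (clamped)
```

This rounding step is piecewise constant, so its derivative is zero almost everywhere. That is the reason a separate mechanism is needed to carry gradients across it.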

darsnack commented 5 years ago

I've been thinking about this recently. Should we set up an experimental Zygote branch that uses custom adjoints to implement differentiable DiscreteSpaces?
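To make the idea concrete, here is a sketch in plain Julia, with no Zygote dependency, of the forward/backward pair such an adjoint would encode. The forward pass picks an index, and the backward pass routes the incoming gradient to that index only. The name `select_action` is hypothetical, and using `argmax` over model scores is an assumption about how the action is chosen.

```julia
# Hypothetical sketch for illustration; not part of Gym.jl and not using Zygote.

# Forward: choose the index of the largest score (a non-differentiable step).
# Backward: pass the upstream gradient Δ through the chosen index, zero elsewhere.
function select_action(scores::AbstractVector{<:Real})
    k = argmax(scores)
    function pullback(Δ::Real)
        grad = zeros(float(eltype(scores)), length(scores))
        grad[k] = Δ
        return grad
    end
    return k, pullback
end

k, back = select_action([0.2, 0.8])
k          # 2
back(3.0)  # [0.0, 3.0]
```

In a Zygote-based branch, the same value-plus-pullback pair would presumably be registered as a custom adjoint instead of being returned by hand.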