Open darsnack opened 5 years ago
Hi @darsnack! With a discrete action space the environment is not differentiable, because we extract the index and pass it to `step!`. The mapping `{1, 2} --> {-1, 1}` is just a hack we found for CartPole's action space, to turn it into a continuous one. In the long term, though, we would want to be able to use a Discrete space and still keep it differentiable. A hack that maps `{1, ..., n} -->` some continuous space would also be helpful.
I think logically, a discrete-to-continuous mapping would be `{1, ..., n} --> [1.0, n]`. Beyond that, I think it is unique to each environment. For example, in CartPole, we would have the standard mapping `{1, 2} --> [1.0, 2.0]`, then CartPole would calculate `force = 2f0 * (continuous_action - 1f0) - 1f0`. Is this along the lines you are thinking?
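To make the proposed mapping concrete, here is a rough sketch. The helper names `to_continuous` and `cartpole_force` are made up for illustration and are not part of the package; only the force formula comes from the comment above.

```julia
# Hypothetical helpers for illustration only, not an existing API.

# Standard mapping: discrete action i in {1, ..., n} becomes a point in [1.0, n].
to_continuous(action::Integer) = Float32(action)

# CartPole-specific step: rescale [1.0, 2.0] to a force in [-1.0, 1.0].
cartpole_force(continuous_action::Real) = 2f0 * (continuous_action - 1f0) - 1f0

# The two discrete actions land on the endpoints of the force range.
for a in 1:2
    println(a, " => ", cartpole_force(to_continuous(a)))
end
```

Action 1 gives a force of -1 and action 2 gives +1, so this reproduces the `{1, 2} --> {-1, 1}` hack while letting intermediate values such as 1.5 map to a force of 0.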
Yeah, right, it depends on the environment. Ideally, I would like to keep an environment's discrete action space as it is and introduce a black box between the model and `step!` that would take the gradient and pass it through the index from which the action value came. The hack you provided should also work. A continuous action space runs from -inf to inf, and negative and positive values are equally likely, which is why a Discrete space of size 2 maps onto it naturally. With the mapping `{1, 2} --> [1.0, 2.0]`, I assume we would shift the origin to 1.5, such that anything below it is rounded to 1 and anything above it to 2.
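The rounding rule could look something like the sketch below. `to_discrete` is a hypothetical name, and sending a value of exactly 1.5 to the upper action is an assumption, since the comment above only covers values strictly below or above the origin.

```julia
# Hypothetical helper: snap a continuous value back to a discrete action in {1, ..., n}.
function to_discrete(x::Real, n::Integer)
    # Clamp first so that values far outside [1, n] (including ±Inf) stay valid.
    clamped = clamp(x, 1, n)
    # Adding 0.5 and flooring rounds to the nearest integer, with ties going up
    # (assumption: exactly 1.5 becomes action 2).
    return floor(Int, clamped + 0.5)
end

println(to_discrete(1.2, 2))   # below the 1.5 origin
println(to_discrete(1.8, 2))   # above the 1.5 origin
```

For a space of size 2 this reduces to a threshold at 1.5; for larger `n` the thresholds sit at 1.5, 2.5, and so on.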
Been thinking about this recently. Should we establish an experimental `zygote` branch that uses custom adjoints to implement differentiable DiscreteSpaces?
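One possible shape for such a custom adjoint is a straight-through rule: the forward pass rounds to a discrete action, and the backward pass hands the incoming gradient back unchanged. This is only a sketch under assumptions. It requires Zygote.jl to be installed, `discretize` and `force` are hypothetical names, and `force` is a toy stand-in for `step!` rather than the real environment code.

```julia
using Zygote  # assumes Zygote.jl is available in the environment

# Hypothetical helper: round a continuous action to one of the two
# discrete actions, returned as Float32 so arithmetic stays in floats.
discretize(x::Real) = x < 1.5 ? 1f0 : 2f0

# Straight-through adjoint: the forward value is the rounded action, and the
# pullback passes the gradient through as if discretize were the identity.
Zygote.@adjoint discretize(x::Real) = discretize(x), Δ -> (Δ,)

# Toy stand-in for the environment step, using the force formula from this thread.
force(x) = 2f0 * (discretize(x) - 1f0) - 1f0

grad = Zygote.gradient(force, 1.7f0)
println(grad)
```

With the adjoint in place, the gradient with respect to the continuous input is the slope of the force formula (2) instead of being lost at the rounding step. Whether a straight-through gradient is the right choice for every environment is an open question.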
Currently, the DiscreteSpace is defined as `{1, ..., n}` (as it should be), but the lines in `CartPole.jl` that map `{1, 2} --> {-1, 1}` are commented out. Additionally, the assertion is commented out. Is there a reason for this? Someone has already written the code to transfer the `step!` logic to a `{1, ..., n}` action space, so why aren't we using it? If there is a reason, can we settle what the standard action space should be?