Closed pkrack closed 2 hours ago
Thanks again for the issue, I'll have a look. At first glance your solution seems to work, but I'm uncertain about parts of it.
In the message of this commit, there is some reasoning about which action space should be used where. Maybe that helps identify issues.
I think the current implementation was adapted from the `VectorizeObservation` wrapper. But `TransformObservation` and `TransformAction` work in opposite directions. `TransformObservation` wants a `func` that transforms an inner obs to an outer obs, whereas `TransformAction` wants a `func` that transforms an outer action to an inner action. Hence the reversal of `self.env.x_space` and `self.x_space` in the `VectorizeObservation` and `VectorizeAction` wrappers.
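To make the direction argument concrete, here is a minimal sketch using toy stand-ins rather than the real wrappers. `ToyEnv`, `ToyTransformObservation` and `ToyTransformAction` are hypothetical names invented for illustration; they only mimic the roles described above and are not the library's implementation.

```python
from typing import Callable


class ToyEnv:
    """Hypothetical inner env: accepts an int action and echoes it as the observation."""

    def step(self, action: int) -> int:
        assert isinstance(action, int), "inner env only accepts int actions"
        return action


class ToyTransformObservation:
    """`func` maps an INNER observation to an OUTER observation."""

    def __init__(self, env, func: Callable):
        self.env, self.func = env, func

    def step(self, action):
        inner_obs = self.env.step(action)  # data comes out of the inner env
        return self.func(inner_obs)        # and is converted on the way out


class ToyTransformAction:
    """`func` maps an OUTER action to an INNER action."""

    def __init__(self, env, func: Callable):
        self.env, self.func = env, func

    def step(self, action):
        inner_action = self.func(action)   # converted on the way in
        return self.env.step(inner_action)


# Outer actions are strings, inner actions are ints; inner obs are ints, outer obs are floats.
env = ToyTransformAction(ToyEnv(), func=int)
env = ToyTransformObservation(env, func=float)
result = env.step("3")
```

The observation wrapper's input side is described by the inner env's space, while the action wrapper's input side is described by its own (outer) space, which is why the two roles swap.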
Yes, in short, the problem was that when I was implementing it, I was thinking of the data as "moving" in the same direction as observations rather than the opposite.
I'm adding some more tests, but yes, I think the primary issue was that `out` was using `self.single_action_space`, not `self.env.single_action_space`, as you corrected.
Also, looking at the `actions` function, the two cases are identical except for the `concatenate` output, which should be `actions` and `out` respectively.
I'm currently adding more tests to double-check this.
Thanks again @pkrack
Describe the bug
The `VectorizeAction` wrapper is not correctly implemented. The problem lies in how `self.out` is initialized, how `iterate` is called and how `concatenate` is called. There is a confusion between what should be `self.action_space`, `self.single_action_space`, `self.env.action_space` and `self.env.single_action_space`. Cf. example/test below.

I did not submit a PR because the contributing guidelines mention that you have to run the test suite, but the jax functional tests seem to run forever on my machine. PR is ready though, let me know if you want me to submit it.
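As a rough illustration of why the choice of space matters, the sketch below uses toy stand-ins, not Gymnasium's API. `ToySpace`, `empty_batch`, `to_inner` and `vectorized_actions` are hypothetical names, and the outer/inner shapes are assumptions picked so that the two spaces differ. The point is only that the batch loop walks over outer actions while the output buffer has to be shaped like the inner env's single action space.

```python
from dataclasses import dataclass


@dataclass
class ToySpace:
    """Hypothetical stand-in for a space: records only the per-action shape."""
    shape: tuple  # () for a scalar action, (k,) for a length-k vector


def empty_batch(single_space: ToySpace, n: int) -> list:
    """Allocate a buffer holding n actions shaped like `single_space`."""
    if single_space.shape == ():
        return [None] * n
    return [[None] * single_space.shape[0] for _ in range(n)]


def to_inner(index: int) -> list:
    """Per-env func: outer action (an index) -> inner action (a 2-vector)."""
    moves = [[0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]
    return moves[index]


outer_single = ToySpace(shape=())    # what the agent emits per sub-env (assumed)
inner_single = ToySpace(shape=(2,))  # what the wrapped env expects per sub-env (assumed)
num_envs = 3


def vectorized_actions(outer_batch: list, buffer_space: ToySpace = inner_single) -> list:
    # The buffer receives INNER actions, so it is allocated from the inner single space.
    out = empty_batch(buffer_space, num_envs)
    # Iterate over the OUTER batch, one sub-env action at a time.
    for i, outer_action in enumerate(outer_batch):
        out[i][:] = to_inner(outer_action)  # write in place into the preallocated slot
    return out


inner_batch = vectorized_actions([0, 2, 1])
```

Passing `buffer_space=outer_single` instead reproduces the kind of mismatch described above: the scalar-shaped slots cannot hold a 2-vector, so the in-place write raises a `TypeError`.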
Proposed solution:
Code example
System info
Additional context
No response