david-lindner / safe-grid-gym

A gym interface for AI safety gridworlds created in pycolab.
Apache License 2.0
17 stars 9 forks source link

Toy Gridworlds Rendering Feedback #18

Closed jvmncs closed 5 years ago

jvmncs commented 5 years ago

There seems to be undesirable behavior in the way the toy gridworlds are rendering in Tensorboard. Since the AI Safety Gridworlds rendering were functioning normally before, I'm assuming this is something to do with code over here. See figure below for example rendering.

Specifically, I feel like the following is unwanted behavior: (1) multiple agents in each frame (maybe this is a general safe-grid-gym thing, or something to do with a parameter in safe-grid-agents?) (2) overlapping rendered characters (e.g. the A, S values at the bottom), but also seems related to (1) (3) only rendering partial trajectories (ideally we'd see the agent go from the initial state to the end state for each trajectory) (4) what do A and S represent? they seem ambiguous, so as a user I don't know how to interpret those numbers

render corner

david-lindner commented 5 years ago

Do you have a command to reproduce the issue? Running

python main.py -dc -L test-log/corners -C -E 1000 -V 100 -EE 50 corners ppo-cnn -r 10 -e 10 -l .01

produces the expected result for me

individualimage

(using tensorboard 1.11.0 and Firefox 64.0).

Regarding, (4) I believe A is the action the agent takes and S is the timestep. However I agree to make this more clear or remove the numbers completely.

jvmncs commented 5 years ago

Every command I've run with the toy gridworlds since #17 has produced similar gifs. I've mainly been running TensorBoard over a VPN, either on GCP or a private one. Ran this on Chrome Version 71.0.3578.98. Commit is cb55158dc4b63f74385cdd06ad6f5f73440d95c8. Example command:

python main.py -L runs/way/baseline/ppo-cnn/blah -E 3000 -V 160 -EE 300 way ppo-cnn -l .001 -r 200 -e 5 -eb .1
timorl commented 5 years ago

Very interesting, I get the same bug and I'm quite surprised David doesn't.

As David said, A is the action taken and S is the timestep. Would making these smaller but fully spelled out help?

david-lindner commented 5 years ago

So it seem like this problem was introduced by tensorboardX 1.6. I was still using version 1.5, which is why I could not reproduce the bug.

tensorboardX changed the API for videos in the latest version, which means we have to change one line in safe-grid-agents in order to accord for that. Unfortunately, when making this change we get other artifacts, which seem to be due to a bug in tensorboardX. I submitted a PR adressing this https://github.com/lanpa/tensorboardX/pull/338 However, for now it seems like we will have to go with tensorboardX <= 1.5, see https://github.com/jvmancuso/safe-grid-agents/pull/49

Would making these smaller but fully spelled out help?

I think that would help. But maybe it would be even better to spell out the actions, i.e LEFT, RIGHT, ... Then we would maybe not have to write "Action" in front of it. For the timestep, maybe "t = 1", "t = 2" etc? Or spell it out.

alok commented 5 years ago

I think we should spell out actions but not timesteps, so something like {action} at t = {timestep}

alok commented 5 years ago

Calling S the step is confusing to me since I automatically associate it with the state.