Closed jvmncs closed 5 years ago
Do you have a command to reproduce the issue? Running
python main.py -dc -L test-log/corners -C -E 1000 -V 100 -EE 50 corners ppo-cnn -r 10 -e 10 -l .01
produces the expected result for me
(using tensorboard 1.11.0 and Firefox 64.0).
Regarding, (4) I believe A is the action the agent takes and S is the timestep. However I agree to make this more clear or remove the numbers completely.
Every command I've run with the toy gridworlds since #17 has produced similar gifs. I've mainly been running TensorBoard over a VPN, either on GCP or a private one. Ran this on Chrome Version 71.0.3578.98. Commit is cb55158dc4b63f74385cdd06ad6f5f73440d95c8. Example command:
python main.py -L runs/way/baseline/ppo-cnn/blah -E 3000 -V 160 -EE 300 way ppo-cnn -l .001 -r 200 -e 5 -eb .1
Very interesting, I get the same bug and I'm quite surprised David doesn't.
As David said, A is the action taken and S is the timestep. Would making these smaller but fully spelled out help?
So it seem like this problem was introduced by tensorboardX 1.6. I was still using version 1.5, which is why I could not reproduce the bug.
tensorboardX changed the API for videos in the latest version, which means we have to change one line in safe-grid-agents
in order to accord for that. Unfortunately, when making this change we get other artifacts, which seem to be due to a bug in tensorboardX. I submitted a PR adressing this https://github.com/lanpa/tensorboardX/pull/338 However, for now it seems like we will have to go with tensorboardX <= 1.5, see https://github.com/jvmancuso/safe-grid-agents/pull/49
Would making these smaller but fully spelled out help?
I think that would help. But maybe it would be even better to spell out the actions, i.e LEFT, RIGHT, ... Then we would maybe not have to write "Action" in front of it. For the timestep, maybe "t = 1", "t = 2" etc? Or spell it out.
I think we should spell out actions but not timesteps, so something like {action} at t = {timestep}
Calling S the step is confusing to me since I automatically associate it with the state.
There seems to be undesirable behavior in the way the toy gridworlds are rendering in Tensorboard. Since the AI Safety Gridworlds rendering were functioning normally before, I'm assuming this is something to do with code over here. See figure below for example rendering.
Specifically, I feel like the following is unwanted behavior: (1) multiple agents in each frame (maybe this is a general safe-grid-gym thing, or something to do with a parameter in safe-grid-agents?) (2) overlapping rendered characters (e.g. the A, S values at the bottom), but also seems related to (1) (3) only rendering partial trajectories (ideally we'd see the agent go from the initial state to the end state for each trajectory) (4) what do A and S represent? they seem ambiguous, so as a user I don't know how to interpret those numbers