eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

How to analyze the data after being successfully trained. #38

Closed zhaoworking closed 4 years ago

zhaoworking commented 4 years ago

I'm sorry to bother you again, but I can't find a way to analyze the training data. I want to see the results of the DQNAgent method and how good it is.

[rl_agents.trainer.evaluation:INFO] Saved DQNAgent model to /home/zhao/gym/rl-agents-master/scripts/out/HighwayEnv/DQNAgent/run_20200406-223934_3509/checkpoint-999.tar
[rl_agents.trainer.evaluation:INFO] Saved DQNAgent model to /home/zhao/gym/rl-agents-master/scripts/out/HighwayEnv/DQNAgent/run_20200406-223934_3509/checkpoint-final.tar

So you can see it has been completely trained. Then I use the command python3 analyze.py run out/HighwayEnv/DQNAgent/run_20200406-223934_3509 to analyze the trained model, but I get this error: NotADirectoryError: [Errno 20] Not a directory: out/HighwayEnv/DQNAgent/run_20200406-223934_3509/openaigym.video.0.3509.video000000.meta.json Can you tell me how to analyze the trained model, and in which way I can see the results of the trained model?

zhaoworking commented 4 years ago

I have looked at other issues asked before, and I have finished the test with the trained model. But I have a question here: when I use the command python3 experiments.py evaluate configs/HighwayEnv/env_easy.jsonconfigs/HighwayEnv/agents/DQNAgent/baseline.json --test --episodes=10 I only find this warning: [WARNING] No pre-trained model has been loaded. And the generated video showing the test results is very bad, with the host car consistently crashing into other cars, and the score low. But when I use the command python3 experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/baseline.json --test --episodes=10 --recover-from=out/HighwayEnv/DQNAgent/saved_models/latest.tar, it works well. So, does the fact that it can't find latest.tar lead to the bad performance? Also, within the video I have some confusion that I hope you can help me with.

  1. There are five velocity v and position p symbols at the bottom of the video; what does each symbol denote?
  2. I have found there are several checkpoints in the run folder, including checkpoint-best, checkpoint-final and checkpoint-<number>. Why are the checkpoint numbers inconsistent, and what is the difference among these checkpoints?

eleurent commented 4 years ago

But I get this error NotADirectoryError: [Errno 20] Not a directory: out/HighwayEnv/DQNAgent/run_20200406-223934_3509/openaigym.video.0.3509.video000000.meta.json

I'll fix that. In the meantime, you have to enter the agent directory: out/HighwayEnv/DQNAgent.

But I would rather recommend that you use tensorboard for visualization. Install it, and then simply run tensorboard --logdir out/HighwayEnv/DQNAgent. Opening a browser at the printed location will provide you with an interface to see different plots and compare several runs.

I only find this warning: [WARNING] No pre-trained model has been loaded. And the generated video showing the test results is very bad, with the host car consistently crashing into other cars, and the score low.

Yes, apparently the latest.tar could not be found, so you are testing an agent with random initialisation. Could that be because a space is missing between the env and agent configs in your command? Otherwise it is expected to work, since your second command shows that the latest.tar model exists in the proper location.

There are five velocity v and position p symbols at the bottom of the video; what does each symbol denote?

The five cells represent the five actions: left lane, idle, right lane, faster and slower. The v stands for value, and shows the predicted value Q(s,a) of the action a in the current state s. And the p stands for the probability that the policy selects this action. At test time, if you use epsilon-greedy exploration then epsilon is set to 0, so the action probabilities should be a deterministic argmax (best action at 1 and others at 0).
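For intuition, here is a minimal sketch of how an epsilon-greedy distribution over these five actions collapses to a deterministic argmax when epsilon is 0; the Q-values are made up and this is not the actual rl-agents code:

    import numpy as np

    ACTIONS = ["left lane", "idle", "right lane", "faster", "slower"]

    def epsilon_greedy_distribution(q_values, epsilon):
        """Probabilities p of an epsilon-greedy policy over the actions."""
        probabilities = np.full(len(q_values), epsilon / len(q_values))
        probabilities[np.argmax(q_values)] += 1 - epsilon
        return probabilities

    q = np.array([0.2, 1.3, 0.4, 1.8, -0.5])  # hypothetical Q(s, a) values, the "v" row
    print(epsilon_greedy_distribution(q, epsilon=0.3))  # during training: some mass on every action
    print(epsilon_greedy_distribution(q, epsilon=0.0))  # at test time: [0. 0. 0. 1. 0.]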

I have found there are several checkpoints in the run folder, including checkpoint-best, checkpoint-final and checkpoint-<number>. Why are the checkpoint numbers inconsistent and what is the difference among these checkpoints?

By default, checkpoints are taken under a cubic progression first (1,8,27,etc.) and then linear after reaching episode 1000 (2000,3000,etc). These checkpoints correspond to the trained network at different times of the training process.
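As an illustration, here is a small sketch of which episodes would get a numbered checkpoint under that rule; the schedule is my paraphrase of the description above, not the exact rl-agents code:

    def is_checkpoint_episode(episode, cubic_until=1000, linear_period=1000):
        """Cubic schedule (1, 8, 27, ...) up to cubic_until, then every linear_period episodes."""
        if episode <= cubic_until:
            return round(episode ** (1 / 3)) ** 3 == episode  # perfect cubes only
        return episode % linear_period == 0

    print([e for e in range(1, 3001) if is_checkpoint_episode(e)])
    # [1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 2000, 3000]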

zhaoworking commented 4 years ago

Sorry to bother you again, but I can't fix the problem with tensorboard. When I use the command tensorboard --logdir out/HighwayEnv/DQNAgent, I get this error:

Traceback (most recent call last):
  File "/root/anaconda3/bin/tensorboard", line 11, in <module>
    sys.exit(run_main())
  File "/root/anaconda3/lib/python3.6/site-packages/tensorboard/main.py", line 59, in run_main
    program.get_default_assets_zip_provider())
  File "/root/anaconda3/lib/python3.6/site-packages/tensorboard/program.py", line 144, in __init__
    self.plugin_loaders = [make_loader(p) for p in plugins]
  File "/root/anaconda3/lib/python3.6/site-packages/tensorboard/program.py", line 144, in <listcomp>
    self.plugin_loaders = [make_loader(p) for p in plugins]
  File "/root/anaconda3/lib/python3.6/site-packages/tensorboard/program.py", line 143, in make_loader
    raise ValueError("Not a TBLoader or TBPlugin subclass: %s" % plugin)
ValueError: Not a TBLoader or TBPlugin subclass: <class 'tensorboard_plugin_wit.wit_plugin_loader.WhatIfToolPluginLoader'>

I have searched forums and the Internet, but nobody seems to have the same problem as me. Can you help me?

zhaoworking commented 4 years ago

I uninstalled tensorboard and fixed the problem above, but I get another problem: No scalar data was found. It seems like tensorboard can't find the data in the file events.out.tfevents.1587437539, and my terminal shows this warning: W0426 15:43:27.497121 140414878865152 core_plugin.py:172] Unable to get first event timestamp for run myfolder name. Can you help me with this issue?

eleurent commented 4 years ago

Can you attach the tfevents file from one of your runs?

zhaoworking commented 4 years ago

The tfevents file is fine, and I guess the reason is that tensorboard or tensorflow on my Linux server doesn't work normally, because it works well on another Linux server. Thanks for your attention.

zhaoworking commented 4 years ago

Can I ask some questions about the parameters in tensorboard, because I don't understand the data in some sections?

  1. The agent trainable_parameters is very large; what does it denote?
  2. Is the length of the episode the cumulative number of states, counting from the initial state to the terminal state? The length in tensorboard is twenty, which would imply the episode has twenty states?
  3. Moreover, there are two items, episode/return and episode/total reward, and their values are four and sixteen respectively. What is the relationship between them? I thought that episode/return multiplied by the length of the episode would equal episode/total reward, but apparently it does not.
  4. At last, I find there is no data in the exploration section, and I can only see that the exploration policy is epsilon.

I would really appreciate it if you could help me figure out these points of confusion.

eleurent commented 4 years ago

Of course,

  1. this is the "size" of your model, i.e. the number of weights of the neural network.
  2. yes, the number of actions/steps until reaching a terminal state. You are probably using an environment where each episode has a fixed duration of 20 (it is configurable).
  3. total reward is the sum of rewards at all steps of the episode: sum_{t=1}^T r_t. In contrast, the return is the discounted sum of rewards, where the reward at time t is discounted by gamma^t, and gamma in [0, 1] is the discount factor: sum_{t=1}^T gamma^t r_t. It is often the return that is optimised by RL algorithms, rather than the total reward (see the short sketch after this list).
  4. There should be the value of epsilon in this section. You may need to click on the section name to make it appear?
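To make the distinction in point 3 concrete, here is a minimal sketch with made-up rewards and a hypothetical discount factor, not the values actually used by your configuration:

    # 20 steps of reward 0.8 each (hypothetical, all within [0, 1])
    rewards = [0.8] * 20
    gamma = 0.85  # hypothetical discount factor

    total_reward = sum(rewards)                                  # sum_{t=1}^T r_t
    ret = sum(gamma ** t * r for t, r in enumerate(rewards, 1))  # sum_{t=1}^T gamma^t r_t

    print(round(total_reward, 2))  # 16.0
    print(round(ret, 2))           # ~4.36, much smaller than the total reward

With these made-up numbers the gap between a total reward of 16 and a return of about 4 comes purely from the discounting, not from a division by the episode length.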
zhaoworking commented 4 years ago

Oh, I see. Thank you. By the way, can you explain the meaning of episode/rewards in DISTRIBUTIONS and HISTOGRAMS? Why is the number lower than one at each of the episodes, and what does it mean?

eleurent commented 4 years ago

The reward is chosen normalized in [0, 1]. The distribution/histogram shows the frequencies of actual reward values (within the [0, 1] interval) obtained within an episode, and the evolution of these frequencies with respect to time. This is in contrast to the total reward / return plots in the scalar view, where rewards are summed, leading to total values higher than 1.
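As a rough illustration (made-up rewards and a generic torch SummaryWriter, not rl-agents' own logging code), the histogram/distribution views are fed the individual per-step rewards, while the scalar view receives their sum:

    import numpy as np
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("out/illustration")
    for episode in range(50):
        step_rewards = np.random.uniform(0, 1, size=20)  # hypothetical per-step rewards in [0, 1]
        writer.add_histogram("episode/rewards", step_rewards, episode)         # values stay below 1
        writer.add_scalar("episode/total reward", step_rewards.sum(), episode)  # sums exceed 1
    writer.close()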

zhaoworking commented 4 years ago

The reward is chosen normalized in [0, 1]. The distribution/histogram show the frequencies of actual reward values (within the [0-1] interval) obtained within an episode, and the evolution of these frequencies with respect to time.

the frequencies of actual reward values (within the [0-1] interval) obtained within an episode

I don't understand what it means, because I see the horizontal axis is the steps and the vertical axis is the probability (if it is, I don't know). What I mean is that I don't see any reward values in this distribution; I think there should be a hidden mapping between the reward values and their frequencies?

zhaoworking commented 4 years ago

Thanks again for your patient and considerate reply. I will close it right now.

eleurent commented 4 years ago

Hi @zhaoworking, I'm sorry I missed your previous post. If you still have questions, please feel free to ask!