cselab / smarties

Lightweight and scalable framework for Reinforcement Learning
MIT License

Question regarding the settings and evaluating the training #4

Open waynezw0618 opened 4 years ago

waynezw0618 commented 4 years ago

Hi Novatig, thanks for replying; the updated README is helpful. I have also read your glider paper, and it seems the control results are sensitive to the training method and the settings. I have now implemented a boat navigation case based on boatNav, and everything works fine with runTest. Then I used RACE to train. The task is to steer the boat to turn by some angle and move to another location; the reward does not constrain the final nose angle or the terminal speed. However, I get very few successful samples during training: most of the time the boat gets lost. Does that affect the training? Could you make some suggestions: RACE? DQN? PPO? Or should I change some settings?

Furthermore, after training I want to evaluate the solution by restarting. I use the command "boatNav --restart /public3/home/sc51719/soft/smarties/apps/boadNav/evalue RACE.json", but I get many trajectories. I expected a single trajectory that successfully reaches the point I specified, since I assumed restart loads the trained network to evaluate its performance on one task. But some of the results even show the boat getting lost. What does that mean? Sometimes not even one trajectory reaches the harbor. Can you give me some suggestions?

novatig commented 4 years ago

Hi,

I am not familiar with the boatNav environment; it was not implemented by me. If you want, I could ask the developer to update the source files. Otherwise, I should just take them out of the repo.

For the rest, I don't quite get your questions. Algorithm-wise, I typically use VRACER as the default, just because it combines experience replay and policy gradients. The README contains some default values for the hyper-parameters, which depend on problem size.
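As a rough illustration of the experience-replay half of that combination, here is a minimal Python sketch of a first-in-first-out buffer with uniform sampling, matching the storage and sampling modes smarties reports in its startup log. It is a generic illustration under that assumption, not smarties' implementation: the class and method names are hypothetical, and any off-policy corrections VRACER applies to the sampled data are not modeled.

```python
import random
from collections import deque


class FifoReplayBuffer:
    """Hypothetical FIFO experience-replay buffer with uniform sampling.

    Illustration only; this is not the smarties implementation.
    """

    def __init__(self, capacity, seed=None):
        # A deque with maxlen evicts the oldest transition once full (FIFO).
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Store one transition; the oldest one is dropped automatically when full.
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement among the stored transitions.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


buf = FifoReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.add(state=t, action=0.0, reward=-1.0, next_state=t + 1, done=False)

print(len(buf))                       # 3
print([tr[0] for tr in buf.storage])  # [2, 3, 4]
print(len(buf.sample(2)))             # 2
```

With a capacity of 3 and 5 insertions, only the three most recent transitions survive, which is the FIFO behavior; every stored transition is equally likely to be drawn.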

The option you should use to set how many evaluation sims to run is smarties.py [...] --nEvalEpisodes 1.

If the behavior is not as expected: does smarties say that restart was successful? What do you mean by lost?

waynezw0618 commented 4 years ago

The option you should use how many evaluation sims to run is smarties.py [...] --nEvalEpisodes 1.

I don't use smarties.py; instead, I run the executable directly as "boatNav --restart /public3/home/sc51719/soft/smarties/apps/boadNav/evalue RACE.json --nEvalEpisodes 1". Is that the same thing?

It seems to restart properly. Here is the screen output from the restart:

==========================================================================
              Continuous-action V-RACER with Gaussian policy
==========================================================================
Experience Replay storage: First In First Out.
Experience Replay sampling algorithm: uniform probability.
    Single net with outputs: [0] : V(s),
                             [1 3] : policy mean and stdev,
    Size per entry = [1 2 2].
Initializing net approximator.
Layers composition:
(0) Input Layer of size:6
(1) SoftSign InnerProduct Layer of size:128 linked to Layer:0 of size:6
(2) SoftSign InnerProduct Layer of size:128 linked to Layer:1 of size:128
(3) Parametric Residual Connection of size:128
(4) Linear output InnerProduct Layer of size:3 linked to Layer:3 of size:128
(5) Parameter Layer of size:2. Initialized: -2.064742 -2.064742
Optimizer: Parameter updates using Adam SGD algorithm.
Restarting from saved policy...
Restarting from file /public3/home/sc51719/soft/smarties/apps/waterjet_yi_12/evalue/agent_00_net_weights.raw.
Restarting from file /public3/home/sc51719/soft/smarties/apps/waterjet_yi_12/evalue/agent_00_net_tgt_weights.raw.
Restarting from file /public3/home/sc51719/soft/smarties/apps/waterjet_yi_12/evalue/agent_00_net_1stMom.raw.
Restarting from file /public3/home/sc51719/soft/smarties/apps/waterjet_yi_12/evalue/agent_00_net_2ndMom.raw.
Restarting from saved policy...
Restarting from file /public3/home/sc51719/soft/smarties/apps/waterjet_yi_12/evalue/agent_00_scaling.raw.
Evaluating the policy: will skip restarting the Replay Buffer from file.
Collected 0 environment episodes out of 512  to evaluate restarted policies.
Collected 26 environment episodes out of 512  to evaluate restarted policies.
Collected 52 environment episodes out of 512  to evaluate restarted policies.
Collected 77 environment episodes out of 512  to evaluate restarted policies.
Collected 103 environment episodes out of 512  to evaluate restarted policies.
Collected 128 environment episodes out of 512  to evaluate restarted policies.
Collected 154 environment episodes out of 512  to evaluate restarted policies.
Collected 180 environment episodes out of 512  to evaluate restarted policies.
Collected 205 environment episodes out of 512  to evaluate restarted policies.
Collected 231 environment episodes out of 512  to evaluate restarted policies.
Collected 256 environment episodes out of 512  to evaluate restarted policies.
Collected 282 environment episodes out of 512  to evaluate restarted policies.
Collected 308 environment episodes out of 512  to evaluate restarted policies.
Collected 333 environment episodes out of 512  to evaluate restarted policies.
Collected 359 environment episodes out of 512  to evaluate restarted policies.
Collected 384 environment episodes out of 512  to evaluate restarted policies.
Collected 410 environment episodes out of 512  to evaluate restarted policies.
Collected 436 environment episodes out of 512  to evaluate restarted policies.
Collected 461 environment episodes out of 512  to evaluate restarted policies.
Collected 487 environment episodes out of 512  to evaluate restarted policies.
Finished collecting 512 environment episodes (option --nEvalEpisodes) to evaluate restarted policies.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.
Rank 0 _sendState(/public3/home/sc51719/soft/smarties/source/Communicator.cpp:260)  App recvd end-of-training signal but did not abort on it's own.

For the boatNav case, the goal is to move the boat from the start point to the end point by adjusting the torque and thrust. There are two outcomes: either the boat arrives or it does not. Although there are several successful cases (the boat arrives at the target) during training, I get N evaluation solutions and none of them reaches the target in the evaluation.

Any suggestions?

novatig commented 4 years ago

Sorry for the wait.

Again, I am afraid you should look into the boatNav case yourself, because the person who developed it has left the lab. In a recent commit I replaced the default hyperparameters I used for the ReF-ER paper with hyperparameters that change with problem size; that might be a better initial guess.

As for running with ./boatNav instead of the smarties.py script: yes, the settings should be the same. You can check with ./boatNav --help. However, you cannot specify the settings .json file from the executable; smarties will directly use the file it finds in the run directory. Try removing RACE.json (why not the provided RACER.json, or the default VRACER.json?) and see if the args are parsed correctly.
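One way to check whether the command-line arguments were actually parsed is to compare the number of evaluation episodes requested with --nEvalEpisodes against the number smarties reports at the end of evaluation. The Python sketch below assumes the summary line has the form seen in the log posted in this issue ("Finished collecting N environment episodes (option --nEvalEpisodes) ..."); the function names are hypothetical, and the message format may differ between smarties versions.

```python
import re

# Summary line printed at the end of evaluation, in the format seen in this issue:
# "Finished collecting 512 environment episodes (option --nEvalEpisodes) ..."
# (assumed format; it may change between smarties versions)
FINISHED_RE = re.compile(
    r"Finished collecting (\d+) environment episodes \(option --nEvalEpisodes\)"
)


def eval_episodes_in_log(log_text):
    """Return the number of evaluation episodes reported in a log, or None."""
    match = FINISHED_RE.search(log_text)
    return int(match.group(1)) if match else None


def check_eval_arg(log_text, requested):
    """True if the log reports exactly the requested number of evaluation episodes."""
    return eval_episodes_in_log(log_text) == requested


# Progress lines are redrawn with carriage returns, as in the terminal output above.
log = ("\rCollected 487 environment episodes out of 512  to evaluate restarted policies."
       "\rFinished collecting 512 environment episodes (option --nEvalEpisodes)"
       " to evaluate restarted policies.\n")
print(eval_episodes_in_log(log))          # 512
print(check_eval_arg(log, requested=1))   # False: the requested count was not applied
```

A mismatch like this one (1 requested, 512 run) suggests the value came from somewhere other than the command line, which is consistent with the settings-file explanation above.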

waynezw0618 commented 4 years ago

I rewrote the environment part and tested it. I actually don't have any settings .json file like VRACER.json in my run folder; it seems to read the settings from somewhere I don't know. Could we open a private repo to discuss this?
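Since smarties is said to use the settings file it finds in the run directory rather than one named on the command line, a quick first check is to list which .json files are actually present where the executable is launched. The Python sketch below uses only the standard library; the helper name is hypothetical, and it only reports what is on disk in the given directory. It does not reproduce smarties' actual lookup logic, which is not described in this thread.

```python
import tempfile
from pathlib import Path


def settings_candidates(run_dir="."):
    """List .json files in run_dir (hypothetical helper; not part of smarties)."""
    return sorted(p.name for p in Path(run_dir).glob("*.json"))


# Example on a throwaway directory standing in for a run folder.
with tempfile.TemporaryDirectory() as run_dir:
    for name in ("RACE.json", "VRACER.json", "notes.txt"):
        (Path(run_dir) / name).write_text("{}")
    print(settings_candidates(run_dir))  # ['RACE.json', 'VRACER.json']
```

Running it with no argument from the actual run folder (for example, just before launching ./boatNav) shows whether any settings file is there at all.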