aravindr93 / mjrl

Reinforcement learning algorithms for MuJoCo tasks
Apache License 2.0

Much worse learning performance with new code base #17

Closed Jendker closed 4 years ago

Jendker commented 5 years ago

My work is based on the code from the hand_dapg repo. First of all, thank you very much for open-sourcing this great work!

Because I need to work with mujoco-py 2.0, I have switched to the new 'redesign' code base for both mjrl and mj_envs. The problem is that the DAPG example from https://github.com/aravindr93/hand_dapg/tree/master/dapg/examples no longer learns; please have a look at the training results below.

For the code from master I get the following results for iteration 78:

----------------  ------------
VF_error_after       0.108092
VF_error_before      0.129821
alpha              208.393
delta                0.1
eval_score        2932.27
kl_dist              0.0504399
running_score     1543.33
stoc_pol_max      4359.31
stoc_pol_mean     2119.18
stoc_pol_min       -11.1661
stoc_pol_std      1347.49
success_rate        79
surr_improvement     0.045797
time_VF              3.60636
time_npg             2.59354
time_sampling       12.1133
time_vpg             0.136289
----------------  ------------

and with redesign (here I switched from behavior_cloning_2 to behavior_cloning, because the former was removed from the repo):

----------------  -------------
VF_error_after        0.385029
VF_error_before       0.671899
alpha               383.606
delta                 0.1
eval_score          366.002
kl_dist               0.0500591
num_samples       40000
running_score        23.3569
stoc_pol_max        667.282
stoc_pol_mean        29.4223
stoc_pol_min        -15.7963
stoc_pol_std         87.9433
success_rate          2.5
surr_improvement      0.0288742
time_VF               3.47379
time_npg              2.64448
time_sampling         8.92309
time_vpg              0.137907
----------------  -------------

Repeated runs of both code versions confirmed this difference in performance.

Is there anything that should be adjusted in the dapg example to achieve the same learning performance as before?

Jendker commented 5 years ago

I have checked an older commit: 762ab5b3d1231006 and for this one the learning still looks good:

----------------  -------------
VF_error_after        0.0467942
VF_error_before       0.0591493
alpha               205.699
delta                 0.1
eval_score         3200.84
kl_dist               0.0498372
num_samples       40000
running_score      2240.57
stoc_pol_max       4620.1
stoc_pol_mean      3072.97
stoc_pol_min         -2.67104
stoc_pol_std       1099.73
success_rate         93.5
surr_improvement      0.0451574
time_VF               3.57259
time_npg              2.55781
time_sampling        12.5735
time_vpg              0.13578
----------------  -------------

Maybe this will be helpful.

aravindr93 commented 5 years ago

@Jendker Thanks for pointing this out, it is very helpful. I am currently working on merging redesign with master, and will include the hand manipulation results as part of the test before merging. I am expecting the merge to be complete by Monday, so I will give a more comprehensive answer on Monday.

Jendker commented 4 years ago

I looked into it, and it turns out that the changes in commit cefc221cc6 (changes to behavior cloning) broke the performance. I was able to recover the learning performance on the last commit of the 'redesign' branch; I only had to use 'behavior_cloning_2.py' from commit cefc221cc6.

So the solution would be to restore 'behavior_cloning_2.py' from commit cefc221cc6, or maybe even to replace the current 'behavior_cloning.py' with it, because the current 'behavior_cloning.py' does not perform well at all with DAPG.

Note: all test results were obtained by running the example 'dapg.py' from https://github.com/aravindr93/hand_dapg

Jendker commented 4 years ago

Thanks for merging into master and updating hand_dapg! It is now clear that it is important to set the argument set_transforms=True for BC with the new codebase. Issue solved.
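
For readers wondering why a single flag could matter this much: the thread does not show what set_transforms does internally, so the following is only a minimal sketch under the assumption that it derives input and output normalization statistics (shift and scale) from the demonstration data before behavior cloning. The helper names `compute_transforms` and `normalize` and the synthetic data are hypothetical and are not part of mjrl.

```python
import numpy as np


def compute_transforms(observations, actions, eps=1e-6):
    """Hypothetical helper: per-dimension shift/scale from demonstration data.

    Assumption: normalization uses the mean and standard deviation of the
    demonstrations; this is not confirmed by the thread.
    """
    in_shift = observations.mean(axis=0)
    in_scale = observations.std(axis=0) + eps   # eps avoids division by zero
    out_shift = actions.mean(axis=0)
    out_scale = actions.std(axis=0) + eps
    return in_shift, in_scale, out_shift, out_scale


def normalize(x, shift, scale):
    """Map raw values to roughly zero mean and unit variance."""
    return (x - shift) / scale


# Synthetic stand-in for demonstration data with very different scales
# per dimension, as is common for joint angles, velocities and positions.
rng = np.random.default_rng(0)
observations = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 20.0, 0.01], size=(1000, 3))
actions = rng.normal(loc=[0.5, -2.0], scale=[0.1, 5.0], size=(1000, 2))

in_shift, in_scale, out_shift, out_scale = compute_transforms(observations, actions)
normalized_obs = normalize(observations, in_shift, in_scale)
normalized_act = normalize(actions, out_shift, out_scale)

print("raw observation std:       ", observations.std(axis=0))
print("normalized observation std:", normalized_obs.std(axis=0))
```

Without such a step, a network trained on the raw values sees inputs and targets whose magnitudes differ by orders of magnitude across dimensions, which can make the behavior cloning fit poor and leave the subsequent policy-gradient phase with a weak initialization.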