Jendker closed this issue 4 years ago
I have checked an older commit, 762ab5b3d1231006, and for that one the learning still looks good:
---------------- -------------
VF_error_after 0.0467942
VF_error_before 0.0591493
alpha 205.699
delta 0.1
eval_score 3200.84
kl_dist 0.0498372
num_samples 40000
running_score 2240.57
stoc_pol_max 4620.1
stoc_pol_mean 3072.97
stoc_pol_min -2.67104
stoc_pol_std 1099.73
success_rate 93.5
surr_improvement 0.0451574
time_VF 3.57259
time_npg 2.55781
time_sampling 12.5735
time_vpg 0.13578
---------------- -------------
Maybe this will be helpful.
@Jendker Thanks for pointing this out, it is very helpful. I am currently working on merging redesign with master, and will include the hand manipulation results as part of the test before merging. I am expecting the merge to be complete by Monday, so I will give a more comprehensive answer on Monday.
I got to the bottom of it, and it turns out that the changes in commit cefc221cc6 (changes to behavior cloning) broke the performance. I was able to recover the learning performance on the last commit of the 'redesign' branch; I only had to use 'behavior_cloning_2.py' from commit cefc221cc6.
So the solution would be to restore 'behavior_cloning_2.py' from commit cefc221cc6, or perhaps even to replace the current 'behavior_cloning.py' with it, because the current 'behavior_cloning.py' does not perform well at all with DAPG.
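Restoring a single file from an older commit can be done with `git checkout <commit> -- <path>`. A self-contained demonstration of the technique (using a throwaway repo, since the exact file path inside mjrl is an assumption here):

```shell
# Demonstrate restoring one file from an older commit.
# In the real repo this would be something like:
#   git checkout cefc221cc6 -- <path/to/behavior_cloning_2.py>
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "old working version" > behavior_cloning_2.py
git add behavior_cloning_2.py
git commit -qm "working BC"
old=$(git rev-parse HEAD)

echo "broken version" > behavior_cloning_2.py
git commit -qam "regression"

# Restore the file exactly as it existed in the older commit:
git checkout "$old" -- behavior_cloning_2.py
cat behavior_cloning_2.py   # prints: old working version
```

The restored file shows up as a staged change, so it can be committed back on top of the current branch.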
Note: all of the test results were obtained by running the example 'dapg.py' from https://github.com/aravindr93/hand_dapg
Thanks for the merge to master and for updating hand_dapg! Now it is clear that it is important to set the argument set_transforms=True
for BC with the new codebase. Issue solved.
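For context, the idea behind set_transforms (as I understand it; the function and variable names below are illustrative assumptions, not the actual mjrl API) is that BC computes normalization statistics from the demonstration data and installs them in the policy, so that inputs and outputs are scaled consistently. A rough sketch:

```python
import numpy as np

def compute_transforms(observations, actions, eps=1e-3):
    """Sketch of the normalization set_transforms=True would compute:
    shift/scale statistics from demonstration data (not the mjrl code)."""
    in_shift = observations.mean(axis=0)
    in_scale = observations.std(axis=0) + eps   # eps avoids division by zero
    out_shift = actions.mean(axis=0)
    out_scale = actions.std(axis=0) + eps
    return in_shift, in_scale, out_shift, out_scale

def normalize_obs(obs, in_shift, in_scale):
    # Policy inputs are normalized with the demo statistics before the MLP.
    return (obs - in_shift) / in_scale

# Tiny demo with synthetic "demonstration" data:
rng = np.random.default_rng(0)
obs = rng.normal(5.0, 2.0, size=(1000, 4))
act = rng.normal(-1.0, 0.5, size=(1000, 2))
in_shift, in_scale, out_shift, out_scale = compute_transforms(obs, act)
norm = normalize_obs(obs, in_shift, in_scale)
print(norm.mean(axis=0))  # per-dimension mean is ~0 after normalization
```

Without these transforms the BC network sees raw hand-manipulation observations at very different scales, which would be consistent with the poor DAPG warm-start I observed.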
I am working with the code from the hand_dapg repo. First of all, thank you very much for open sourcing this great work!
Because I need to work with mujoco-py 2.0, I have switched to the new 'redesign' code base for both mjrl and mj_envs. The problem is that the dapg example from https://github.com/aravindr93/hand_dapg/tree/master/dapg/examples no longer learns; please have a look at the training results.
For the code from master I get the following results for iteration 78:
and with redesign (here I switched from behavior_cloning_2 to behavior_cloning, because the former was removed from the repo):
This difference in performance was confirmed by subsequent runs of both code versions.
Is there anything that should be adjusted in the dapg example to achieve the same learning performance as before?