Div99 / IQ-Learn

(NeurIPS '21 Spotlight) IQ-Learn: Inverse Q-Learning for Imitation
https://div99.github.io/IQ-Learn/

Issue reproducing MuJoCo results (HalfCheetah-v2) #5

Closed shuoye1000 closed 2 years ago

shuoye1000 commented 2 years ago

Dear author, it's an honor to see your paper and code! I am a novice in this area and am trying to reproduce your experiments, but I have run into some obstacles. On HalfCheetah I don't get the 5076.6 points reported in the paper; in most cases my reward is even below 0. The code is unmodified, so could the cause be the hyperparameter settings? If so, could you share your hyperparameters? Thanks for sharing!

Div99 commented 2 years ago

Hi, please set agent.learn_temp=False and it should work. In my experience, SAC-style temperature learning is not very stable with IQ-Learn and can prevent the method from converging. I pushed a change that disables temperature learning by default.

Div99 commented 2 years ago

Here are some training results I ran yesterday with 1 and 10 expert demos, using the hyperparameters in run_mujoco.sh with temperature learning disabled:

[W&B chart, 4/14/2022: HalfCheetah reward curves with 1 and 10 expert demos]

Div99 commented 2 years ago

Important update: I found that temperature learning can also work well if the SAC target_entropy param is set much lower than -dim(A), since we don't need much exploration in imitation learning settings. For HalfCheetah, setting target_entropy=-24 works well. So empirically, -4*dim(A) could be a good value to try with IQ-Learn.
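For reference, here is a minimal sketch of the standard SAC automatic temperature update with this lower entropy target (illustrative only; the variable names and learning rate are assumptions, not the repo's actual code):

```python
# Sketch of SAC automatic temperature tuning with a lowered entropy target.
import torch

action_dim = 6                       # HalfCheetah-v2 has a 6-dim action space
target_entropy = -4 * action_dim     # -24 instead of the usual -dim(A) = -6

log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def temperature_update(log_prob):
    # log_prob: log pi(a|s) for actions freshly sampled from the current policy
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()    # current temperature alpha
```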

Reward curves: [W&B chart, 4/14/2022]

shuoye1000 commented 2 years ago

Following your suggestion, I re-downloaded your code and re-ran the script. This is the command I used:

python train_iq.py env=cheetah agent=sac expert.demos=10 method.loss=value method.regularize=True agent.actor_lr=3e-05 seed=0 agent.learn_temp=False

[screenshots of the console output showing the parsed parameters]

Forgive my clumsiness, but do the obs_dim and action_dim of the agent need to be modified by hand? And here are the training results I've gotten so far:

[screenshot of training results]

I would be very grateful for your guidance!

Div99 commented 2 years ago

Hi, the obs_dim and action_dim are set automatically by the code and you can ignore them. The default SAC agent temperature in the repo was too high, and I pushed a fix setting it to 1e-2. For the HalfCheetah environment, either 1e-2 or 1e-3 should work very well.

shuoye1000 commented 2 years ago

Thanks to your tireless guidance, I finally solved the problem; thank you very much for your patience! Beyond that, I noticed a few other minor issues: the following seem to be missing from the repo: the Humanoid experiment data (parameters, expert data, etc.), the Breakout expert data, and the Space Invaders expert data. If you could upload these, it would help me reproduce your paper more fully, and if the files are too large, a zip archive seems like a good option. You are a very gracious author; good luck with your work!

Div99 commented 2 years ago

Hi, the link to the Atari datasets is fixed. For Humanoid, the expert data was accidentally deleted; I will try to retrain a SAC agent and collect new expert trajectories if I can find time over the weekend.

BepfCp commented 2 years ago

Hi, thanks for your work and patient guidance. Just a small question: why does your SAC expert on HalfCheetah-v2 only get ~5000 points? In my experience, SAC can reach at least 12,000 points within 1M steps; see the similar results from OpenAI's Spinning Up or the chart below.

[chart of SAC returns on HalfCheetah-v2]

liushunyu commented 2 years ago

> Hi, the link to the Atari datasets is fixed. For Humanoid, the expert data was accidentally deleted; I will try to retrain a SAC agent and collect new expert trajectories if I can find time over the weekend.

Hi, I cannot find the Space Invaders expert data in the link.

Div99 commented 2 years ago

For the HalfCheetah expert, we may have trained with a single critic instead of double critics, so its performance is not as high. (I don't know exactly why we did it, but the baselines we compared against, like ValueDICE, had similar expert performance, so we likely didn't optimize it much.) Nevertheless, a better expert should translate directly to better imitation with IQ-Learn, as it already saturates the current expert's performance at the ~5000 level.
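For context, here is a generic sketch of the single- versus clipped double-critic Bellman target used in SAC-style training (illustrative only, not the actual expert-training or IQ-Learn code):

```python
import torch

def soft_td_target(reward, done, next_q1, next_q2, next_log_prob, alpha,
                   gamma=0.99, double_critic=True):
    """Generic SAC-style soft TD target; all tensors share the same batch shape."""
    if double_critic:
        # Clipped double-Q: take the minimum of two critics to curb value
        # overestimation; typically yields stronger final experts.
        next_q = torch.min(next_q1, next_q2)
    else:
        # Single critic: simpler and cheaper, but prone to overestimation and
        # can plateau at lower returns (e.g. ~5000 on HalfCheetah-v2).
        next_q = next_q1
    next_v = next_q - alpha * next_log_prob
    return reward + (1.0 - done) * gamma * next_v
```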

Div99 commented 2 years ago

> Hi, I cannot find the Space Invaders expert data in the link.

Sorry for missing SpaceInvaders in the dataset release, I will add it to the GDrive.

liushunyu commented 2 years ago

> Hi, I cannot find the Space Invaders expert data in the link.
>
> Sorry for missing SpaceInvaders in the dataset release, I will add it to the GDrive.

Thanks!!! By the way, can I run SQIL based on your code? I found "sqil.yaml" in "iq_learn/conf/method"; how can I run it? Also, do you provide a GAIL implementation that uses the same expert data as yours?

Div99 commented 2 years ago

We used this repo for running GAIL: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. You can add our data-loading setup to it to use our provided expert demos. The SQIL code was removed from our repo in newer commits, but it is very simple to implement, and you can check an older version of the codebase.
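For anyone looking for it, the core SQIL idea is just reward relabelling on top of a standard soft-Q/SAC learner. A rough sketch of that idea (the replay_buffer.add interface here is an assumption for illustration, not the removed code):

```python
# Rough sketch of SQIL (Reddy et al., 2019): expert transitions are stored with
# reward 1, online transitions with reward 0, and a standard soft-Q / SAC
# update is then run on the mixed replay buffer.
def fill_sqil_buffer(replay_buffer, expert_transitions, online_transitions):
    for (state, action, next_state, done) in expert_transitions:
        replay_buffer.add(state, action, reward=1.0,
                          next_state=next_state, done=done)
    for (state, action, next_state, done) in online_transitions:
        replay_buffer.add(state, action, reward=0.0,
                          next_state=next_state, done=done)
```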

Div99 commented 2 years ago

I have added the expert datasets along with IQ-Learn results on Humanoid-v2. I have also released a script to generate your own expert trajectories for new environments.
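For reference, a minimal sketch of what such a trajectory-generation script can look like (the agent.act interface and the pickle layout are assumptions for illustration, not the released script):

```python
import pickle
import gym

def collect_demos(env_name, agent, num_episodes, out_path):
    # Roll out a trained agent and save its trajectories in a simple pickle file.
    env = gym.make(env_name)
    demos = {"states": [], "actions": [], "rewards": [], "dones": []}
    for _ in range(num_episodes):
        states, actions, rewards, dones = [], [], [], []
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs, deterministic=True)   # assumed interface
            next_obs, reward, done, _ = env.step(action)  # old gym API (v2 envs)
            states.append(obs)
            actions.append(action)
            rewards.append(reward)
            dones.append(done)
            obs = next_obs
        demos["states"].append(states)
        demos["actions"].append(actions)
        demos["rewards"].append(rewards)
        demos["dones"].append(dones)
    with open(out_path, "wb") as f:
        pickle.dump(demos, f)
```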