Improbable-AI / pql

Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
MIT License

Can't reproduce results on Franka Cube Stack #4

Open StoneT2000 opened 6 days ago

StoneT2000 commented 6 days ago

Hi, thank you for open-sourcing PQL! I am trying to reproduce the results on one GPU (a 4090) and am finding that the algorithm's eval return doesn't go past the ~400 mark.

I ran the following command to test PQL (not PQL-D)

python scripts/train_pql.py task=FrankaCubeStack algo.num_gpus=1 algo.p_learner_gpu=0 algo.v_learner_gpu=0

[image: training curve screenshot]

I am using ca7a4fb762f9581e39cc2aab644f18a83d6ab0ba as the IsaacGymEnvs git commit and the Isaac Gym Preview 4 release, as well as the latest git commit of the pql repo. I don't think this is a reward-scale issue, since I modified the code to print the cube-stacking success rate and it is mostly close to 0.
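For reference, this is roughly how I aggregate the success rate across the parallel envs (a minimal sketch, not the repo's code; the `successes`/`dones` signals are whatever the task exposes, here assumed to be boolean per-env arrays):

```python
import numpy as np

def update_success_rate(successes: np.ndarray, dones: np.ndarray, stats: dict) -> float:
    """Accumulate per-episode success over parallel envs.

    successes: bool array (num_envs,) -- cube stacked at this step (assumed signal)
    dones:     bool array (num_envs,) -- episodes resetting at this step
    stats:     dict carried across steps to hold running counters
    Returns the running fraction of finished episodes with at least one success.
    """
    if "ep_success" not in stats:
        stats["ep_success"] = np.zeros_like(successes, dtype=bool)
        stats["n_episodes"] = 0
        stats["n_successes"] = 0
    # Mark any env that has succeeded at some point during its current episode.
    stats["ep_success"] |= successes
    n_done = int(dones.sum())
    if n_done:
        stats["n_episodes"] += n_done
        stats["n_successes"] += int(stats["ep_success"][dones].sum())
        stats["ep_success"][dones] = False  # reset flags for new episodes
    return stats["n_successes"] / max(stats["n_episodes"], 1)
```

With this counting, the printed rate stays near 0 throughout training, which is why I don't think reward scaling explains the gap.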

Any idea?

StoneT2000 commented 6 days ago

[image: training curve screenshot]

I ran a second seed, which is doing slightly better, although it still seems behind the original paper in sample efficiency: at 120M steps it reaches around 600 reward, whereas the paper shows ~700.

StoneT2000 commented 6 days ago

I also tried running PPO just now, but it seems to fail completely when using

python scripts/train_baselines.py algo=ppo_algo task=FrankaCubeStack isaac_param=True

[image: training curve screenshot]

Is there anything else I might need to add to test the baselines?