Closed hzyjerry closed 3 years ago
Does this happen at the beginning or end of training? If it happens at the end, then it is likely not an issue as the policy has been trained and saved. If it occurs at the beginning, have you tried rerunning the script a few times? I recall encountering a bug like this on a few machines where the pytorch training library throws an error once in a while, but rerunning the script would resolve the issue. Sadly, I haven't had the free time to find and resolve this weird bug.
P.S. if you are training on a machine with 4, 8, or 16 virtual cores, I suggest adding an extra parameter `--num-rollouts 32` (I'll be adding this to the documentation soon). This keeps the simulator running for a total of 32 simulation rollouts before updating the PPO policy, and I have found that larger batch sizes of around 32 improve policy performance.
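For reference, the batching idea behind `--num-rollouts` can be sketched like this (a minimal, framework-free sketch with a dummy environment and a stubbed update step — the function and variable names here are illustrative, not the repo's actual API):

```python
import random

def collect_rollout(env_step, max_steps=200):
    """Run one simulated episode and return its list of transitions."""
    rollout = []
    for _ in range(max_steps):
        obs, reward, done = env_step()
        rollout.append((obs, reward))
        if done:
            break
    return rollout

def train_iteration(env_step, num_rollouts=32):
    """Gather num_rollouts full episodes before a single PPO-style update."""
    batch = []
    for _ in range(num_rollouts):
        batch.extend(collect_rollout(env_step))
    # update_policy(batch) would run one PPO update on the whole batch here
    return len(batch)

# Dummy environment step: episodes end randomly, ~10 steps on average.
def dummy_step():
    return [0.0], 0.0, random.random() < 0.1
```

The point is simply that more rollouts per update means more transitions in each PPO batch, which reduces gradient variance at the cost of slower policy updates.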
Hi Zack, thanks for the quick response!
The error happens at the beginning of training, and unfortunately rerunning the script multiple times didn't make it go away. I'm using a machine with 12 cores and tried `--num-rollouts 12` and `--num-rollouts 24`, but neither worked.
Looking more into it, it seems that when the `MLPBase` class is defined, `num_inputs` is set to 0, which causes an error during network initialization. I'm still looking into it and unsure why this happens (I haven't made any changes to the code yet).
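For anyone hitting the same trace, the failure mode can be made explicit with a guard like the one below (a hypothetical sketch — `MLPBase` here is a stand-in for the repo's class, not its actual code — showing why a zero-sized observation space breaks layer construction):

```python
class MLPBase:
    """Minimal stand-in for an MLP policy base (illustrative only)."""

    def __init__(self, num_inputs, hidden_size=64):
        if num_inputs <= 0:
            # A 0-dimensional observation space yields a degenerate first
            # layer; failing fast here makes the misconfiguration obvious
            # instead of surfacing as an opaque init error later.
            raise ValueError(
                f"num_inputs must be positive, got {num_inputs}; "
                "check that the environment name exposes observations "
                "for this agent (e.g. the coop variant of the task)."
            )
        # Weight shapes of a two-layer MLP, kept as plain tuples here.
        self.layer_shapes = [(num_inputs, hidden_size),
                             (hidden_size, hidden_size)]
```

With a guard like this, the root cause (an environment that reports a 0-dimensional observation space for one of the agents) shows up at construction time rather than deep inside the training loop.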
Found it, the environment name should be `ScratchItchJacoHuman-v0` instead of `ScratchItchJaco-v0` :P
When training a co-optimization policy in the scratch itch environment (command: `python -m ppo.train_coop --env-name "ScratchItchJaco-v0" --num-env-steps ...`), I ran into the error attached below. The strange thing is that it doesn't show up when training non-cooperative policies in the ScratchItch environment, or when training cooperative policies on other tasks. It seems that this could be an issue with the coop training script. Any idea why this happens?