Open dq0803 opened 1 year ago
Hi, I just attempted to reproduce HACO with keyboard control by running "train_haco_keyboard_easy.py", but encountered unsatisfactory training performance.
Thank you for running our code!
At the early stage, I can see the model was improved with the help of human interventions. After around 20~40 iterations, the car has learned some driving skills, and occasionally managed to reach the destination, albeit with uneven performance.
This is expected.
However, after a few more iterations, strange things occurred. The car failed to start normally and would brake suddenly while driving. It seems like the model forgot the skills it previously learned and its performance worsened.
This is also expected and observed during our experiments.
Could you please explain the reasons behind this issue? Is it related to improper timing for human intervention, an excessive focus on exploration, or some other factor?
Sure! First, I am really happy that you reproduced our experiments (even the strange behavior).
At the beginning of training, we sometimes take full control for 1 or 2 episodes in order to fill the human buffer with more useful data. Then we enter human-AI shared control and intervene only when something goes wrong.
In our experiments we were very conservative and kept the speed near 15~20 km/h (so the throttle we gave during intervention was also low). We never braked during intervention, because we found that otherwise the policy soon converges to emergency stopping and never moves again. And as you already saw, after training for a long period the policy collapses.
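To make the intervention protocol above concrete, here is a minimal sketch in Python. It is not taken from the HACO code base: the function names, the constants, and the `(steering, throttle)` action convention (values in [-1, 1], negative throttle meaning brake) are all assumptions for illustration. It only encodes the three habits described above: full human control for the first episodes, a low throttle capped around the target speed, and no braking during takeover.

```python
# Hypothetical sketch of the takeover habits described above (not HACO's API).
# Assumed action convention: (steering, throttle), both in [-1, 1],
# where a negative throttle would mean braking.

WARMUP_EPISODES = 2        # assumed: "1 or 2 episodes" of full human control
TARGET_SPEED_KMH = 20.0    # assumed: upper end of the 15~20 km/h range
MAX_HUMAN_THROTTLE = 0.3   # hypothetical "low value" for the human throttle


def human_in_control(episode_index, takeover_pressed):
    """Full human control during warm-up, shared control afterwards."""
    return episode_index < WARMUP_EPISODES or takeover_pressed


def conservative_human_action(steering, throttle, speed_kmh):
    """Filter the human action: never brake, keep throttle low, cap speed."""
    steering = max(-1.0, min(1.0, steering))
    # Dropping negative throttle keeps brake samples out of the human buffer.
    throttle = min(max(throttle, 0.0), MAX_HUMAN_THROTTLE)
    if speed_kmh >= TARGET_SPEED_KMH:
        throttle = 0.0  # coast instead of braking once the target is reached
    return steering, throttle


def select_action(episode_index, takeover_pressed, agent_action,
                  human_action, speed_kmh):
    """Return (action_to_execute, is_human_action) for one step."""
    if human_in_control(episode_index, takeover_pressed):
        steering, throttle = human_action
        return conservative_human_action(steering, throttle, speed_kmh), True
    return agent_action, False
```

For example, `select_action(0, False, (0.0, 1.0), (0.5, -1.0), 10.0)` falls in the warm-up phase, so the human action is used and the brake input is replaced by zero throttle. The specific threshold values are placeholders and would need tuning for a real setup.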
We have some hypotheses about this collapse:
Hello, how can the problem of sudden emergency braking after several iterations be solved?
That's a good question! We observed that too!
The answer is:
The screenshot below shows the evaluation results from running "eval_haco.py" with EPISODE_NUM_PER_CKPT = 2.