decisionforce / HACO

[ICLR 2022] Official implementation of paper: Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization

Got bad performance when reproducing HACO #6

Open dq0803 opened 1 year ago

dq0803 commented 1 year ago

Hi, I just attempted to reproduce HACO with the keyboard by running `train_haco_keyboard_easy.py`, but encountered unsatisfactory training performance.

At the early stage, I could see the model improving with the help of human interventions. After around 20~40 iterations, the car had learned some driving skills and occasionally managed to reach the destination, albeit with uneven performance. However, after a few more iterations, strange things occurred. The car failed to start normally and would brake suddenly while driving. It seems the model forgot the skills it previously learned, and its performance worsened.

Could you please explain the reasons behind this issue? Is it related to improper timing of human intervention, an excessive focus on exploration, or some other factor?

The screenshot below shows the evaluation results from running `eval_haco.py` with `EPISODE_NUM_PER_CKPT = 2`.

[Screenshot: eval_res — evaluation results]
pengzhenghao commented 1 year ago

> Hi, I just attempted to reproduce HACO with the keyboard by running `train_haco_keyboard_easy.py`, but encountered unsatisfactory training performance.

Thank you for running our code!

> At the early stage, I could see the model improving with the help of human interventions. After around 20~40 iterations, the car had learned some driving skills and occasionally managed to reach the destination, albeit with uneven performance.

This is expected.

> However, after a few more iterations, strange things occurred. The car failed to start normally and would brake suddenly while driving. It seems the model forgot the skills it previously learned, and its performance worsened.

This is also expected and observed during our experiments.

> Could you please explain the reasons behind this issue? Is it related to improper timing of human intervention, an excessive focus on exploration, or some other factor?

Sure! First, I am really happy that you reproduced our experiments (even the strange behavior).

At the beginning of training, we sometimes take full control for 1 or 2 episodes in order to fill the human buffer with more useful data. Then we enter human-AI shared control and intervene when something goes wrong.
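For concreteness, here is a minimal sketch of that warm-up-then-shared-control loop. All names (`env`, `policy`, `human.wants_control`, `human.action`, `buffer`) are hypothetical placeholders, not the actual HACO/MetaDrive API:

```python
def run_shared_control(env, policy, human, buffer, num_episodes, warmup_episodes=2):
    """Hypothetical sketch of the warm-up + shared-control loop described above.

    `human.wants_control()` and `human.action()` are placeholder hooks for the
    keyboard interface; none of these names come from the actual HACO code.
    """
    for episode in range(num_episodes):
        obs, done = env.reset(), False
        while not done:
            agent_action = policy(obs)
            # Warm-up: the human drives alone to seed the buffer with useful data.
            if episode < warmup_episodes or human.wants_control():
                action, intervened = human.action(), True
            else:
                action, intervened = agent_action, False
            next_obs, reward, done, info = env.step(action)
            # Store both the agent's proposal and the executed action so the
            # learner can contrast them whenever an intervention occurred.
            buffer.add(obs, agent_action, action, intervened, next_obs, done)
            obs = next_obs
```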

In our experiments we are very conservative and keep the speed around 15~20 km/h (accordingly, the throttle we apply during intervention is also low). We never brake during intervention, because we found that otherwise the policy quickly converges to emergency stopping and never moves again. And, as you have already seen, after training for a long period the policy collapses.
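If you want to enforce that conservative style mechanically, one option is to clamp the human action before it is executed. This is only an illustration; the action layout (steering, throttle/brake in [-1, 1], with negative values meaning brake) is an assumption about a MetaDrive-style continuous action space, not verified against the repo:

```python
MAX_THROTTLE = 0.3  # illustrative cap, roughly matching 15~20 km/h cruising

def clamp_intervention(action):
    """Cap throttle and strip braking from a (steering, throttle_brake) action.

    Assumes throttle_brake < 0 means braking; that layout is an assumption,
    not taken from the HACO code.
    """
    steering, throttle_brake = action
    throttle_brake = min(max(throttle_brake, 0.0), MAX_THROTTLE)  # no brake, low throttle
    return steering, throttle_brake
```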

We have some hypotheses behind this:

  1. The Q values might become too large as an outcome of the CQL loss.
  2. Acceleration demonstrations occupy only a very small portion of the human buffer, so those samples are rarely drawn and therefore hard to learn from (a possible mitigation is sketched below).
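Regarding hypothesis 2, one conceivable mitigation (not something shipped in the repo) is to oversample the rare acceleration demonstrations when drawing training batches. The buffer layout below, with `human_action[1]` as throttle, is purely hypothetical:

```python
import numpy as np

def sample_batch(buffer, batch_size, accel_boost=5.0):
    """Hypothetical weighted sampling that over-represents acceleration demos.

    Assumes each transition stores the executed human action with throttle at
    index 1; this layout is illustrative, not HACO's actual replay buffer.
    """
    throttle = np.array([t.human_action[1] for t in buffer])
    weights = 1.0 + accel_boost * (throttle > 0.1)  # boost throttle-heavy samples
    probs = weights / weights.sum()
    idx = np.random.choice(len(buffer), size=batch_size, p=probs)
    return [buffer[i] for i in idx]
```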
xiaozhao12345 commented 8 months ago

Hello, how can I solve the problem of sudden emergency braking after several iterations?

pengzhenghao commented 7 months ago

> Hello, how can I solve the problem of sudden emergency braking after several iterations?

That's a good question! We observed that too!

The answer is:

  1. Do not brake at all as the human demonstrator.
  2. Keep a slow and almost constant speed.
  3. Try our new algorithm, PVP: https://github.com/metadriverse/pvp