Sycor4x closed this 5 years ago
Thanks for pointing this out! Will fix and commit.
Have you noticed any effect on convergence dynamics? For diverse batch samples, the wrong advantage baseline could effectively cancel out the dueling effect.
You're welcome. I haven't extensively tested and compared the two versions.
https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/a307a952bb914a1b5a43b7c92b7237ca28d88d28/Chapter08/lib/models.py#L56
This is a minor hiccup -- the line currently takes the mean across the entire `batch_size * n_actions` tensor, so `adv.mean()` is just a scalar. Instead, each observation in the mini-batch should have its own mean subtracted off, so it should be `out = val + adv - adv.mean(dim=1, keepdim=True)`, where `adv.mean(dim=1, keepdim=True)` has `batch_size` elements.
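A minimal sketch of the difference, using stand-in tensors for the two streams of the dueling head (the names `val` and `adv` follow the linked code; the shapes here are illustrative):

```python
import torch

# Hypothetical dueling-head outputs: batch of 4 observations, 6 actions.
batch_size, n_actions = 4, 6
val = torch.randn(batch_size, 1)          # state-value stream V(s)
adv = torch.randn(batch_size, n_actions)  # advantage stream A(s, a)

# Buggy version: adv.mean() collapses the whole batch_size x n_actions
# tensor to a single scalar, so every sample shares one global baseline.
buggy = val + adv - adv.mean()

# Fixed version: subtract each observation's own mean advantage; the
# (batch_size, 1) baseline broadcasts across that row's actions only.
fixed = val + adv - adv.mean(dim=1, keepdim=True)

# With the fix, the advantages are centred per observation: each row's
# mean advantage contribution to the Q-values is zero.
row_means = (fixed - val).mean(dim=1)
print(torch.allclose(row_means, torch.zeros(batch_size), atol=1e-6))  # True
```

The per-row centring is what makes the decomposition identifiable: without it, a constant can shift freely between the value and advantage streams within each sample.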