PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

A3C Bug? #33

Closed: gowtham1997 closed this issue 5 years ago

gowtham1997 commented 5 years ago

Hello,

This is regarding lines 104-107 in 01_a3c_data.py: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/71617994fbefcf7c432cb229f007942d3b0d450c/Chapter11/01_a3c_data.py#L104-L107

On line 104, we calculate the loss between value_v and vals_ref_v after squeezing value_v, because value_v has the shape (batch_size, 1) while vals_ref_v has the shape (batch_size). This is clear to me.

But on line 107, we don't squeeze value_v before subtracting it from vals_ref_v, so the resulting adv_v has the shape (batch_size, batch_size) rather than (batch_size). This also changes the shape of log_prob_actions_v on line 108.

The same adv_v calculation is used in a2c.py (Chapter 10) as well.

Is this a bug? I haven't compared the results with and without squeezing value_v on line 107, but I am confused after inspecting the shapes.
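To make the shape question concrete, here is a minimal sketch of the two subtractions. It uses NumPy rather than PyTorch (an assumption made only to keep the snippet lightweight; both follow the same broadcasting rules for this case), and the batch size and values are made up for illustration. The array names are stand-ins for the tensors mentioned above.

```python
import numpy as np

BATCH_SIZE = 4  # hypothetical small batch, chosen only for illustration

# Stand-ins for the tensors discussed above (values are made up).
value_v = np.arange(BATCH_SIZE, dtype=np.float64).reshape(BATCH_SIZE, 1)  # shape (4, 1)
vals_ref_v = np.ones(BATCH_SIZE, dtype=np.float64)                        # shape (4,)

# Line 104 style: squeeze first, so both operands have shape (4,).
diff_squeezed = vals_ref_v - value_v.squeeze(-1)

# Line 107 style: no squeeze, so (4,) and (4, 1) broadcast to (4, 4).
adv_v = vals_ref_v - value_v

print(diff_squeezed.shape)  # (4,)
print(adv_v.shape)          # (4, 4)
```

In the broadcast result, element [i, j] is vals_ref_v[j] - value_v[i, 0], so every reference value is paired with every predicted value.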

liuyuezhang commented 5 years ago

Hi, I read the code a few days ago and I don't think it's a bug.

It's just broadcasting. Though it produces extra copies of the values, they are averaged by the mean() call on the line below.

Adding a squeeze() would be more intuitive, though.
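For anyone who wants to check the effect numerically, below is a rough sketch that compares the averaged term with and without the squeeze. It uses NumPy in place of PyTorch (an assumption, to keep it lightweight), and all values, including log_prob_a as a stand-in for the log-probabilities of the chosen actions, are made up. In this toy case the two averages do not come out identical: the broadcast version appears to be equivalent to subtracting the batch-mean value from every reference value, while the squeezed version subtracts each sample's own value.

```python
import numpy as np

BATCH_SIZE = 4  # hypothetical batch size

# Made-up stand-ins for the tensors in the script.
value_v = np.array([[0.0], [1.0], [2.0], [3.0]])   # shape (4, 1)
vals_ref_v = np.array([1.0, 1.0, 1.0, 1.0])        # shape (4,)
log_prob_a = np.array([-0.1, -0.2, -0.3, -0.4])    # log-probs of the chosen actions

# Without squeeze: advantage broadcasts to (4, 4) before the mean.
adv_broadcast = vals_ref_v - value_v
mean_broadcast = (adv_broadcast * log_prob_a).mean()

# With squeeze: one advantage per sample, shape (4,).
adv_squeezed = vals_ref_v - value_v.squeeze(-1)
mean_squeezed = (adv_squeezed * log_prob_a).mean()

# The broadcast mean matches using the batch-mean value as a shared baseline.
mean_baseline = ((vals_ref_v - value_v.mean()) * log_prob_a).mean()

print(mean_broadcast)  # approximately 0.125
print(mean_squeezed)   # approximately 0.25
print(mean_baseline)   # approximately 0.125
```

So the squeeze is more than cosmetic if a per-sample advantage is what is intended, although the broadcast form still subtracts a baseline from the reference values.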

gowtham1997 commented 5 years ago

Yes, makes sense.

Thanks for clarifying.