Chapter 14, PDF equation has action and mu reversed in 02_train_a2c.py?

nyck33 commented 5 years ago

02_train_a2c.py has this snippet:

def calc_logprob(mu_v, var_v, actions_v):
    p1 = - ((mu_v - actions_v) ** 2) / (2*var_v.clamp(min=1e-3))
    p2 = - torch.log(torch.sqrt(2 * math.pi * var_v))
    return p1 + p2

but the textbook shows the equation for p1 to be:

$$p_1 = -\frac{(x - \mu)^2}{2\sigma^2}$$

where it is (x - μ), not (μ - x), assuming x is the action and μ is mu, the mean.

Is this an error in the implementation?

Shmuma commented 5 years ago

Yep, they are reversed, but this makes no difference to the result, since the difference is squared: (mu - x)^2 = (x - mu)^2.
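
For anyone who wants to check this numerically, here is a minimal sketch in plain Python. It is not the repository code: `gaussian_logprob` and its `reversed_diff` flag are hypothetical names made up for illustration, it works on floats rather than torch tensors, and it leaves out the `clamp` on the variance that the original snippet applies in p1.

```python
import math


def gaussian_logprob(mu, var, action, reversed_diff=False):
    """Log-density of N(mu, var) at `action`, split into p1 and p2 as in the snippet.

    Hypothetical helper for illustration only; assumes var > 0.
    """
    # Book order is (x - mu); the code in the issue uses (mu - x).
    diff = (mu - action) if reversed_diff else (action - mu)
    p1 = -(diff ** 2) / (2 * var)
    p2 = -math.log(math.sqrt(2 * math.pi * var))
    return p1 + p2


# A few arbitrary (mu, var, action) triples, chosen only for the check.
samples = [(0.0, 1.0, 0.5), (1.5, 0.25, -2.0), (-3.0, 4.0, 3.0)]
for mu, var, action in samples:
    book_order = gaussian_logprob(mu, var, action)
    code_order = gaussian_logprob(mu, var, action, reversed_diff=True)
    # Squaring removes the sign of the difference, so both orders agree.
    assert math.isclose(book_order, code_order)
```

Both orderings give the same log-probability, so the gradient of the policy loss should be unaffected as well; the only cost of the reversed order is that it does not match the formula in the text.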

nyck33 commented 5 years ago

Max,

I agree that it would not make a difference to RL professionals who are already familiar with the formula, but for students it can be confusing; i.e., I read too much into it and thought there must be a reason that it is reversed.

Regards,

Nobu

Shmuma commented 5 years ago

Yep, I agree. Sorry for the confusion.

-- wbr, Max Lapan

nyck33 commented 5 years ago

Thanks for listening!

PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Chapter 14, PDF equation has action and mu reversed in 02_train_a2c.py? #53