Closed nyck33 closed 5 years ago
Yep, they are reversed, but this doesn't make much difference, as the result is squared
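A quick numeric check of that point, as a minimal sketch: it assumes the term in question is the squared-difference part of a Gaussian log-density. `gaussian_log_prob` and its `reversed_order` flag are hypothetical names made up for illustration, not the book's code.

```python
import math


def gaussian_log_prob(mu, var, action, reversed_order=False):
    """Log-density of N(mu, var) at `action` (illustrative helper, not the book's code)."""
    # The operand order only matters inside this difference, which is squared next.
    diff = (mu - action) if reversed_order else (action - mu)
    p1 = -(diff ** 2) / (2 * var)                  # squared-difference term
    p2 = -math.log(math.sqrt(2 * math.pi * var))   # normalisation term
    return p1 + p2


textbook = gaussian_log_prob(mu=0.5, var=0.25, action=-1.2)
swapped = gaussian_log_prob(mu=0.5, var=0.25, action=-1.2, reversed_order=True)
print(textbook == swapped)
```

Both orderings give the same value because squaring discards the sign of the difference, so the choice between them is about readability and matching the textbook notation, not about the result.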
Max,
I agree that it would not make a difference to RL professionals who are already familiar with the formula, but for students it can be confusing, i.e. I read too much into it and thought, well, there must be a reason that it is reversed.
Regards,
Nobu
Yep, I agree. Sorry for the confusion.
Thanks for listening!
The code here: train a2c chapter 14
Has this snippet:
but the textbook shows the equation for p1 to be:
where it is (x - u), not (u - x), assuming x is the action and u is mu, the mean.
Is this an error in the implementation?
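For reference, the standard Gaussian log-density is written out below. This assumes p1 denotes its squared-difference term, with x the action, μ the mean, and σ² the variance; the snippet and the textbook equation themselves are not reproduced in this thread.

$$
\log \mathcal{N}(x \mid \mu, \sigma^2) = \underbrace{-\frac{(x-\mu)^2}{2\sigma^2}}_{p_1} \; - \; \log\sqrt{2\pi\sigma^2},
\qquad (x-\mu)^2 = (\mu-x)^2 .
$$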