JuliaReinforcementLearning / ReinforcementLearningAnIntroduction.jl

Julia code for the book Reinforcement Learning An Introduction
https://juliareinforcementlearning.org/ReinforcementLearningAnIntroduction.jl/
MIT License

improved performance of figure 9.10 #88

Closed baedan closed 2 years ago

baedan commented 2 years ago

in my testing, the new version runs 6 times faster

findmyway commented 2 years ago

Nice work!

By the way, any idea where the performance gain comes from?

baedan commented 2 years ago

not sure! i didn't look at the old version super closely the first time, though now that i'm looking at it again, it seems like a single Tiling is meant to store all 50 tilings at once (or just one), but here

https://github.com/JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl/blob/e83f54055d621dbc44b205d2016c6868abf4b4a1/notebooks/Chapter09_Random_Walk.jl#L352

tt is defined as an array of 50 Tilings. idk how much of an impact that would have though.

baedan commented 2 years ago

oh, and btw, both versions generate basically the same training curve, but it's different from the one in the book… i'm sure you've noticed this because you set the episode count to 10000 when the 50-tile version finally has an advantage haha

i tried to look for why that is but no dice. could be something a bit awry in RL.jl, maybe?

baedan commented 2 years ago

[image: profiler screenshot]

ran a profiler on the old version. maybe the slowdown comes from dynamic dispatch in Flux.onehot?

findmyway commented 2 years ago

> oh, and btw, both versions generate basically the same training curve, but it's different from the one in the book… i'm sure you've noticed this because you set the episode count to 10000 when the 50-tile version finally has an advantage haha
>
> i tried to look for why that is but no dice. could be something a bit awry in RL.jl, maybe?

Yeah, I struggled to figure out the potential reason and then gave up after several attempts late at night. 😭

Thanks for the profiling. It's quite surprising to find that onehot is the bottleneck here.
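
For readers following the onehot discussion, here is a minimal sketch of why a one-hot route can cost more than plain indexing in tile coding. It does not reproduce the notebook or call Flux.onehot, and it does not confirm where the 6x gain actually comes from. The names `weights`, `active`, `value_dense`, and `value_sparse` are hypothetical and chosen only for illustration.

```julia
# Hypothetical stand-ins, not taken from the notebook.
n_features = 1_000
weights = collect(1.0:n_features)   # stand-in weight vector
active = [3, 250, 999]              # assumed indices of the active tiles

# Dense route: allocate a one-hot style feature vector, then take a dot product.
function value_dense(w, idxs)
    x = zeros(length(w))            # one allocation per call
    x[idxs] .= 1.0                  # mark the active tiles
    return sum(w .* x)              # touches every weight
end

# Sparse route: sum only the weights at the active indices.
value_sparse(w, idxs) = sum(@view w[idxs])

# Both routes compute the same value estimate.
@assert value_dense(weights, active) == value_sparse(weights, active)
```

With tile coding only a few features are active per state, so the sparse route does work proportional to the number of tilings rather than to the length of the weight vector.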