Closed baedan closed 2 years ago
Nice work!
By the way, any idea where the performance gain comes from?
not sure! i didn't look at the old version super closely the first time, though now that i'm looking at it again, it seems like `Tiling` is meant to store all 50 tilings at once (or one), but here `tt` is defined as an array of 50 `Tiling`s. idk how much of an impact that would have though.
oh, and btw, both versions generate basically the same training curve, but it's different from the one in the book… i'm sure you've noticed this because you set the episode count to 10000 when the 50-tile version finally has an advantage haha
i tried to look for why that is but no dice. could be something a bit awry in `RL.jl`, maybe?
ran a profiler on the old version. maybe dynamic dispatch performance loss with `Flux.onehot`?
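for anyone reading along, here's a minimal sketch of why the one-hot step can be skipped at all. it's plain Python rather than the Julia code in this PR, and every name in it (`onehot`, `value_dense`, `value_sparse`, the example weights) is made up for illustration. it only shows the algebra: a dot product with a one-hot vector just picks out one weight, so summing the weights at the active tile indices gives the same value estimate. it doesn't model Julia's dispatch behaviour, so it says nothing about where the time actually went.

```python
def onehot(index, size):
    """Build a dense one-hot vector (hypothetical helper, not Flux.onehot)."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def value_dense(weights, active, size):
    """Value estimate via explicit one-hot vectors: one dot product per active tile."""
    total = 0.0
    for i in active:
        feat = onehot(i, size)
        total += sum(w * f for w, f in zip(weights, feat))
    return total


def value_sparse(weights, active):
    """Same estimate by indexing the weights directly, with no one-hot vectors built."""
    return sum(weights[i] for i in active)


# Made-up weights and active tile indices (one index per tiling).
weights = [0.5, -1.0, 2.0, 0.25, 1.5, 0.0]
active = [1, 2, 4]

assert value_dense(weights, active, len(weights)) == value_sparse(weights, active)
```

the dense version does work proportional to the number of weights for every active tile, while the indexed version does one lookup per active tile.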
Yeah, I struggled to figure out the reason for the different training curve and gave up after several attempts late at night. 😭
Thanks for the profiling. It's quite surprising to find that `onehot` is the bottleneck here.
in my testing, the new version is about 6 times faster