Parametric Episode iterator

Make the Episode iterator parametric so that it is type stable, which improves performance.
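For context, here is a minimal sketch of why type parameters help. The types below (`LooseEpisode`, `ParametricEpisode`, `ToyEnv`, `ToyPolicy`) and the `total_reward` helper are hypothetical names made up for illustration; they are not the definitions in Reinforce.jl. The idea is only that fields without concrete types are stored as `Any`, while parametric fields carry their concrete types in the struct type, so the compiler can specialize code that touches them.

```julia
# Hypothetical types for illustration only; not the Reinforce.jl definitions.

# Untyped fields are stored as `Any`, so field access cannot be inferred.
struct LooseEpisode
    env
    policy
end

# Parametric fields: the concrete field types are part of the struct type.
struct ParametricEpisode{E,P}
    env::E
    policy::P
end

# Toy stand-ins for an environment and a policy (assumptions).
struct ToyEnv
    reward::Float64
end

struct ToyPolicy end

# Sum a fixed per-step reward over `nsteps` steps.
function total_reward(ep, nsteps)
    total = 0.0
    for _ in 1:nsteps
        total += ep.env.reward
    end
    return total
end

loose = LooseEpisode(ToyEnv(1.0), ToyPolicy())
tight = ParametricEpisode(ToyEnv(1.0), ToyPolicy())

# Both give the same result; only the inferred field types differ.
total_reward(loose, 3)
total_reward(tight, 3)

fieldtype(typeof(loose), :env)  # Any
fieldtype(typeof(tight), :env)  # ToyEnv
```

In the REPL, comparing `@code_warntype total_reward(loose, 3)` with `@code_warntype total_reward(tight, 3)` should show the `Any`-typed field access in the first case only.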

Benchmark script

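using BenchmarkTools  # provides @benchmark
using Random          # provides Random.seed!
using Reinforce

env = Reinforce.CartPole()
π′ = RandomPolicy()
Random.seed!(42)      # fix the RNG seed so runs are comparable
@benchmark run_episode(x -> x, env, π′)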

Before:

julia> @benchmark run_episode(x -> x, env, π′)
BenchmarkTools.Trial:
  memory estimate:  2.39 KiB
  allocs estimate:  92
  --------------
  minimum time:     5.581 μs (0.00% GC)
  median time:      12.884 μs (0.00% GC)
  mean time:        19.578 μs (22.57% GC)
  maximum time:     36.822 ms (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     1

After:

julia> @benchmark run_episode(x -> x, env, π′)
BenchmarkTools.Trial:
  memory estimate:  1.37 KiB
  allocs estimate:  29
  --------------
  minimum time:     981.900 ns (0.00% GC)
  median time:      1.648 μs (0.00% GC)
  mean time:        2.170 μs (22.77% GC)
  maximum time:     3.367 ms (99.93% GC)
  --------------
  samples:          10000
  evals/sample:     10

JuliaML / Reinforce.jl

Parametric Episode iterator #31