JuliaReinforcementLearning / ReinforcementLearning.jl

A reinforcement learning package for Julia
https://juliareinforcementlearning.org

No method matching iterate ArrayProductDomain #1074

Closed ZdM87 closed 6 months ago

ZdM87 commented 6 months ago

This happens when trying to train an agent for the StockTradingEnv environment:

env = StockTradingEnv()

ns, na = size(state_space(env))[1], size(action_space(env))[1]

policy = Agent(
    QBasedPolicy(;
        learner = TDLearner(
            TabularQApproximator(n_state = ns, n_action = na),
            :SARS;
        ),
        explorer = EpsilonGreedyExplorer(ϵ_stable=0.01),
    ),
    Trajectory(
        CircularArraySARTSTraces(;
            capacity = 10,
            state = Float64 => (ns,),
            action = Float64 => (na,),
            reward = Float64 => (),
            terminal = Bool => (),
        ),
        DummySampler(),
        InsertSampleRatioController(),
    ),
)

run(
    policy,
    env,
    StopAfterNSteps(10_000),
)

jeremiahpslewis commented 6 months ago

Hey! The error message is confusing, and I've opened PR #1075 so that the failure is raised in the 'correct' spot. The underlying problem is straightforward, though: you are using a tabular algorithm on a continuous problem. Your action space is not discrete, but TabularQApproximator requires a discrete action space to learn on.
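
To make the tabular-versus-continuous point concrete, here is a minimal sketch in plain Julia. It does not use the ReinforcementLearning.jl API, and the helpers `discretize` and `q_update!` are hypothetical names introduced only for illustration. The assumption it illustrates is that a Q-table is indexed by integers, so a continuous action has to be mapped to a bin index before a tabular update can be applied.

```julia
# Hypothetical illustration; none of these names come from ReinforcementLearning.jl.

# Map a continuous value in [lo, hi] to a bin index in 1:nbins.
function discretize(x::Real, lo::Real, hi::Real, nbins::Int)
    clamped = clamp(x, lo, hi)
    idx = floor(Int, (clamped - lo) / (hi - lo) * nbins) + 1
    return min(idx, nbins)  # x == hi would otherwise land in bin nbins + 1
end

n_states, n_actions = 4, 5
Q = zeros(n_actions, n_states)  # one row per discrete action, one column per state

# One tabular Q-learning update; s, a and s_next must all be integer indices.
function q_update!(Q, s, a, r, s_next; α = 0.1, γ = 0.9)
    target = r + γ * maximum(view(Q, :, s_next))
    Q[a, s] += α * (target - Q[a, s])
    return Q
end

a = discretize(0.37, -1.0, 1.0, n_actions)  # continuous action 0.37 falls in bin 4
q_update!(Q, 1, a, 1.0, 2)                  # Q[4, 1] moves from 0.0 towards the target
```

A raw continuous action such as `0.37` cannot index `Q` directly, which is the kind of mismatch described above. Binning is only one possible workaround and loses resolution; whether it suits StockTradingEnv is not something this thread establishes.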

ZdM87 commented 6 months ago

Thank you. So there is no support for continuous problems yet? I ask because the same error occurred with:

QBasedPolicy(;
    learner = FluxApproximator(
        Chain(
            Dense(ns, 64, relu),
            Dense(64, na, relu),
        ),
        Flux.Optimise.Optimiser(ClipNorm(0.5), ADAM(1e-5)),
    ),
    explorer = EpsilonGreedyExplorer(ϵ_stable=0.01),
),

jeremiahpslewis commented 6 months ago

@ZdM87 Flux now runs in the PR. Do you have any ideas for sensible tests we could add here: https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/844bf16e30cb5c47477db5a2f106ea3ec87e3f18/src/ReinforcementLearningEnvironments/test/environments/examples/stock_trading_env.jl to check that learning is working?

ZdM87 commented 6 months ago

I don't understand what you mean. What does "Flux now runs in the PR" mean?

jeremiahpslewis commented 6 months ago

Your example now works with the latest version; it has even been added as a unit test.