JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/

Support Hooks? #505

Closed NeroBlackstone closed 4 weeks ago

NeroBlackstone commented 1 year ago

Is there any counterpart of Hooks in POMDPs.jl? As far as I know, the only way to get information out of a POMDPs.jl solver is to copy and edit the solver source code (that's my current solution). It's hard to plot anything if we don't have hooks.

I don't know whether the maintainers of POMDPs.jl also think this is a useful feature.

I also wonder how it could be implemented.

zsunberg commented 1 year ago

So far, we have left the question of accessing additional information up to solver writers. The solve_info and action_info functions in POMDPTools sometimes output additional information.
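
For reference, here is a sketch of how those are used, with placeholder solver and mdp objects; the contents of the info object are solver-specific and may be nothing:

using POMDPs, POMDPTools

policy, sinfo = solve_info(solver, mdp)  # extra info gathered while solving
s = rand(initialstate(mdp))              # sample a state to act from
a, ainfo = action_info(policy, s)        # extra info from a single action computation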

Can you describe what information you are trying to get out of which solver? Perhaps we can think about generalizing from that point.

NeroBlackstone commented 1 year ago

Thanks for your reply. For example, the solvers in TabularTDLearning.jl evaluate the trained policy every eval_every episodes. We want to get the average reward of the trained policy's trajectories while the algorithm is running.
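
To be concrete, this is roughly how it is set up today (a sketch based on my reading of the TabularTDLearning.jl README; keyword names may differ between versions). The evaluation results are only printed when verbose = true, never returned to the caller:

using POMDPModels, POMDPTools, TabularTDLearning

mdp = SimpleGridWorld()
solver = QLearningSolver(exploration_policy = EpsGreedyPolicy(mdp, 0.1),
                         n_episodes = 5000,
                         eval_every = 50,    # evaluate the trained policy every 50 episodes
                         n_eval_traj = 100,  # average the reward over 100 evaluation trajectories
                         verbose = true)     # currently the only way to see those averages
policy = solve(solver, mdp)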

NeroBlackstone commented 1 year ago

The render function has a similar concept, but it's for the problem, not the solver.

zsunberg commented 1 year ago

I think it's best to try adding some hooks to that particular package and then generalize from there if we can find a way.

In general, one challenge is that we have fairly different types of solvers in the POMDPs.jl ecosystem:

  1. Offline optimization solvers like SARSOP
  2. Online tree search solvers like POMCP and DESPOT
  3. Reinforcement learning solvers like tabular TD learning

The hooks for these different types might be very different.
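
Purely as an illustration (not a proposal), the natural callback arguments would differ by family:

# hypothetical signatures, one per solver family:
# offline optimization:    f(iteration, residual, alpha_vectors)  # e.g. SARSOP
# online tree search:      f(tree)                                # e.g. POMCP, DESPOT
# reinforcement learning:  f(episode, average_reward)             # e.g. tabular TD learning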

NeroBlackstone commented 11 months ago

Yes, I agree with @zsunberg: since different types of solvers exist, maybe we will never have a unified solution.

But when I was going to bed last night, I had some inspiration.

We could pass a callback function to the solve function, something like this:

function solve(f::Function, solver::QLearningSolver, mdp::MDP)
    # ... solver code ...

    f(episode, average_reward)

    # ... solver code ...
end

solve(qsolver, mdp) do episode, average_reward
    # collect data!
end
# plot!

Unfortunately, it's a breaking change. Maybe we could define the callback as an optional argument.
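
Here is a sketch of the non-breaking keyword version, reusing the hypothetical names above; a no-op default means existing calls keep working:

function solve(solver::QLearningSolver, mdp::MDP;
               callback = (episode, average_reward) -> nothing)
    # ... training loop ...
    callback(episode, average_reward)
    # ...
end

learning_curve = Tuple{Int,Float64}[]
solve(qsolver, mdp; callback = (ep, r) -> push!(learning_curve, (ep, r)))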

But I still think we could at least propose a "hook convention".

NeroBlackstone commented 11 months ago

Maybe we could directly return the data from solve_info(), but unlike with a callback function, we could not get the data while the solver is running.
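
For example (a hypothetical sketch, not an existing method):

function solve_info(solver::QLearningSolver, mdp::MDP)
    average_rewards = Float64[]
    # ... training loop builds `policy` and pushes the evaluation reward
    #     into `average_rewards` every eval_every episodes ...
    return policy, (average_rewards = average_rewards,)
end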

The callback function is useful for long-running algorithms, since we can update plots to visually check the algorithm's status while it runs.

I still don't know what the best practice is. Since no solver implements this yet, maybe we could implement one to show the right way.

Feel free to close this issue. :)

zsunberg commented 11 months ago

But when I was going to bed last night, I had some inspiration.

@NeroBlackstone , thanks for using your bedtime thoughts to try to improve this package! :)

In general, I like this proposal, but there is one hard question related to the diversity of solvers: What arguments should be passed to the callback?

I also think that a better first step would be to add callbacks to individual solvers as solver options. For instance, it could be used like this:

solver = NativeSARSOP.SARSOPSolver() do tree, alphas
    # print statistics from the tree or something
end
solve(solver, m)

One more note:

Unfortunately, it's a breaking change.

I don't think this is actually a breaking change, because we could define solve(f, solver, m) = solve(solver, m) as a fallback.
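
Something like this sketch, using the existing Solver, MDP, and POMDP abstract types:

# fallback: solvers that don't support callbacks silently ignore `f`
solve(f::Function, solver::Solver, m::Union{MDP,POMDP}) = solve(solver, m)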

zsunberg commented 4 weeks ago

Closing for now since this seems to be a solver-specific issue.