Svalorzen / AI-Toolbox

A C++ framework for MDPs and POMDPs with Python bindings
GNU General Public License v3.0

Migrate from R(s,a,s) to R(s,a) #26

Closed — Svalorzen closed this issue 6 years ago

Svalorzen commented 6 years ago

While it has been all fun and games storing the full R(s,a,s) for all models, I think the time has come to remove this "feature" and just store R(s,a), which is equivalent for all practical purposes since the solvers only ever need the expected reward.

This simultaneously reduces the computational cost of pretty much all methods, since they often have to compute R(s,a) anyway, and cuts memory consumption by a factor of |S|, which is not negligible.

We should still offer an API to create models from a 3D reward function, but internally it would be much better to stop storing the whole thing, since the 3D rewards can be collapsed into the 2D expected rewards at construction time (see the sketch below).
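A minimal sketch (not the library's actual code, and using plain nested vectors rather than its Eigen types) of why a 2D table is sufficient: the expected reward R(s,a) is just the transition-weighted average of the 3D rewards, and that is the only quantity value-based methods need.

```cpp
#include <vector>

// Hypothetical flat tables: Table3D indexed as [s][a][s'], Table2D as [s][a].
using Table3D = std::vector<std::vector<std::vector<double>>>;
using Table2D = std::vector<std::vector<double>>;

// Collapse R(s,a,s') into R(s,a) = sum over s' of T(s,a,s') * R(s,a,s').
Table2D computeExpectedRewards(const Table3D & T, const Table3D & R) {
    const size_t S = R.size(), A = R[0].size();
    Table2D out(S, std::vector<double>(A, 0.0));
    for (size_t s = 0; s < S; ++s)
        for (size_t a = 0; a < A; ++a)
            for (size_t s1 = 0; s1 < S; ++s1)
                out[s][a] += T[s][a][s1] * R[s][a][s1];
    return out;
}
```

A construction API taking a 3D reward function could run this once and then discard the 3D input, which is where the |S| memory saving comes from.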

This will change a chunk of the API for models, since they won't offer (s,a,s') access to the reward function anymore, but I think in the long run it's going to be better, considering the performance improvements.

I'm currently finishing a branch, and this will be the next thing I do after that.

Svalorzen commented 6 years ago

I thought more about this and I think it's actually more sensible to keep the getExpectedReward(s,a,s1) signature. It's only the Eigen part that will change, along with the fact that our Models will only keep a 2D table for rewards (so getExpectedReward will return the same value for every s1).

This makes it easier to define models and shouldn't break old user code, but should still speed up the Eigen methods in case somebody wants to use them.
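A sketch of the idea, assuming a hypothetical Model class rather than the library's exact one: the three-argument accessor keeps its signature while only a 2D table is stored, so the s1 argument is simply ignored.

```cpp
#include <cstddef>
#include <vector>

class Model {
    public:
        // Same signature as before; the returned value no longer depends on s1.
        double getExpectedReward(size_t s, size_t a, size_t /*s1*/) const {
            return rewards_[s][a];
        }

    private:
        std::vector<std::vector<double>> rewards_; // Only R(s,a) is stored.
};
```

Existing calls such as `model.getExpectedReward(s, a, s1)` keep compiling unchanged; they just receive the expected reward for (s,a) regardless of s1.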

Svalorzen commented 6 years ago

I'm currently implementing this in the GapMin branch. It seems it will also speed up the library somewhat, which is always nice.

Svalorzen commented 6 years ago

This has been merged in master.