LucasAlegre / morl-baselines

Multi-Objective Reinforcement Learning algorithms implementations.
https://lucasalegre.github.io/morl-baselines
MIT License
271 stars 44 forks source link

Pareto Q-Learning fails with any environment that doesn't have 2D state/obs space #69

Closed LeonEricsson closed 10 months ago

LeonEricsson commented 11 months ago

As the title suggests running PQL with an environment that doesn't have a 2D state/obs space will run into an error during training (pql.py):

state = int(np.ravel_multi_index(state, self.env_shape))

ravel_multi_index expects the first parameter to be of the same sequence length as the second otherwise it will throw a ValueError. Since env_shape is hard-coded to be 2D

self.env_shape = ( int(high_bound[0] - low_bound[0] + 1), int(high_bound[1] - low_bound[1] + 1), )

this assumes the state/obs is always of length 2. I assume something is off here, or that I've got something completely wrong but I'm consistently getting errors in environments that don't have a 2D obs space.

ffelten commented 11 months ago

Hi @LeonEricsson ,

You are right, our implementation PQL only works on deterministic environments with 2d observation, typically grid worlds.

I believe it can be extended to more dimensions with a bit of work though.

LeonEricsson commented 11 months ago

Hi @LeonEricsson ,

You are right, our implementation PQL only works on deterministic environments with 2d observation, typically grid worlds.

I believe it can be extended to more dimensions with a bit of work though.

I am able to run by just expanding the self.env_shape as

self.env_shape = tuple( int(high_bound[i] - low_bound[i] + 1) for i in range(len(low_bound)) )

but I am too unfamiliar with the algorithm/code to determine if this has further implications.

ffelten commented 11 months ago

@wilrop can you have a look at this? I think just changing the shape might indeed be enough?

wilrop commented 11 months ago

I will take a look at this next week after the ICLR deadline!

ffelten commented 11 months ago

@LeonEricsson this should be fixed now (I merged Willem's PR on main). Can you try it out? :-)

ffelten commented 10 months ago

Closing since inactive