google-deepmind / meltingpot

A suite of test scenarios for multi-agent reinforcement learning.
Apache License 2.0

Changing the reward spec of an existing environment? (Clean Up) #165

Closed. AsadJeewa closed this issue 1 year ago.

AsadJeewa commented 1 year ago

I am trying to modify Clean Up to emit a vectorised reward. I have been able to do so by changing meltingpot/lua/levels/clean_up/components.lua and meltingpot/python/configs/substrates/clean_up.py, making the reward type tableType. The issue is that I now also need to change the reward spec, since the environment crashes with:

Check failed: double_tensor != nullptr [observation] - Must return a contiguous DoubleTensor or number while reading: '1.REWARD'

However, changing the shape in meltingpot/python/utils/substrate/specs.py has no effect. Please advise.
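
For reference, here is the mismatch as I understand it, in dm_env terms (a sketch, not the literal contents of specs.py; the 2-component shape is just an example):

```python
import numpy as np
from dm_env import specs

# What the engine enforces when reading '1.REWARD': a number or a
# contiguous DoubleTensor, i.e. a scalar double. A Lua table (tableType)
# is neither, hence the crash above.
SCALAR_REWARD_SPEC = specs.Array(shape=(), dtype=np.float64, name="REWARD")

# The shape I want instead (2 components is an arbitrary example):
VECTOR_REWARD_SPEC = specs.Array(shape=(2,), dtype=np.float64, name="REWARD")
```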

duenez commented 1 year ago

This is tricky: we generally don't support non-scalar rewards, and the assumption that rewards are scalar is baked into the Avatar component. However, what you want to achieve is easy in principle: you can expose any custom observation, for instance a vector (or even a matrix or tensor). Take a look at how we handle inventories in the *_in_the_matrix substrates, for instance. Then just use that observation as your vectorised reward in your algorithm.
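
On the Python config side, that could look something like the sketch below. To be clear, this is illustrative only: VECTOR_REWARD and NUM_REWARD_COMPONENTS are placeholder names, the observation list is abbreviated, and I'm using raw dm_env specs rather than whatever helpers clean_up.py currently uses.

```python
import numpy as np
from dm_env import specs as dm_env_specs

NUM_REWARD_COMPONENTS = 2  # placeholder size for the reward vector

# 1. Add the custom observation to what each player receives.
individual_observation_names = [
    "RGB",
    "READY_TO_SHOOT",
    "VECTOR_REWARD",  # placeholder: produced by your custom Lua component
]

# 2. Spec it so the engine knows the shape and dtype to expect. Your Lua
#    component must return a contiguous DoubleTensor of exactly this shape.
VECTOR_REWARD_SPEC = dm_env_specs.Array(
    shape=(NUM_REWARD_COMPONENTS,),
    dtype=np.float64,
    name="VECTOR_REWARD",
)
```

The scalar REWARD spec stays untouched; your algorithm reads the vector out of the observations instead.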

Hope that helps.

jagapiou commented 1 year ago

Also note that any such custom observation won't be available at test time (i.e. in the scenarios), so it's best not to condition your policy on it (though your critic can do whatever it likes).
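
Concretely, the split could look like this (a sketch only: I'm assuming the substrate's per-player observation dicts and the hypothetical VECTOR_REWARD observation from above, and agents[i].act / learners[i].update are stand-ins for your own training code):

```python
def training_step(env, agents, learners, actions):
    """One step: policies see only observations that exist at test time."""
    timestep = env.step(actions)
    next_actions = []
    for i, obs in enumerate(timestep.observation):  # one dict per player
        policy_obs = obs["RGB"]                # also present in scenarios
        vector_reward = obs["VECTOR_REWARD"]   # train-time only (custom obs)
        next_actions.append(agents[i].act(policy_obs))  # policy ignores it
        learners[i].update(policy_obs, vector_reward)   # critic may use it
    return next_actions
```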

AsadJeewa commented 1 year ago

Thank you! Making the vectorised reward a custom observation worked :D