Inconsistency in action values

brohrer / robot-brain-project

a general purpose learning agent


Inconsistency in action values #44

Open markroxor opened 5 years ago

markroxor commented 5 years ago

world.step is expected to receive an array of binary values, as returned by brain.sense_act_learn, which in turn gets it from self.postprocessor.convert_to_actions. But the documentation of convert_to_actions says that it returns "A set of actions for the world, each between 0 and 1." Returning an action array of floats is inconsistent with the demands of OpenAI's gym. Did I miss anything, @brohrer?

markroxor commented 5 years ago

Blocking #39

brohrer commented 5 years ago

At the moment, self.postprocessor.convert_to_actions() returns only 0 or 1 values, but in the future it may return float actions valued between 0 and 1. Worlds should be able to handle these, even if it is just to round them first. The documentation in becca may not all be consistent with this yet.
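For a world that only understands binary commands, the rounding step could look something like this. It is a minimal sketch, assuming the actions arrive as a plain sequence of floats in [0, 1]. The helper name binarize_actions and the 0.5 threshold are illustrative choices, not part of becca.

```python
def binarize_actions(actions, threshold=0.5):
    """Round float actions in [0, 1] to 0 or 1.

    Hypothetical helper, not part of becca. Values at or above
    the threshold become 1, everything else becomes 0. An explicit
    threshold is used instead of round() so that 0.5 is handled
    the same way every time.
    """
    return [1 if a >= threshold else 0 for a in actions]


# Actions that are already binary pass through unchanged.
print(binarize_actions([0.0, 1.0, 0.3, 0.8]))
```

A world could call this at the top of its step method, so it keeps working whether the actions it receives are binary or float.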

I'm not familiar with what Gym worlds expect. Is it very consistent across worlds? I expect there will need to be some connecting code to get them to talk smoothly. Translating actions into the expected format will probably be part of that.

markroxor commented 5 years ago

Can you please run this gist: https://gist.github.com/markroxor/c50a6bfc69da001180374a9e977ac21a (install gym first with pip install gym)? The actions parameter that is fed to World.step is a float.

A gym environment expects the index of a single action at each step. It can only perform one action at a time. Yes, we would need some connecting code.
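One way the connecting code could turn an action array into a single index is to pick the strongest action. This is only a sketch under that assumption. actions_to_index is a hypothetical name, and taking the maximum is one possible policy, not something becca or gym prescribes.

```python
def actions_to_index(actions):
    """Return the index of the largest value in an action array.

    Hypothetical adapter, not part of becca or gym. Ties go to the
    lowest index, so an all-zero array maps to action 0.
    """
    if len(actions) == 0:
        raise ValueError("actions must not be empty")
    return max(range(len(actions)), key=lambda i: actions[i])


# With a gym environment named env, the result would be passed on as:
#     env.step(actions_to_index(actions))
print(actions_to_index([0, 0, 1, 0]))
```

Note that an all-zero array still selects action 0, so a world where "do nothing" matters may need to reserve an index for it.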

I think we need the API docs to proceed with the integration, since the code docstrings cannot be relied upon.

brohrer commented 5 years ago

Nice work putting this connecting code together. I agree. I ran the gist, saw the same result, and reached the same conclusions. I'm picturing some lines in init that, given a Gym world name, use introspection to figure out the nature of the actions and the observations (Box and Discrete) and convert them to and from sensors and actions for becca.

An n-valued Discrete Gym space would correspond to n sensors or actions. So would an n-dimensional Box.
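A rough sketch of that introspection, under a few assumptions: it relies only on the n attribute of gym's Discrete space and the shape attribute of its Box space, so it is duck-typed and does not import gym. The names count_channels and one_hot are hypothetical, the two Fake classes are stand-ins for the real gym spaces so the example runs on its own, and "n-dimensional Box" is read here as a Box with n elements in total.

```python
from math import prod


def count_channels(space):
    """Number of becca sensors or actions needed for a Gym-style space.

    Hypothetical helper. Discrete is checked first because gym's
    Discrete spaces also carry a shape attribute (an empty tuple).
    """
    if hasattr(space, "n"):
        return int(space.n)  # Discrete: one channel per value
    if hasattr(space, "shape"):
        return prod(space.shape)  # Box: one channel per element
    raise TypeError("unsupported space: %r" % (space,))


def one_hot(index, n):
    """Spread a Discrete value over n binary channels."""
    return [1 if i == index else 0 for i in range(n)]


class FakeDiscrete:
    """Stand-in for gym.spaces.Discrete, for illustration only."""

    def __init__(self, n):
        self.n = n
        self.shape = ()


class FakeBox:
    """Stand-in for gym.spaces.Box, for illustration only."""

    def __init__(self, shape):
        self.shape = shape


print(count_channels(FakeDiscrete(3)))
print(count_channels(FakeBox((4,))))
print(one_hot(2, 3))
```

With a real environment, the same function could be applied to env.action_space and env.observation_space. Box values would still need scaling into the 0 to 1 range using the space's low and high bounds, which is left out here.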