AIcrowd / real_robots

Gym environments for Robots that learn to interact with the environment autonomously
https://www.aicrowd.com/challenges/neurips-2019-robot-open-ended-autonomous-learning
MIT License

Write a base class for the Policy function #6

Closed: spMohanty closed this issue 5 years ago

spMohanty commented 5 years ago

We should write a base class for the Policy function that all participants have to inherit from. That way, we can also have a consistent reference for participants on what their policy functions should provide. The class should be well documented (so that the docs show up half decent in the Sphinx docs).
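For concreteness, a minimal sketch of what such a documented base class could look like (the names and docstring style here are illustrative, not the final API):

```python
class BasePolicy:
    """Base class that all participant controllers inherit from.

    Subclasses must implement :meth:`step`. NumPy-style docstrings
    keep the generated Sphinx documentation readable.
    """

    def step(self, observation, reward, done):
        """Return the next action to execute.

        Parameters
        ----------
        observation : object
            The latest observation from the environment.
        reward : float
            The reward obtained by the previous action.
        done : bool
            Whether the current episode has ended.

        Returns
        -------
        numpy.ndarray
            The action to execute in the environment.
        """
        raise NotImplementedError
```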

emilio-cartoni commented 5 years ago

The RandomPolicy class seems fine; I think we just need participants to provide a step function.

Maybe we could also enforce a "pretrained" flag in the init, so that the policy can be constructed either with pretrained data (Round 1) or without (Round 2). Just to keep consistency between the two rounds.
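Something along these lines, perhaps (assuming the BasePolicy sketched above; the `pretrained` keyword and the weights-loading helper are illustrative, not an agreed interface):

```python
class MyPolicy(BasePolicy):
    def __init__(self, action_space, pretrained=False):
        self.action_space = action_space
        if pretrained:
            # Round 1: start from previously learned data.
            self.load("weights.npz")  # hypothetical helper and filename
        # Round 2: pretrained=False, learn during the intrinsic phase.

    def load(self, path):
        # Hypothetical loader for pretrained data.
        pass
```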

spMohanty commented 5 years ago

@emilio-cartoni : Yes indeed. That's why I wanted to have a single source of truth for the structure of that class and the interfaces it should expose. In the examples in the starter kit, we were actually inconsistent about how the step function was being called; in some examples it was even called act. We can avoid that by having and documenting a base class for the Policy function.

spMohanty commented 5 years ago

As per the discussion on the call, we will also include interfaces for the controllers to get a signal when the intrinsic phase begins and ends, and when the individual extrinsic-phase trials begin and end.
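Such hooks might look like no-op methods on the base class that controllers can override (the method names below are a guess at the eventual interface, kept consistent with the sketch above):

```python
class BasePolicy:
    def start_intrinsic_phase(self):
        """Called once, just before the intrinsic phase begins."""
        pass

    def end_intrinsic_phase(self):
        """Called once, right after the intrinsic phase ends."""
        pass

    def start_extrinsic_trial(self):
        """Called before each individual extrinsic-phase trial."""
        pass

    def end_extrinsic_trial(self):
        """Called after each individual extrinsic-phase trial."""
        pass

    def step(self, observation, reward, done):
        raise NotImplementedError
```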

emilio-cartoni commented 5 years ago

ef59d8828cc7c1b3c0145e8af60ef1d7ec8e7b2f: Added a policy.py file with a Policy class that participants should inherit from. All instances of RandomPolicy now inherit from it. I have also changed evaluate so that it calls those methods, and added a test to test.py so that evaluate is also exercised during testing (but with no assertions yet).
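In terms of control flow, the expectation is that evaluate drives the policy roughly like this (a sketch under the hook names assumed above, not the actual implementation in policy.py/evaluate):

```python
def run_evaluation(PolicyClass, env, intrinsic_steps, trials, trial_steps):
    # Sketch of the flow evaluate() is expected to follow.
    policy = PolicyClass(env.action_space)

    # Intrinsic phase: one long, goal-free interaction period.
    policy.start_intrinsic_phase()
    observation, reward, done = env.reset(), 0, False
    for _ in range(intrinsic_steps):
        action = policy.step(observation, reward, done)
        observation, reward, done, _ = env.step(action)
    policy.end_intrinsic_phase()

    # Extrinsic phase: a series of individual goal-directed trials.
    for _ in range(trials):
        policy.start_extrinsic_trial()
        observation, reward, done = env.reset(), 0, False
        for _ in range(trial_steps):
            action = policy.step(observation, reward, done)
            observation, reward, done, _ = env.step(action)
        policy.end_extrinsic_trial()
```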

spMohanty commented 5 years ago

@emilio-cartoni : I refactored the base policy a little bit, and policies now have to be defined as follows:

```python
import numpy as np
from real_robots.policy import BasePolicy


class RandomPolicy(BasePolicy):
    def __init__(self, action_space):
        self.action_space = action_space
        # Start from a fixed initial posture.
        self.action = np.zeros(action_space.shape[0])
        self.action += -np.pi * 0.5

    def step(self, observation, reward, done):
        # Random walk in action space: add Gaussian noise to the
        # previous action at every step.
        self.action += 0.4 * np.pi * np.random.randn(self.action_space.shape[0])
        return self.action
```

This is included starting from the v0.1.8 release.
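As a quick usage sketch with the RandomPolicy class above (the environment id is assumed from the starter kit and may differ in your version):

```python
import gym
import real_robots  # noqa: F401 -- importing registers the environment

env = gym.make('REALRobot-v0')  # environment id assumed from the starter kit
policy = RandomPolicy(env.action_space)

observation = env.reset()
reward, done = 0, False
for _ in range(100):
    action = policy.step(observation, reward, done)
    observation, reward, done, info = env.step(action)
env.close()
```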