OpenMined / CampX

Tensor Based Environment Framework for Training RL Agents - Pre Alpha

Refactor for use with pycolab and safe-grid-agents #5

Open jvmncs opened 6 years ago

jvmncs commented 6 years ago

Overview

I've been working on a general-purpose library for training safe RL agents called safe-grid-agents. It primarily uses DeepMind's AI Safety Gridworlds.

The goal of this issue is to use the Base class from here (or something very similar) as the parent class of CampX's TensorWorld. That way, once we properly implement the abstract methods from Base, any environment built on TensorWorld can be used with the agents I've been working on in safe-grid-agents.
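
A rough sketch of what the subclassing could look like. The `Base` below is a hypothetical stand-in, not the real class: only `observation_spec` and `action_spec` are named in this issue, and `reset` and `step` are assumed. The `TensorWorld` body is a toy, not CampX's implementation.

```python
import abc


class Base(abc.ABC):
    """Hypothetical stand-in for the Base environment class (method set assumed)."""

    @abc.abstractmethod
    def reset(self):
        """Start a new episode and return the first timestep."""

    @abc.abstractmethod
    def step(self, action):
        """Apply `action` and return the next timestep."""

    @abc.abstractmethod
    def observation_spec(self):
        """Describe the observations the environment emits."""

    @abc.abstractmethod
    def action_spec(self):
        """Describe the actions the environment accepts."""


class TensorWorld(Base):
    """Toy stand-in for CampX's TensorWorld; implements every abstract method."""

    def __init__(self, n_actions=4, shape=(5, 5)):
        self._n_actions = n_actions
        self._shape = shape
        self._t = 0

    def reset(self):
        self._t = 0
        return {"step": self._t, "reward": None}

    def step(self, action):
        if not 0 <= action < self._n_actions:
            raise ValueError("action out of range: %r" % (action,))
        self._t += 1
        return {"step": self._t, "reward": 0.0}

    def observation_spec(self):
        return {"board": self._shape}

    def action_spec(self):
        return {"min": 0, "max": self._n_actions - 1}


class Incomplete(Base):
    """Leaves three abstract methods unimplemented, so it cannot be instantiated."""

    def reset(self):
        return None
```

Calling `TensorWorld()` works, while `Incomplete()` raises `TypeError`. That is the useful property here: a TensorWorld that forgets one of the required methods fails at construction rather than partway through training.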

Caveat

One thing we'll have to decide is whether to use pycolab as a backend for this Base class, as is done here. The problem is that pycolab runs on numpy in the backend, and it's not clear how we could refactor that to use our MPC-shared version of PyTorch. The best way forward seems to be to use only the Base class and mimic the kind of information the pycolab backend supplies, but with torch tensors instead of numpy arrays.
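
To make "mimic the pycolab information" concrete: as far as I understand it, pycolab hands back an observation with a `board` (a 2D array of ASCII codes) and `layers` (one boolean mask per character). The sketch below builds the same two pieces from torch tensors. The `render` helper is hypothetical, it assumes a recent PyTorch (`torch.tensor`, boolean tensors) rather than 0.3.1, and it ignores MPC sharing entirely.

```python
import collections

import torch

# Same field names as pycolab's observation tuple, but holding torch tensors.
Observation = collections.namedtuple("Observation", ["board", "layers"])


def render(art):
    """Turn a list of equal-length strings into a torch-backed Observation.

    Hypothetical helper: `board` holds ASCII codes, `layers` maps each
    character to a boolean mask of where it appears on the board.
    """
    board = torch.tensor([[ord(ch) for ch in row] for row in art],
                         dtype=torch.uint8)
    chars = sorted(set("".join(art)))
    layers = {ch: board == ord(ch) for ch in chars}
    return Observation(board=board, layers=layers)


obs = render(["#####",
              "#A G#",
              "#####"])
```

Here `obs.layers["A"]` is a 3x5 mask that is true only at row 1, column 1, which is the kind of per-character view agents written against pycolab tend to expect.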

Additional requirements

In addition to the generic environment methods from Base, we'll also want two methods specific to the safety gridworlds: get_overall_performance, which returns the safety score for an episode, and _get_hidden_reward, which supplies the per-timestep safety score. The latter is used for debugging, while the former can be used in some safe RL training schemes (e.g. semi-supervised RL). Implementing all the abstract methods plus these two would give us an MVP of sorts that we can build on.
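
One possible shape for these two methods, as a sketch only. The method names come from this issue; everything else is assumed, including the helper methods, the default arguments, and the choice to define the episode score as the sum of per-step hidden rewards.

```python
class SafetyScoreMixin:
    """Hypothetical mixin that tracks a hidden (safety) reward per timestep."""

    def _start_episode(self):
        # Call from reset(): forget the previous episode's hidden rewards.
        self._hidden_rewards = []

    def _record_hidden_reward(self, value):
        # Call from step(): store this timestep's safety score.
        self._hidden_rewards.append(float(value))

    def _get_hidden_reward(self, default_reward=0.0):
        """Per-timestep safety score for the most recent step (debugging aid)."""
        rewards = getattr(self, "_hidden_rewards", [])
        return rewards[-1] if rewards else default_reward

    def get_overall_performance(self, default=None):
        """Episode safety score, assumed here to be the sum of hidden rewards."""
        rewards = getattr(self, "_hidden_rewards", [])
        return sum(rewards) if rewards else default


tracker = SafetyScoreMixin()
tracker._start_episode()
for hidden in (0.0, -1.0, 0.5):
    tracker._record_hidden_reward(hidden)
```

After the loop, `tracker._get_hidden_reward()` gives the last step's score and `tracker.get_overall_performance()` gives the episode total. Keeping this in a mixin would let TensorWorld stay generic while only the safety environments opt in.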

Plan

Each of these should be spun out into separate issues (either individually or grouped).

  1. Subclass Base in TensorWorld
  2. Implement observation_spec and action_spec with tests
  3. TODO rest of plan
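
For step 2, a sketch of what "with tests" might mean in practice. The spec containers, the `ToyWorld` class, and the spec contents are all hypothetical; the real specs would need to match whatever the agents in safe-grid-agents read from them.

```python
import collections
import unittest

# Hypothetical spec containers; the real Base class may use different types.
ArraySpec = collections.namedtuple("ArraySpec", ["shape", "dtype", "name"])
BoundedSpec = collections.namedtuple(
    "BoundedSpec", ["shape", "dtype", "minimum", "maximum", "name"])


class ToyWorld:
    """Stand-in for a TensorWorld-based environment (hypothetical)."""

    def __init__(self, height, width, n_actions):
        self._height = height
        self._width = width
        self._n_actions = n_actions

    def observation_spec(self):
        # One entry per observation field; here only the board.
        return {"board": ArraySpec((self._height, self._width), "uint8", "board")}

    def action_spec(self):
        # A single discrete action in [0, n_actions - 1].
        return BoundedSpec((), "int64", 0, self._n_actions - 1, "action")


class SpecTest(unittest.TestCase):
    def setUp(self):
        self.env = ToyWorld(height=3, width=4, n_actions=4)

    def test_observation_spec_shape(self):
        self.assertEqual(self.env.observation_spec()["board"].shape, (3, 4))

    def test_action_spec_bounds(self):
        spec = self.env.action_spec()
        self.assertEqual((spec.minimum, spec.maximum), (0, 3))


def run_spec_tests():
    """Run SpecTest programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SpecTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

`run_spec_tests().wasSuccessful()` reports whether both checks pass. The same test class could be pointed at each concrete environment once TensorWorld subclasses Base.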

jvmncs commented 6 years ago

As of #15, safe-grid-agents has been added as a git submodule for testing agent compatibility. It should point to the campx031 branch, which houses a version compatible with torch 0.3.1 for interacting with CampX. As long as we keep the submodule pointing to the most up-to-date commit on that branch, I'll make sure changes to master propagate over for as long as we need them.