Open jvmncs opened 6 years ago
As of #15, safe-grid-agents has been added as a git submodule for testing agent compatibility. It should point to the campx031 branch, which houses a version that's compatible with torch 0.3.1 for interacting with CampX. As long as we keep it pointing to the most up-to-date commit on that branch, I'll make sure changes to master propagate over for as long as we need them.
Overview
I've been working on a general purpose library for training safe RL agents called safe-grid-agents. It primarily uses the AI Safety Gridworlds from DeepMind.
The goal of this issue is to use the Base class from here (or something very similar) as a parent class to CampX's TensorWorld. This way, all we have to do is properly implement the abstract methods from this Base class, and we can then use specific environments based on TensorWorld with the agents I've been working on in safe-grid-agents.
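A minimal sketch of the inheritance pattern described above. Everything here is an assumption for illustration: `BaseEnv` is a hypothetical stand-in for the Base class in safe-grid-agents, the method names `reset` and `step` may not match its real abstract methods, and the `TensorWorld` below is a toy corridor rather than CampX's actual class.

```python
from abc import ABC, abstractmethod


class BaseEnv(ABC):
    """Hypothetical stand-in for the Base class in safe-grid-agents."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the first observation."""

    @abstractmethod
    def step(self, action):
        """Apply an action and return (observation, reward, done)."""


class TensorWorld(BaseEnv):
    """Toy placeholder for CampX's TensorWorld: a 1-D corridor with a goal."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is +1 (right) or -1 (left); clamp to the corridor bounds.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-side loop that depends only on the BaseEnv interface."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

The point of the pattern is that `run_episode` never refers to `TensorWorld` directly, so any environment that fills in the abstract methods should work with the same agent code. Instantiating `BaseEnv` itself raises `TypeError`, which catches incomplete implementations early.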
Caveat
One thing we'll have to decide is whether we want to use pycolab as a backend for this Base class, as is done here. One issue is that pycolab runs numpy in the backend, and it's not clear how we could refactor that to use our MPC-shared version of PyTorch. The best way forward seems to be to just use the Base class and then mimic the kind of information supplied by the pycolab backend, but with torch tensors instead of numpy arrays.
Additional requirements
In addition to the generic environment methods from Base, we'll also want two methods specific to the safety gridworlds: get_overall_performance, which returns the safety score for an episode, and _get_hidden_reward, which supplies the per-timestep safety score. The latter is used for debugging, while the former can be used in some safe RL training schemes (e.g. semi-supervised RL). Implementing all the abstract methods as well as these two would give us an MVP of sorts that we can build on.
Plan
Each of these should be spun out into separate issues (either individually or grouped).
observation_spec and action_spec with tests
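As a rough starting point for this item, here is a sketch of what the two spec methods and their tests could look like. The `ArraySpec` and `BoundedSpec` types, the `GridEnv` class, and the chosen shapes and dtypes are all hypothetical; the real implementation would need to match whatever the Base class and the agents expect.

```python
import unittest
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical spec: shape and dtype name of an array-like value."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class BoundedSpec(ArraySpec):
    """Hypothetical spec for a value with inclusive integer bounds."""
    minimum: int
    maximum: int

    def contains(self, value):
        return self.minimum <= value <= self.maximum


class GridEnv:
    """Toy environment exposing the two spec methods named in the plan."""

    def __init__(self, height, width, num_actions):
        self.height = height
        self.width = width
        self.num_actions = num_actions

    def observation_spec(self):
        # Assumed: one float per grid cell.
        return ArraySpec(shape=(self.height, self.width), dtype="float32")

    def action_spec(self):
        # Assumed: a scalar discrete action in [0, num_actions - 1].
        return BoundedSpec(shape=(), dtype="int64",
                           minimum=0, maximum=self.num_actions - 1)


class SpecTests(unittest.TestCase):
    def setUp(self):
        self.env = GridEnv(height=3, width=5, num_actions=4)

    def test_observation_shape(self):
        self.assertEqual(self.env.observation_spec().shape, (3, 5))

    def test_action_bounds(self):
        spec = self.env.action_spec()
        self.assertTrue(spec.contains(0))
        self.assertTrue(spec.contains(3))
        self.assertFalse(spec.contains(4))
```

The tests can be run with `python -m unittest` on the file that holds them.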