JuliaReinforcementLearning / GridWorlds.jl


first multi-agent environment #139

Closed Sid-Bhatia-0 closed 3 years ago

Sid-Bhatia-0 commented 3 years ago

@findmyway @jonathan-laurent

From tomorrow, I'll start working on a multi-agent environment. I am thinking of a cooperative multi-agent version of the CollectGemsUndirected environment with full observability. A few reasons/advantages for choosing this:

  1. This problem transitions easily from single-agent to multi-agent. Both versions share the same objective: collect as many gems as possible, as quickly as possible.
  2. Rewards are distributed somewhat evenly during the episode. I think this makes it easier to learn than sparse-reward environments (as in the case of goal-reaching environments).
  3. The difficulty of this problem is easily tunable via the density of scattered gems and the number of agents.
  4. Visualizing behavior in this environment makes it easy to detect whether the agents are learning to collaborate with each other (for example, by collecting gems from different regions of the map) or not (competing for the same gems).
  5. Full observability gives the agents the broader context they need to collaborate effectively, and it is also simpler to implement than partial observability.
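For concreteness, here is a rough Julia sketch of what such a cooperative environment might look like. Everything here is hypothetical: the type name `MultiAgentCollectGems`, the `act!` function, and the keyword arguments are illustrative assumptions, not part of the current GridWorlds.jl API.

```julia
import Random

# Hypothetical cooperative multi-agent version of CollectGemsUndirected.
# All agents share one scalar reward, which supports point 1 above
# (same objective as the single-agent version) and point 3
# (difficulty tuned by gem density and number of agents).
mutable struct MultiAgentCollectGems
    height::Int
    width::Int
    agent_positions::Vector{CartesianIndex{2}}
    gem_positions::Set{CartesianIndex{2}}
    reward::Float64   # shared (cooperative) reward from the last step
end

function MultiAgentCollectGems(; height = 8, width = 8, num_agents = 2,
                               gem_density = 0.1, rng = Random.GLOBAL_RNG)
    cells = Random.shuffle(rng, vec([CartesianIndex(i, j) for i in 1:height, j in 1:width]))
    agent_positions = cells[1:num_agents]
    num_gems = round(Int, gem_density * height * width)
    gem_positions = Set(cells[num_agents+1:num_agents+num_gems])
    return MultiAgentCollectGems(height, width, agent_positions, gem_positions, 0.0)
end

const DIRECTIONS = (CartesianIndex(-1, 0), CartesianIndex(1, 0),
                    CartesianIndex(0, -1), CartesianIndex(0, 1))

# All agents act in the same step; each collected gem adds to the
# shared reward, so collaborating (spreading out) pays off.
function act!(env::MultiAgentCollectGems, actions::Vector{Int})
    env.reward = 0.0
    for (i, a) in enumerate(actions)
        new_pos = env.agent_positions[i] + DIRECTIONS[a]
        if 1 <= new_pos[1] <= env.height && 1 <= new_pos[2] <= env.width
            env.agent_positions[i] = new_pos
            if new_pos in env.gem_positions
                delete!(env.gem_positions, new_pos)
                env.reward += 1.0
            end
        end
    end
    return env
end
```

Since the environment is fully observable (point 5), the whole struct can serve as each agent's state; a partially observable variant would instead slice out a window around each agent.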

Let me know what you think.

Also, please suggest a better platform for documenting our discussions if there is one. RL.jl has a Discussions section; maybe we can enable it for GridWorlds.jl too, @findmyway.