crowdAI / marLo

Multi Agent Reinforcement Learning using MalmÖ
MIT License

Question about marLo competition #47

Open EC2EZ4RD opened 6 years ago

EC2EZ4RD commented 6 years ago

Hi, I have had some questions about the MarLo competition for a while; could anyone help me figure them out?

  1. In https://www.crowdai.org/challenges/marlo-2018, it says that each game has 4 tasks. What is the difference between these tasks? A different layout, or a different reward setting?
  2. How do we submit code?
  3. It seems that the multi-agent setting only exists in the CatchTheMob game. Do the other games involve multi-agent play?
  4. Is there any time or round limit for training and testing? During testing, can the agent keep learning from the environment?
  5. A question about the agent setting: is it one agent per game, one agent for many games, or one agent for all games?

Thanks in advance for any response.
spMohanty commented 5 years ago
> In https://www.crowdai.org/challenges/marlo-2018, it says that each game has 4 tasks. What is the difference between these tasks? A different layout, or a different reward setting?

The 4 multi-agent tasks will be released soon. They are parametric environments, and hence can be initialised with different parameters. For instance, consider the MazeRunner task: there, MazeHeight is a parameter, and the evaluator can use any value within given bounds. All the available parameters, with their descriptions and bounds, will be released along with the environments.
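To make the parametric-environment idea concrete, here is a minimal sketch of how an evaluator might sample one task configuration from published parameter bounds. The parameter names and bounds below are purely illustrative assumptions, not the official ones (which, as noted above, have yet to be released):

```python
import random

# Hypothetical parameter bounds for a MazeRunner-like task.
# These names/ranges are assumptions for illustration only.
PARAM_BOUNDS = {
    "MazeHeight": (2, 10),
    "MazeLength": (5, 20),
}

def sample_task_params(bounds, seed=None):
    """Draw one concrete task configuration from the given bounds."""
    rng = random.Random(seed)
    return {name: rng.randint(lo, hi) for name, (lo, hi) in bounds.items()}

params = sample_task_params(PARAM_BOUNDS, seed=0)
print(params)  # e.g. {'MazeHeight': ..., 'MazeLength': ...}
```

Because each evaluation run can draw different values within the bounds, agents have to generalise across task instances rather than overfit to a single fixed layout.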

> How do we submit code?

Instructions will be added, but they will be similar to the process used in the NIPS Adversarial Vision Challenge and the VizDoom Challenge. You will have to create a private repository with your code, models, etc., and then every time you create and push a tag, it will be considered a submission. Details to follow soon.

> It seems that the multi-agent setting only exists in the CatchTheMob game. Do the other games involve multi-agent play?

4 multi-agent environments will be released this week.

> Is there any time or round limit for training and testing? During testing, can the agent keep learning from the environment?

There will be a total timeout of 2 hours for the evaluations across multiple episodes. It is up to the participants to decide whether to continue optimising their agents while they are being evaluated (they will receive the reward signal in any case, so they can choose to do so).
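Since the reward signal keeps flowing during evaluation, an agent can continue learning online. This is a toy, self-contained sketch of that idea (not the official evaluation harness or the MarLo API): an epsilon-greedy agent that keeps updating its value estimates from rewards received at evaluation time.

```python
import random

class OnlineAgent:
    """Toy epsilon-greedy agent that keeps learning during evaluation."""

    def __init__(self, n_actions, lr=0.1, eps=0.1, seed=0):
        self.q = [0.0] * n_actions   # running value estimate per action
        self.lr, self.eps = lr, eps
        self.rng = random.Random(seed)

    def act(self):
        # Explore with probability eps, otherwise exploit the best estimate.
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action, reward):
        # Online update: learning continues even while being evaluated.
        self.q[action] += self.lr * (reward - self.q[action])

# Toy stand-in "environment": action 1 pays more on average.
agent = OnlineAgent(n_actions=2)
for step in range(500):
    a = agent.act()
    r = agent.rng.gauss(1.0 if a == 1 else 0.0, 0.1)
    agent.update(a, r)

print(agent.q)  # estimate for action 1 ends up clearly higher
```

Whether such online adaptation pays off within the 2-hour evaluation budget is a trade-off each team would have to weigh against the cost of extra computation per step.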

> A question about the agent setting: is it one agent per game, one agent for many games, or one agent for all games?

This needs to be clarified a little better, and we still need some internal discussions. But the most likely scenario is one agent for many games (4 different parameterized environments); the exact rule will be finalised based on the feasibility of this approach in our own internal baselines.

EC2EZ4RD commented 5 years ago

Thanks for your answers! Looking forward to MarLo's next update.