Hello @albertcity -- thanks for your interest!
With some tweaks, the current model should work fine up to probably a couple thousand agents. I've designed Neural MMO such that the Server layer really isn't doing much -- the amount of computation required to process agent actions is very small. That said, the long-term goal is to support massive virtual worlds. I've actually taken a step back from infra recently, just because it began to consume the project for months at a time. The current plan is to migrate to RLlib. I'm in active discussion with the devs, and we've already gotten several crucial features and bug fixes. They have a good plan for supporting increasingly large and complex environments, and ideally they'll have reasonable functionality for this by the time we really need to start scaling NMMO.
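For reference, here is a minimal sketch of what plugging an environment into RLlib's multi-agent interface looks like. The `NeuralMMOEnv` wrapper below is hypothetical and just mimics the shape of the API (dict-per-agent observations, actions, rewards, and dones with an `"__all__"` flag); it is not the actual migration code.

```python
# Sketch only -- NeuralMMOEnv is a hypothetical stand-in, not the real NMMO/RLlib glue.
import gym
from ray import tune
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.tune.registry import register_env


class NeuralMMOEnv(MultiAgentEnv):
    """Toy multi-agent env following RLlib's dict-per-agent convention."""

    def __init__(self, config):
        self.num_agents = config.get("num_agents", 128)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(16,))
        self.action_space = gym.spaces.Discrete(5)

    def reset(self):
        # One observation per live agent, keyed by agent id.
        return {i: self.observation_space.sample() for i in range(self.num_agents)}

    def step(self, action_dict):
        obs = {i: self.observation_space.sample() for i in action_dict}
        rewards = {i: 0.0 for i in action_dict}
        dones = {i: False for i in action_dict}
        dones["__all__"] = False  # episode-level termination flag
        infos = {i: {} for i in action_dict}
        return obs, rewards, dones, infos


register_env("neural_mmo_sketch", lambda config: NeuralMMOEnv(config))

if __name__ == "__main__":
    tune.run(
        "PPO",
        stop={"timesteps_total": 10_000},
        config={"env": "neural_mmo_sketch", "env_config": {"num_agents": 8}},
    )
```

By default all agents share a single policy here, which is roughly the setup NMMO uses anyway; per-population policies would go in RLlib's `multiagent` config.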
There's actually nothing preventing you from training online in NMMO. I haven't tried this myself yet because people don't really do this in RL. The closest thing to within-lifetime learning is just having a recurrent policy and performing adaptation within the hidden state, as per RL-squared and derivatives thereof. If you have better ideas, I'd be happy to chat and improve support.
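For concreteness, a minimal sketch of that within-lifetime-adaptation idea in PyTorch: the policy is recurrent, and its hidden state is carried across timesteps of an agent's life, so "learning" within an episode happens in the hidden state rather than via gradient updates. Names and sizes are illustrative, not NMMO's actual model.

```python
# Illustrative RL^2-style recurrent policy; sizes and names are placeholders.
import torch
import torch.nn as nn


class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=16, hidden_dim=64, num_actions=5):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, num_actions)

    def initial_state(self, batch_size=1):
        h = torch.zeros(batch_size, self.lstm.hidden_size)
        return h, h.clone()

    def forward(self, obs, state):
        # state = (h, c) persists across the agent's lifetime, so the policy
        # can adapt online without any gradient step during the episode.
        h, c = self.lstm(torch.relu(self.encoder(obs)), state)
        logits = self.policy_head(h)
        return logits, (h, c)


if __name__ == "__main__":
    policy = RecurrentPolicy()
    state = policy.initial_state()
    for _ in range(5):  # one agent's within-lifetime rollout
        obs = torch.randn(1, 16)
        logits, state = policy(obs, state)
        action = torch.distributions.Categorical(logits=logits).sample()
```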
You should join our Discord https://discord.gg/BkMmFUC (Slack alternative) if you're interested in getting involved with the project. We use it for all development discussion -- GitHub is mainly reserved for bug reports and feature PRs.
I note that in the 3-layer cluster-server-client architecture, a server must be able to run at least one whole environment. However, in some cases we have little chance of meeting this condition. For instance, if we simulate the traffic system of a large city, every single step in this environment will take a long time to compute. It would therefore be better to simulate such an environment across several servers, which may lead to synchronization problems between them. Do you have any plan to add such a feature? What's more, it seems that Neural MMO lacks a mechanism to train agents online (i.e., update the action policy throughout the trajectory). I am interested in finding/designing a framework for massive multi-agent environment simulation. I would appreciate any ideas you have to share. Thanks!
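To make the synchronization concern concrete, here is a rough sketch (not an existing NMMO feature) of the kind of lockstep coordination I have in mind: each worker simulates one region, and a barrier forces all regions to finish a tick before boundary state is exchanged. `RegionSim` and its methods are hypothetical placeholders.

```python
# Hypothetical sketch of lockstep multi-server simulation; RegionSim is a stand-in.
import multiprocessing as mp


class RegionSim:
    """Placeholder for one server's shard of the environment."""

    def __init__(self, region_id):
        self.region_id = region_id
        self.tick = 0

    def step(self):
        self.tick += 1  # expensive local simulation would happen here

    def boundary_state(self):
        return {"region": self.region_id, "tick": self.tick}


def run_region(region_id, num_ticks, barrier, shared_boundaries):
    sim = RegionSim(region_id)
    for _ in range(num_ticks):
        sim.step()
        shared_boundaries[region_id] = sim.boundary_state()
        # No region advances to the next tick until every region has finished
        # this one and published its boundary state -- this is the sync cost.
        barrier.wait()


if __name__ == "__main__":
    num_regions, num_ticks = 4, 10
    manager = mp.Manager()
    shared_boundaries = manager.dict()
    barrier = mp.Barrier(num_regions)
    workers = [
        mp.Process(target=run_region, args=(r, num_ticks, barrier, shared_boundaries))
        for r in range(num_regions)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(dict(shared_boundaries))
```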