Closed by w4tsn 7 years ago
How many is several? The RUNNING status is detected by a file that's created by the Tournament Module, so if you're experiencing severe disk IO blocking then this may happen. It may also be a firewall issue. I'm not sure whether Rick has any other ideas, but I've never come across this issue before.
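For anyone following along, here is a minimal sketch of what file-based status detection of this kind can look like, and why slow disk IO or a starved CPU would delay the switch to RUNNING. It is not the Tournament Manager's actual code: the function name `wait_for_marker`, the marker path, and the timeout values are all hypothetical.

```python
import os
import time


def wait_for_marker(path, timeout_s=60.0, poll_interval_s=0.5):
    """Poll until `path` exists (hypothetical marker file).

    Returns the number of seconds waited, or None if the timeout expires.
    If the polling thread gets little CPU time or the disk is blocked,
    the file is simply noticed later, which would look like a client
    stuck in STARTING.
    """
    start = time.monotonic()
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(poll_interval_s)
```

Logging the returned wait time on a 1-core and a 2-core VM would be one way to confirm where the delay comes from.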
I tried it with 2 and 4 VMs (one running both server and client, the others only a client), all on the same machine with 12 cores and 32 GB of RAM, with the VMs' virtual disks all residing on one SSD.
Yeah, IO could be blocked when the VMs only have one core and therefore can't handle Java, the server, StarCraft and the OS fast enough.
If the VMs are located on an HDD (which is fairly old and slow) it gets even worse: then even 2 CPU cores per VM are not enough to handle the requests.
Firewalls are deactivated.
Okay, so it appears to just be an issue, like you said, when resources are extremely low on a single core, which sort of makes sense. I guess the IO-related tasks are very low priority behind the OS and StarCraft, so 2 cores seems to be a must. @richard-kelly can you add a note in the documentation to say that 2 cores should be the minimum for each client machine?
Thanks for the heads up, however I'm really not sure that this is an issue that we can necessarily fix, so we'll just warn people instead :)
I set up 2 Windows 10 VMs with 4 GB of RAM and 1 CPU. I also tried 4 VMs that way.
In both cases one VM hosts the server and one client, while the rest host only clients. All VMs reside on the same host machine with 12 cores and 32 GB of RAM. The virtual disks are located on an SSD.
With 1 core and 4 GB of RAM, the client does not set its state to RUNNING but stays at STARTING after launching the StarCraft game. The game then runs, but no data is transferred to the server. After about 5 to 10 minutes the client recovers, sets the state to RUNNING and eventually continues with tournament execution.
When setting CPU count to 2 everything runs just fine.
It's not a big issue, but I could run double the number of VMs if this worked. I assume it has something to do with threads that get paused and only recover when the process eventually receives processing time.
I will have a look into this when I have more time this week.