Multiple Coordinators #456 (MikaylaFischler/cc-mek-scada)

RangeurGamer commented 6 months ago

Is your feature request related to a problem? Please describe.
In a real power plant there are several control rooms in case there are problems. This would let someone have one control room on, for example, floor -1 of the building and another control room in a different building, which would make things a little more realistic.

Describe the solution you'd like
The ability to have several Coordinator servers and to configure each one as the primary, secondary, tertiary, or backup server.

MikaylaFischler commented 6 months ago

Thanks for the feature request, but this is a lot more of an ask than you may realize. I bolded the paragraph below with the most important info here, but I suggest reading it all if you want the context.

As things currently are, the coordinator updating twice a second is one of the largest communications transaction costs, if not the single largest. For context, when HMAC is enabled (facility auth key), it takes ~15-35 ms on my ATM8 server[^host] to hash a coordinator message, while other devices take 1-5 ms for their messages. This has to happen twice a second on both ends (supervisor and coordinator), and that cost would then be multiplied by the number of coordinators in use.

The time costs are much lower without HMAC of course, but there is still overhead. HMAC wouldn't really be optional here on a multiplayer server, because the fact that only one coordinator can connect at a time is also a security decision, though a pretty low tier one. That can be bypassed by manually modifying code to fake out the supervisor, but that only works if HMAC is disabled. This means most people would be using HMAC, especially with multiple coordinators. The supervisor main loop tries to run every 150ms, and it has a lot more to do besides spending 100ms hashing data for four coordinators.
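To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. It only reuses the figures quoted in this thread (15-35 ms per hashed coordinator message on one specific server, two updates per second, a 150 ms supervisor loop) and assumes one hashed coordinator message per coordinator per update on the supervisor side; it is an illustration, not a measurement of the actual supervisor code.

```python
# Rough estimate of supervisor-side HMAC hashing time for N coordinators.
# Timing figures are the ones quoted in this thread for one specific server.
# Assumption: one hashed coordinator message per coordinator per update.

HASH_MS_MIN, HASH_MS_MAX = 15, 35  # ms per coordinator message with HMAC on
UPDATES_PER_SECOND = 2             # coordinator data updates twice a second
LOOP_PERIOD_MS = 150               # supervisor main loop target period


def hash_cost(n_coordinators, hash_ms):
    """Hashing time spent by the supervisor for n coordinators."""
    per_update = n_coordinators * hash_ms
    return {
        "per_update_ms": per_update,
        "per_second_ms": per_update * UPDATES_PER_SECOND,
        # Assumes all coordinator updates land in the same loop iteration.
        "share_of_loop": per_update / LOOP_PERIOD_MS,
    }


if __name__ == "__main__":
    for n in (1, 2, 4):
        lo = hash_cost(n, HASH_MS_MIN)
        hi = hash_cost(n, HASH_MS_MAX)
        print(f"{n} coordinator(s): {lo['per_update_ms']}-{hi['per_update_ms']} ms "
              f"per update, {lo['per_second_ms']}-{hi['per_second_ms']} ms/s")
```

At a midpoint of 25 ms, four coordinators cost 100 ms per update, two thirds of the 150 ms loop; at the upper figure of 35 ms it is 140 ms, nearly the entire loop period. This counts the supervisor side only; each coordinator does its own hashing on top of that.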

The pocket computer architecture will poll data from the coordinator, which would be the only feasible way to have multiple coordinators. I do not want more than one coordinator to connect to the supervisor at a time due to performance limitations, as this is Lua code running in Java. The supervisor already struggles to keep up on multiplayer servers with a few people online, and even in singleplayer on slower computers running big modpacks with lots going on. Feel free to turn on supervisor debug messages and take a look if you are curious how it's doing in your world. You may see the following:

[Tue Mar 12 23:20:08 2024] [DBG] SVS: offending session: RTU [5] (@15)
[Tue Mar 12 23:20:09 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:09 2024] [DBG] SVS: offending session: RTU [5] (@15)
[Tue Mar 12 23:20:25 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:25 2024] [DBG] SVS: offending session: RTU [5] (@15)
[Tue Mar 12 23:20:25 2024] [WRN] rtu_session(5): exceeded 100ms queue process limit
[Tue Mar 12 23:20:38 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:38 2024] [DBG] SVS: offending session: RTU [5] (@15)
[Tue Mar 12 23:20:44 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:44 2024] [DBG] SVS: offending session: RTU [5] (@15)
[Tue Mar 12 23:20:44 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:44 2024] [DBG] SVS: offending session: RTU [4] (@14)
[Tue Mar 12 23:20:45 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:45 2024] [DBG] SVS: offending session: RTU [4] (@14)
[Tue Mar 12 23:20:49 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:49 2024] [DBG] SVS: offending session: RTU [5] (@15)
[Tue Mar 12 23:20:51 2024] [DBG] SVS: supervisor out queue handler exceeded 100ms queue process limit
[Tue Mar 12 23:20:51 2024] [DBG] SVS: offending session: RTU [5] (@15)
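If you do turn on debug messages, a small script can tally which sessions keep showing up. The helper below is hypothetical and not part of cc-mek-scada; it only assumes the line format shown in the log excerpt above and treats the session label (e.g. `RTU [5] (@15)`) as an opaque string.

```python
import re
from collections import Counter

# Hypothetical helper: matches lines such as
# [Tue Mar 12 23:20:08 2024] [DBG] SVS: offending session: RTU [5] (@15)
OFFENDER_RE = re.compile(r"offending session: (.+)$")
# Matches both the supervisor out-queue and per-session "exceeded ... limit" lines.
LIMIT_RE = re.compile(r"exceeded \d+ms queue process limit")


def summarize(log_text):
    """Count offending sessions and queue-limit overruns in a log excerpt."""
    offenders = Counter()
    limit_hits = 0
    for line in log_text.splitlines():
        match = OFFENDER_RE.search(line)
        if match:
            offenders[match.group(1).strip()] += 1
        if LIMIT_RE.search(line):
            limit_hits += 1
    return offenders, limit_hits
```

Pass in the text of your supervisor log; `offenders.most_common()` then lists the sessions most often blamed for overruns.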

As a note, the pocket computer will rely on pulling facility data from the coordinator rather than the supervisor, and only polling for data it needs to show on the page you are currently viewing.

I agree that having Primary/Secondary/Tertiary/Backup servers would be great for realism, and I was actually planning on an Active/Backup supervisor architecture. I dropped that plan because:

This is Lua in Minecraft, not a real-world system, so there are performance limitations
Maybe 1% of users would use it
It would take months to implement due to the bugs and race conditions it would introduce

For multiple coordinators, I only ever foresee one being connected to the supervisor at once. Additional ones would need to connect to the already connected coordinator. This would introduce more latency, as the coordinator data only updates every 500ms. The data on the coordinators connected to that coordinator would then lag by 500ms plus the main coordinator's own 500ms delay from the actual supervisor message, which is itself further delayed by the RTUs and PLCs. Additionally:

My biggest concern with overhead here is that even if one computer doesn't have to do everything, the CC: Tweaked computer threads still have to do all of it on the physical host machine. The CC computers all yield to each other, so if four coordinator computers are doing those hashes and performing the graphics updates, the supervisor computer won't get as much time to run, and the whole system may collapse into lag. I have had this happen with 8+ RTUs with 20+ connected Mekanism machines. https://github.com/MikaylaFischler/cc-mek-scada/issues/291 was a prior issue caused by the RTU previously only accepting one packet at a time, but that symptom can come back when the computers simply can't keep up.
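The relay latency described for chained coordinators can be put into numbers with a simple model: each hop only forwards the latest snapshot it holds and refreshes every 500 ms, so in the worst case the delays add up per hop. The function name and the `upstream_ms` parameter (standing in for the unspecified RTU/PLC delay before data reaches the supervisor) are assumptions for illustration only.

```python
# Simple worst-case staleness model for chained coordinators.
# Assumption: each hop forwards only its latest snapshot on a fixed period.
COORD_UPDATE_MS = 500  # coordinator data refresh period cited in the thread


def worst_case_staleness_ms(relay_hops, upstream_ms=0.0):
    """relay_hops=0 is the coordinator connected directly to the supervisor.

    upstream_ms is a hypothetical extra delay from RTUs/PLCs to the supervisor.
    """
    return upstream_ms + COORD_UPDATE_MS * (relay_hops + 1)
```

A coordinator connected directly to the supervisor shows data up to 500 ms old, while one connected through it shows data up to 1000 ms old, before any RTU or PLC delay is added.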

tl;dr: Running Lua in Java on modded Minecraft servers is slow, and more than one coordinator is likely to push the system too far. This is an unlikely feature addition due to that. Part of the pocket computer work will be adding diagnostic monitoring, which will help me better track performance within computers and could influence this.

[^host]: Dedicated self hosted server, 4 GHz quad-core Xeon processor, one player online + some complex chunks loaded (Mekanism, Create) by a few players.

RangeurGamer commented 6 months ago

Ah, I see. There isn't much more to it; it was just a suggestion, and I didn't realize the system was that complicated.

MikaylaFischler commented 6 months ago

Yeah I'm happy to receive suggestions, and someone had mentioned on the Discord many, many months ago that this would be a nice thing to have. I wanted to explain the complexities for anyone who wanted to know, and if things change in the future this will definitely be a consideration!

jzburda commented 2 months ago

I would want, at the very least, a way to connect the coordinator's speaker system over modems so authorized operators far away can receive warning sounds alerting them to a problem with the reactor.

MikaylaFischler commented 2 months ago

@jzburda Just connect the speakers to the RTU, that's already a feature

jzburda commented 2 months ago

Duh, why didn't I try that? Thanks, Mikayla.

