Dinhero21 / game-engine

Fully Open Source Game and Game Engine

anti-catastrophic-lag-watchdog #12

Open Dinhero21 opened 11 months ago

Dinhero21 commented 11 months ago

The idea is to have a watchdog running outside the main thread (in a separate process via node:child_process, or in a separate thread via node:worker_threads) that monitors the main process.

The main process will periodically send data to the watchdog to signal its state (for example, an alive message every second).

If the server does not send any data for a long time (probably a minute), the watchdog will initiate the data loss mitigation protocol.
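A minimal sketch of the timeout check on the watchdog side could look like the following. The class name, method names, and the injectable clock are assumptions made for illustration; nothing here is taken from the engine's code.

```typescript
// Hypothetical sketch: names and thresholds are illustrative, not from the engine.
class HeartbeatMonitor {
  private lastBeat: number;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  // `now` is injectable so the logic can be exercised without real waiting.
  constructor(timeoutMs: number, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastBeat = now();
  }

  // Called whenever an "alive" message arrives from the main process.
  beat(): void {
    this.lastBeat = this.now();
  }

  // True once no heartbeat has arrived for longer than the timeout.
  isStalled(): boolean {
    return this.now() - this.lastBeat > this.timeoutMs;
  }
}
```

The watchdog would presumably call beat() from its message handler and poll isStalled() on a timer, starting the mitigation protocol the first time it returns true.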

Data Loss Mitigation Protocol

The watchdog could simply kill the main process, but that would destroy progress and roll back the server.

My idea is for the watchdog to have a node:inspector instance inspecting the main process.

Upon noticing the catastrophic lag, the watchdog will send a signal to the server telling it to disconnect all clients, save the world, and shut down (to mitigate data loss). The server cannot act on that signal, however, while it is still stuck in a loop, so it has to be forced out of the loop first.

I have many ideas on how to get out of the loop programmatically, some dead simple and some overly complex. I will document some of them here:

A problem with all of these anti-loop solutions is that they create invalid states (e.g. a function was supposed to return a string, but because the loop was exited prematurely it returned undefined), which might cause data corruption.

To mitigate this, the previous world should be kept as a backup. On server start, the server should attempt to load the latest world and, if it is corrupted, load the backup instead.
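The load-then-fall-back step can be sketched as below. The function name, the `load` callback, and the paths are hypothetical; `load` stands in for whatever reads and validates a world save and is assumed to throw when the data is corrupted.

```typescript
// Hypothetical sketch: `load` is assumed to throw on a corrupted or unreadable save.
function loadWorldWithFallback<T>(
  load: (path: string) => T,
  latestPath: string,
  backupPath: string,
): { world: T; source: "latest" | "backup" } {
  try {
    return { world: load(latestPath), source: "latest" };
  } catch {
    // The latest save could not be loaded, so fall back to the backup.
    // If the backup is also broken, the error propagates to the caller.
    return { world: load(backupPath), source: "backup" };
  }
}
```

Returning the source alongside the world lets the server log that a rollback to the backup happened.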

A better solution might be a loose data parser that, when it encounters unexpected data, tries its best not to crash.
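As a rough illustration of what "loose" could mean, the sketch below replaces any missing or wrongly typed field with a default instead of throwing. The PlayerData shape and its field names are invented for this example and are not the engine's actual save format.

```typescript
// Hypothetical record shape; the field names are assumptions for illustration.
interface PlayerData {
  name: string;
  x: number;
  y: number;
}

const DEFAULT_PLAYER: PlayerData = { name: "unknown", x: 0, y: 0 };

// Accepts anything and always returns a well-formed PlayerData.
function parsePlayerLoosely(raw: unknown): PlayerData {
  const obj: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
  const { name, x, y } = obj;
  return {
    name: typeof name === "string" ? name : DEFAULT_PLAYER.name,
    x: typeof x === "number" && Number.isFinite(x) ? x : DEFAULT_PLAYER.x,
    y: typeof y === "number" && Number.isFinite(y) ? y : DEFAULT_PLAYER.y,
  };
}
```

The trade-off is that bad data is silently replaced, so it may be worth logging every field that fell back to its default.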

Dinhero21 commented 11 months ago

Runtime.terminateExecution seems like a viable way of implementing idea 1.
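A sketch of how the watchdog might issue that call, assuming it runs in a worker thread and holds a node:inspector Session connected to the main thread. The SessionLike interface and the requestTermination helper are hypothetical names; only Session, connectToMainThread, post, and the Runtime.terminateExecution protocol method are real. Whether this reliably interrupts every kind of stuck loop has not been verified here.

```typescript
// Minimal shape of the part of inspector.Session that this sketch relies on.
interface SessionLike {
  post(method: string, callback: (err: Error | null) => void): void;
}

// Asks V8 to terminate the JavaScript that is currently executing.
function requestTermination(session: SessionLike): Promise<void> {
  return new Promise((resolve, reject) => {
    session.post("Runtime.terminateExecution", (err) => (err ? reject(err) : resolve()));
  });
}

// Assumed usage inside the watchdog worker thread:
//   import { Session } from "node:inspector";
//   const session = new Session();
//   session.connectToMainThread();
//   await requestTermination(session);
```

After termination, the main thread would still need to be told to disconnect clients, save the world, and shut down, and the invalid-state concerns described above still apply.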

Dinhero21 commented 11 months ago

After a lot of searching, I finally found this, which allows you to debug Node remotely.