MikaylaFischler / cc-mek-scada

Configurable ComputerCraft SCADA system for multi-reactor control of Mekanism fission reactors with a GUI, automatic safety features, waste processing control, and more! Please be sure to take a look at the Wiki tab, this project has lots of docs!
https://www.youtube.com/@cc-mek-scada
MIT License
336 stars 13 forks source link

Supervisor crash - arithmetic operation on nil value #502

Closed imevul closed 3 months ago

imevul commented 3 months ago

Describe the bug Supervisor crashed with error:

fatal error occured in main application:
/supervisor/unit.lua:283: attempt to perform arithmetic on local 'value' (a nil value)

To Reproduce No idea. It worked flawlessly for days before the crash. Standard setup according to the wiki, including a Coordinator, Supervisor, 1 RTU, 1 PLC, 1 Fission Reactor, 1 Turbine, 1 Dynamic Tank, 1 Induction Matrix, 1 SNA, 1 Environment Detector, a couple of speakers. No bundled redstone wire, so no automatic waste processing. Reactor set to burn at 2mb/t, primary prio group, controlled by Coordinator All relevant chunks are chunk loaded No auth key set, standard channels.

Expected behavior I expected the program not to crash.

Screenshots and Logs

[Tue Jun 18 18:22:04 2024] [INF] ========================================
[Tue Jun 18 18:22:04 2024] [INF] BOOTING supervisor.startup v1.3.12
[Tue Jun 18 18:22:04 2024] [INF] ========================================
[Tue Jun 18 18:22:04 2024] [INF] PPM: found a modem (right)
[Tue Jun 18 18:22:06 2024] [INF] PLC_ESTABLISH: PLC (v1.7.11) [@20] reactor unit 1 PLC connected with session ID 0
[Tue Jun 18 18:22:07 2024] [INF] plc_session(0): computed maximum operational temperatures 1781.926K (H2O) and 1077.538K (Na)
[Tue Jun 18 18:22:07 2024] [INF] CRD_ESTABLISH: coordinator (v1.4.7) [@23] connected with session ID 0
[Tue Jun 18 18:22:08 2024] [INF] RTU_ESTABLISH: RTU (v1.9.6) [@21] connected with session ID 0
[Tue Jun 18 18:22:08 2024] [INF] FAC: induction matrix reconnected, clearing ASCRAM condition
[Tue Jun 18 18:26:13 2024] [INF] FAC: MAX_BURN process mode started
[Tue Jun 18 18:27:06 2024] [WRN] plc_session(0): PLC KEEP_ALIVE round trip time > 750ms (770ms)
[Tue Jun 18 19:04:36 2024] [INF] UNIT 1 ALARM 10 (RPSTransient): TRIPPED [PRIORITY TIMELY]
[Tue Jun 18 19:05:36 2024] [INF] FAC: disengaging auto control (now inactive)
[Tue Jun 18 19:05:50 2024] [INF] FAC: MAX_BURN process mode started
[Tue Jun 18 19:06:29 2024] [INF] FAC: disengaging auto control (now inactive)
[Tue Jun 18 19:06:42 2024] [INF] FAC: MAX_BURN process mode started
[Tue Jun 18 19:08:58 2024] [ERR] plc_session(0): failed to parse status packet data
[Tue Jun 18 19:08:58 2024] [INF] plc_session(0): computed maximum operational temperatures 1781.926K (H2O) and 1077.538K (Na)
[Tue Jun 18 19:08:58 2024] [INF] =====> FATAL SOFTWARE FAULT <=====
[Tue Jun 18 19:08:58 2024] [FTL] /supervisor/unit.lua:283: attempt to perform arithmetic on local 'value' (a nil value)
[Tue Jun 18 19:08:58 2024] [INF] ----------------------------------
[Tue Jun 18 19:08:58 2024] [INF] RUNTIME:          ComputerCraft 1.111.0 (Minecraft 1.20.1)
[Tue Jun 18 19:08:58 2024] [INF] LUA VERSION:      Lua 5.2
[Tue Jun 18 19:08:58 2024] [INF] APPLICATION:      supervisor
[Tue Jun 18 19:08:58 2024] [INF] FIRMWARE VERSION: v1.3.12
[Tue Jun 18 19:08:58 2024] [INF] COMMS VERSION:    2.5.2
[Tue Jun 18 19:08:58 2024] [INF] GRAPHICS VERSION: 2.3.0
[Tue Jun 18 19:08:58 2024] [INF] LOCKBOX VERSION:  1.1
[Tue Jun 18 19:08:58 2024] [INF] ----------------------------------
[Tue Jun 18 19:08:58 2024] [INF] --- begin debug trace ---
stack traceback:
    ...scada-common/crash.lua:51: in metamethod '__sub'
    /supervisor/unit.lua:283: in upvalue '_compute_dt'
    /supervisor/unit.lua:314: in upvalue '_dt__compute_all'
    /supervisor/unit.lua:557: in field 'update'
    ...upervisor/facility.lua:1007: in field 'update_units'
    ...session/svsessions.lua:464: in field 'iterate_all'
    ...supervisor/startup.lua:202: in function <...supervisor/startup.lua:95>
    [C]: in function 'xpcall'
    ...supervisor/startup.lua:240: in main chunk
[Tue Jun 18 19:08:58 2024] [INF] --- end debug trace ---

Additional context This supervisor crash caused the coordinator to stop as well since it lost connection. It did not crash, but it exited the program. If it makes a difference, critical meltdowns are disabled in the config.

MikaylaFischler commented 3 months ago

plc_session(0): failed to parse status packet data is something I only expect to see when I'm actually working on the code and changing the comms stuff. It must have crashed due to trying to use that nil data to update the delta's. Please share that PLC's logs too

imevul commented 3 months ago

log.txt

PLC log in full. It never crashed, but I had to exit to get the log. Just so that there's no confusion about that.

MikaylaFischler commented 3 months ago

Thanks, and yeah I can see that in the logs.

The computed maximum operational temperatures 1781.926K (H2O) and 1077.538K (Na) getting emitted in the supervisor logs a second time means the PLC re-sent the structure packet, and given the warning above it, something definitely went wrong on the PLC when interfacing with the Mekanism CC integrations. The actual crash just was the first time it tried to use nil values.

[Tue Jun 18 19:08:58 2024] [ERR] plc_session(0): failed to parse status packet data
[Tue Jun 18 19:08:58 2024] [INF] plc_session(0): computed maximum operational temperatures 1781.926K (H2O) and 1077.538K (Na)
[Tue Jun 18 19:08:58 2024] [INF] =====> FATAL SOFTWARE FAULT <=====

from the supervisor aligns perfectly with the following from the PLC, which matches my suspicions

[Tue Jun 18 19:08:58 2024] [WRN] PPM: lost device fissionReactorLogicAdapter mounted to back
[Tue Jun 18 19:08:58 2024] [ERR] reactor logic adapter disconnected
[Tue Jun 18 19:08:58 2024] [INF] PPM: mount(back) -> found a fissionReactorLogicAdapter
[Tue Jun 18 19:08:58 2024] [INF] reactor reconnected
[Tue Jun 18 19:08:58 2024] [INF] RPS: partial reset on formed
[Tue Jun 18 19:08:58 2024] [WRN] RPS: low coolant
[Tue Jun 18 19:08:58 2024] [INF] RPS: reactor SCRAM
[Tue Jun 18 19:08:58 2024] [ERR] PPM: [@back] protected scram() -> Scram requires the reactor to be active.
[Tue Jun 18 19:08:58 2024] [INF] RPS: emergency coolant valve OPENED
[Tue Jun 18 19:08:58 2024] [INF] RPS: reactor SCRAM
[Tue Jun 18 19:09:02 2024] [WRN] server timeout

the PLC lost the reactor for some reason, were you working near it and broke a modem cable or made the reactor re-form? Either way, nil data should not have been able to make it out of the PLC. I think a semi-recent change I made to my peripheral interface may have prevented a peripheral fault from blocking data being sent. I would have expected that to be in the logs though. I'll play around with it and see if I can reproduce it, thanks for the report!

imevul commented 3 months ago

Haven't done anything to the reactor structure itself today, or the relevant cables. I have been disconnected once or twice though, one time because of a mekanism error, so might be related to that. Could be that there's some instability/bugs in the mekanism version I'm using.

Mekanism 10.4.5

MikaylaFischler commented 3 months ago

Alright, could be. I think then it might be a race condition with my code vs Mekansim, which may be tough for me to reproduce, but either way I think I can fix it with a small change. Similar thing fixed some issues with the RTU peripherals. I had re-worked how it all behaved to be more robust, and this part of the code didn't get touched, which was an oversight.

MikaylaFischler commented 3 months ago

Additional logs from another user: https://pastebin.com/wjvKGLux

[Wed Jun 26 11:02:32 2024] [INF] ========================================
[Wed Jun 26 11:02:32 2024] [INF] BOOTING supervisor.startup v1.3.12
[Wed Jun 26 11:02:32 2024] [INF] ========================================
[Wed Jun 26 11:02:32 2024] [INF] PPM: found a computer (bottom)
[Wed Jun 26 11:02:32 2024] [INF] PPM: found a speaker (top)
[Wed Jun 26 11:02:32 2024] [INF] PPM: found a modem (back)
[Wed Jun 26 11:02:32 2024] [INF] PPM: found a monitor (right)
[Wed Jun 26 11:02:32 2024] [INF] PPM: found a monitor (left)
[Wed Jun 26 11:02:37 2024] [INF] CRD_ESTABLISH: coordinator (v1.4.7) [@4] connected with session ID 0
[Wed Jun 26 11:02:37 2024] [INF] PLC_ESTABLISH: PLC (v1.7.11) [@5] reactor unit 1 PLC connected with session ID 0
[Wed Jun 26 11:02:37 2024] [INF] plc_session(0): computed maximum operational temperatures 2489.255K (H2O) and 1431.202K (Na)
[Wed Jun 26 11:02:40 2024] [INF] UNIT 1 ALARM 10 (RPSTransient): TRIPPED [PRIORITY TIMELY]
[Wed Jun 26 11:02:42 2024] [INF] UNIT 1 ALARM 11 (RCSTransient): TRIPPED [PRIORITY TIMELY]
[Wed Jun 26 11:17:36 2024] [WRN] plc_session(0): PLC KEEP_ALIVE round trip time > 750ms (868ms)
[Wed Jun 26 19:46:18 2024] [WRN] crdn_session(0): COORD KEEP_ALIVE round trip time > 750ms (1582ms)
[Wed Jun 26 19:46:18 2024] [WRN] plc_session(0): PLC KEEP_ALIVE round trip time > 750ms (1582ms)
[Wed Jun 26 19:51:52 2024] [WRN] plc_session(0): PLC KEEP_ALIVE round trip time > 750ms (999ms)
[Wed Jun 26 21:09:21 2024] [INF] plc_session(0): computed maximum operational temperatures 2489.255K (H2O) and 1431.202K (Na)
[Wed Jun 26 21:09:27 2024] [INF] plc_session(0): computed maximum operational temperatures 2489.255K (H2O) and 1431.202K (Na)
[Wed Jun 26 21:09:27 2024] [INF] plc_session(0): computed maximum operational temperatures 2489.255K (H2O) and 1431.202K (Na)
[Wed Jun 26 21:09:28 2024] [INF] plc_session(0): computed maximum operational temperatures 2489.255K (H2O) and 1431.202K (Na)
[Wed Jun 26 21:09:29 2024] [ERR] plc_session(0): failed to parse status packet data
[Wed Jun 26 21:09:29 2024] [INF] plc_session(0): computed maximum operational temperatures 2489.255K (H2O) and 1431.202K (Na)
[Wed Jun 26 21:09:29 2024] [INF] =====> FATAL SOFTWARE FAULT <=====
[Wed Jun 26 21:09:29 2024] [FTL] /supervisor/unit.lua:283: attempt to perform arithmetic on local 'value' (a nil value)
[Wed Jun 26 21:09:29 2024] [INF] ----------------------------------
[Wed Jun 26 21:09:29 2024] [INF] RUNTIME:          ComputerCraft 1.111.0 (Minecraft 1.20.1)
[Wed Jun 26 21:09:29 2024] [INF] LUA VERSION:      Lua 5.2
[Wed Jun 26 21:09:29 2024] [INF] APPLICATION:      supervisor
[Wed Jun 26 21:09:29 2024] [INF] FIRMWARE VERSION: v1.3.12
[Wed Jun 26 21:09:29 2024] [INF] COMMS VERSION:    2.5.2
[Wed Jun 26 21:09:29 2024] [INF] GRAPHICS VERSION: 2.3.0
[Wed Jun 26 21:09:29 2024] [INF] LOCKBOX VERSION:  1.1
[Wed Jun 26 21:09:29 2024] [INF] ----------------------------------
[Wed Jun 26 21:09:29 2024] [INF] --- begin debug trace ---
stack traceback:
    ...scada-common/crash.lua:51: in metamethod '__sub'
    /supervisor/unit.lua:283: in upvalue '_compute_dt'
    /supervisor/unit.lua:314: in upvalue '_dt__compute_all'
    /supervisor/unit.lua:557: in field 'update'
    ...upervisor/facility.lua:1007: in field 'update_units'
    ...session/svsessions.lua:464: in field 'iterate_all'
    ...supervisor/startup.lua:202: in function <...supervisor/startup.lua:95>
    [C]: in function 'xpcall'
    ...supervisor/startup.lua:240: in main chunk
[Wed Jun 26 21:09:29 2024] [INF] --- end debug trace ---