cc-tweaked / CC-Tweaked

Just another ComputerCraft fork
https://tweaked.cc

Server crashes due to presumed OOM issues (no crash log); debug log suggests reporting. #1665

Closed LordDarthDan closed 11 months ago

LordDarthDan commented 11 months ago

Minecraft Version

1.18.x

Version

1.101.2

Details

https://www.dropbox.com/scl/fi/hrcnx9slrsbwegdw7a25z/debug-1-3-.log.gz?rlkey=xx4jbsjgziu11vep7abxnc1o2 (The log is 18 megabytes and won't allow me to upload here.)

The log contains the following: [ComputerCraft-Computer-Worker-1/ERROR] [computercraft/]: Trying to run computer #47 on thread ComputerCraft-Computer-Worker-1, but already running on ComputerCraft-Computer-Worker-0. This is a SERIOUS bug, please report with your debug.log.

The crash often occurs upon a player joining the server. The computers in the report run the following code:

id30: runs https://github.com/zyxkad/cc/blob/master/storage/depot.lua executed via startup code of

while true do
 shell.run('depot')
 sleep(1)
end

id47: runs only

local a = peripheral.wrap('tconstruct:smeltery_1')

while true do
 local l = a.list()
 redstone.setOutput('top', l and #l < a.size())
 sleep(1)
end

The details of the server and modpack are: Hourglass Server Details 31.12.2023.txt (includes Advanced Peripherals, CCTech and Valkyrien Computers)

SquidDev commented 11 months ago

Thanks for the report! Would you be able to change the file to be public on Dropbox - I'm unable to read the logs right now!

LordDarthDan commented 11 months ago

> Thanks for the report! Would you be able to change the file to be public on Dropbox - I'm unable to read the logs right now!

Oh, sorry! Dropbox link sharing is a little weird. I think I fixed it tho!

zyxkad commented 11 months ago

I tested this. The code that reliably reproduces the error

[08:07:38] [ComputerCraft-Computer-Monitor-0/WARN]: Terminating computer #0 due to timeout (running for 13.629596041000001 seconds). This is NOT a bug, but may mean a computer is misbehaving.
Thread ComputerCraft-Computer-Worker-0 is currently WAITING
  on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@46953484
  at TRANSFORMER/computercraft@1.101.2/dan200.computercraft.core.computer.ComputerExecutor.resumeMachine(ComputerExecutor.java:666)
  at TRANSFORMER/computercraft@1.101.2/dan200.computercraft.core.computer.ComputerExecutor.work(ComputerExecutor.java:628)
  at TRANSFORMER/computercraft@1.101.2/dan200.computercraft.core.computer.ComputerThread$Worker.runImpl(ComputerThread.java:702)
  at TRANSFORMER/computercraft@1.101.2/dan200.computercraft.core.computer.ComputerThread$Worker.run(ComputerThread.java:641)
  at java.base@17.0.1/java.lang.Thread.run(Thread.java:833)
Enqueued command: ABORT
Enqueued events: 0
CobaltLuaMachine is terminated

is

local threads = {}
while true do
  for i = 1, 100000 do
    local thr = coroutine.create(function() end)
    threads[thr] = 1
  end
  -- clear threads
  for k, _ in pairs(threads) do
    threads[k] = nil
  end
  print(os.clock())
  sleep(0) -- yield
end
zyxkad commented 11 months ago

This looks like a coroutine leak, or maybe a table key leak. Before I run the code, my baseline memory usage is around 8000 MB; while the game is running it rises to about 9000 MB, and after GC it drops back to 8000 MB. After I run the code, the memory usage after GC keeps increasing until it reaches 99%, and then the error above is thrown. @SquidDev

zyxkad commented 11 months ago

I've tested again: if you make the table weak-keyed with setmetatable(threads, {__mode='k'}), the leak doesn't happen.
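
For anyone who wants to try the workaround, here is a minimal sketch of the reproduction loop with a weak-keyed table. It is written for a plain Lua interpreter so it can be run outside the game; the helper names churn and count are made up for illustration, and on a ComputerCraft computer each round would be followed by sleep(0) as in the script above. It only changes the script, not the mod.

```lua
-- Weak keys: an entry does not keep its coroutine key alive, so the
-- garbage collector may drop the entry once nothing else references the key.
local threads = setmetatable({}, { __mode = "k" })

-- One round of the reproduction: create n coroutines, store each as a
-- table key, then clear the table again. (churn is a hypothetical helper.)
local function churn(n)
  for _ = 1, n do
    local thr = coroutine.create(function() end)
    threads[thr] = 1
  end
  -- Assigning nil to existing fields during traversal is allowed in Lua.
  for k in pairs(threads) do
    threads[k] = nil
  end
end

-- Count the entries still present in a table. (count is a hypothetical helper.)
local function count(t)
  local n = 0
  for _ in pairs(t) do
    n = n + 1
  end
  return n
end

churn(1000)
print(count(threads)) --> 0
```

Whether this avoids the leak on a given CC:T version is something to check by watching memory after GC, as described in the previous comment.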

SquidDev commented 11 months ago

Thank you for the additional information, that was very helpful!

zyxkad commented 11 months ago

Hey @SquidDev, thanks for the quick fix. But did you figure out what causes [ComputerCraft-Computer-Worker-1/ERROR] [computercraft/]: Trying to run computer #47 on thread ComputerCraft-Computer-Worker-1, but already running on ComputerCraft-Computer-Worker-0. This is a SERIOUS bug, please report with your debug.log.? IMO, even if the table has a memory leak, the computer should not be run on different threads at the same time.

SquidDev commented 11 months ago

I didn't, no - the relevant code is pretty different on the latest versions of CC:T, and I'm not sure I can face going back and looking at the older version again.

I have a suspicion that the original worker (ComputerCraft-Computer-Worker-0) died or was killed without cleaning up properly (possibly due to the OOM), and then a new worker was spawned the next time we came to run a task.

LordDarthDan commented 10 months ago

@SquidDev Sorry to bother you, but will this fix be released any time soon? Or should I try to build this version of the mod myself?

SquidDev commented 10 months ago

There has been a fix released for Minecraft 1.20.1 (CC:T 1.109.3). I'm afraid I'm no longer providing updates for older versions of Minecraft - you might be able to backport the fixes, but it will be a bit of a slog.

LordDarthDan commented 10 months ago

> There has been a fix released for Minecraft 1.20.1 (CC:T 1.109.3). I'm afraid I'm no longer providing updates for older versions of Minecraft - you might be able to backport the fixes, but it will be a bit of a slog.

Judging by the specific fix, it was addressed in Cobalt rather than in the mod itself, and Cobalt isn't tied to a Minecraft version, as far as I understand. The problem was discovered in 1.18.2 and has been seriously killing my server. I would assume it is possible to just build 1.18.2 with the new Cobalt implementation - unless, of course, there have been some breaking changes I do not know of.

SquidDev commented 10 months ago

The 1.18.x branch of CC:T is still on Cobalt 0.6 - there's been several breaking changes since then (notably the rewrite of coroutines, and the update to Lua 5.2). You're probably better off cherry-picking the fix to the older version of Cobalt.

LordDarthDan commented 10 months ago

> The 1.18.x branch of CC:T is still on Cobalt 0.6 - there's been several breaking changes since then (notably the rewrite of coroutines, and the update to Lua 5.2). You're probably better off cherry-picking the fix to the older version of Cobalt.

Thank you. I will try to do exactly that.

zyxkad commented 10 months ago

If anyone else runs into this problem and wants a backported fix: https://github.com/zyxkad/CC-Tweaked/releases/tag/v1.18.2-1.101.3%2B1
PS: this update should only be required on the server side.