Spurious "attempt to yield across a C-call boundary"

Feuermurmel commented 4 hours ago

Describe the bug

I'm running a larger application and sometimes I'm getting this error, almost at random:

attempt to yield across a C-call boundary

To Reproduce

I tried reducing it to a simple reproducing example: eeprom.txt

Pasting the file into the EEPROM of a computer and running it leads to the following error:

EEPROM:666: attempt to yield across a C-call boundary
stack traceback:
    EEPROM:666: in function <EEPROM:665>
    [C]: in function 'string.gsub'
    EEPROM:662: in main chunk

The file contains a bunch of calls to tostring() and more calls inside a string.gsub(). It seems that all these calls are necessary to trigger the bug, even in a simple example. But in my larger application calls like this happen all over the place so it seems reasonable that the bug is only triggered after some time.

Since there is no call to coroutine.yield() in the example, I don't really understand what's happening here.

Additional context

I'm using FicsIt Networks 0.3.27 and Satisfactory 1.0.0.6.

SMMDebug-2024-11-15-14-57-36.zip

Workaround

In case anyone is interested, this is the workaround I'm using. It only covers string.gsub(), but that is the only native function I call that runs Lua code:

local origStringGsub = string.gsub

function string.gsub(s, pattern, repl, n)
    if type(repl) == "function" then
        computer.skip()
    end

    return origStringGsub(s, pattern, repl, n)
end

Panakotta00 commented 3 hours ago

The cause is the gsub. For safety reasons, Lua code automatically yields after 2500 instructions. The problem is that gsub executes the provided replacement function in a non-yieldable way, so when the automatic yield lands inside that function, it causes this error.
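The mechanism can be reproduced outside the game. The following is a minimal sketch for stock Lua 5.3 or newer (an assumption; no FicsIt-Networks API is used), where an explicit coroutine.yield() stands in for the automatic yield:

```lua
-- Assumption: stock Lua 5.3+ (for coroutine.isyieldable); nothing here is FIN-specific.
local yieldable_in_callback

local co = coroutine.create(function()
    -- string.gsub is a C function. It calls the replacement callback in a way
    -- that cannot be suspended, so a yield inside the callback raises an error.
    return string.gsub("abc", "%w", function(ch)
        yieldable_in_callback = coroutine.isyieldable()
        coroutine.yield(ch) -- stands in for the automatic yield
        return ch:upper()
    end)
end)

local ok, err = coroutine.resume(co)
print(yieldable_in_callback, ok, err)
```

Here coroutine.isyieldable() reports false inside the callback even though the surrounding code runs in a coroutine, and the resume fails with the "yield across a C-call boundary" error instead of suspending.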

A possible workaround would be to call computer.skip() right before the gsub. But even then, if you replace too many things, or your replacement function is fairly long, you can still encounter this issue.
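Another way to sidestep the limit, sketched here under assumptions, is to drive the replacement loop from Lua with string.find(), so that the callback never runs underneath a C function. The helper name gsub_in_lua is hypothetical, it only accepts a function as the replacement, and it does not replicate string.gsub() for anchored patterns (^) or for patterns that can match the empty string:

```lua
-- Hypothetical helper, not part of Lua or FicsIt-Networks.
-- Assumption: Lua 5.2+ (table.pack / table.unpack).
local function gsub_in_lua(s, pattern, fn)
    local parts, pos, count = {}, 1, 0
    while pos <= #s do
        -- string.find never calls back into Lua, so no Lua code runs below a C frame.
        local found = table.pack(string.find(s, pattern, pos))
        local first, last = found[1], found[2]
        if not first then break end
        local match = string.sub(s, first, last)
        local r
        if found.n > 2 then
            r = fn(table.unpack(found, 3, found.n)) -- pass captures, like gsub
        else
            r = fn(match)                           -- no captures: pass whole match
        end
        parts[#parts + 1] = string.sub(s, pos, first - 1)
        -- Like gsub, keep the original match when the callback returns nil or false.
        parts[#parts + 1] = (r == nil or r == false) and match or tostring(r)
        count = count + 1
        if last < first then
            -- Empty match: copy one character so the loop always advances.
            parts[#parts + 1] = string.sub(s, first, first)
            pos = first + 1
        else
            pos = last + 1
        end
    end
    parts[#parts + 1] = string.sub(s, pos)
    return table.concat(parts), count
end

-- The callback may now yield, because only Lua frames sit between it and the coroutine.
local co = coroutine.wrap(function()
    return gsub_in_lua("a b", "%a", function(ch)
        coroutine.yield("paused at " .. ch)
        return ch:upper()
    end)
end)

local first_pause = co()
local second_pause = co()
local result, count = co()
print(first_pause, second_pause, result, count)
```

The trade-off is speed and fidelity: the loop is slower than the native function and covers only the common cases, so it is best kept for call sites with long-running replacement functions.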

The reason we have to auto-yield is to prevent while true do end loops from freezing your game.

Feuermurmel commented 2 hours ago

Ah okay. Thanks for the explanation.

Have you considered moving the execution of the Lua code to a separate thread? Then you could have the thread wait on some synchronization primitive (like a lock, semaphore, signal etc.) until the next tick instead of yielding the current coroutine. Yielding would then only be necessary when the thread needs to be stopped, e.g. when stopping the computer or loading another save.

Panakotta00 commented 1 hour ago

You can already let it execute on a separate thread using the computer.promote() function.

But the problem still persists, mostly in the case of saving your game state. We can only do this once all execution has yielded. And having users write "aware" code instead of hoping they never trip the issue is a better approach imo.

Though in the case of the promoted tick state, we can think about increasing the Lua instruction limit and/or replacing it with a timeout instead.

Feuermurmel commented 57 minutes ago

Ahh I see. I'd also confidently guess that persisting the Lua VM's state is not possible while any coroutines are running (at least when they have C frames on the stack). So at some point, they need to yield in that scenario. I would have simply assumed that the VM state is not persisted between saves (I was used to this from ComputerCraft). Implementing persistent runtime state for the computers is a noble goal. 😊

I have two more suggestions:

When running into the situation that a VM has used up its 2500 instructions, maybe the implementation could see if there's a C frame on the stack and then wait for a little bit more instead of yielding immediately.

If the implementation gave the VM, for example, another 2500-instruction maximum, then the code would at least have a guarantee for how long it can stay inside a call such as string.gsub(). Right now, if some piece of code runs for a long time without yielding, the limit is sometimes reached in a safe place and other times in a place where yielding is not possible, making the crashes random and hard to debug (as it was in my case).

Add a descriptive error message when this is happening, so that it's easier to debug. I, for example, was completely convinced that I was hitting a bug in FIN until your explanation here, so I spent quite some time trying to narrow it down.
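Until something like that exists in the mod, a script can approximate the second suggestion on its own side. The following is a rough sketch, assuming Lua 5.2+; the helper name call_with_hint and the hint text are made up for illustration and are not part of FicsIt-Networks:

```lua
-- Hypothetical helper: run fn(...) and, if it fails with the C-call boundary
-- error, re-raise the error with a hint appended. Other errors pass through.
local function call_with_hint(fn, ...)
    local results = table.pack(pcall(fn, ...))
    if results[1] then
        return table.unpack(results, 2, results.n)
    end
    local err = results[2]
    if type(err) == "string" and err:find("attempt to yield across", 1, true) then
        error(err .. " (hint: the automatic yield probably fired inside a"
            .. " callback of a native function such as string.gsub)", 0)
    end
    error(err, 0)
end

-- Demonstration in stock Lua: an explicit yield stands in for the automatic one.
local co = coroutine.create(function()
    return call_with_hint(string.gsub, "abc", "%w", function(ch)
        coroutine.yield(ch)
        return ch
    end)
end)

local ok, err = coroutine.resume(co)
print(ok, err)

-- Successful calls return their results unchanged.
local replaced, n = call_with_hint(string.gsub, "abc", "%w", string.upper)
print(replaced, n)
```

This does not prevent the error. It only makes the failure easier to recognize when it does occur.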

Panakotta00 / FicsIt-Networks

Spurious "attempt to yield across a C-call boundary" #357