The assert you mention in keeper.c is something that should never happen. It means that the stack manipulation logic of the function is somehow flawed, and the stack contents at the end of the function are not the same as what we started with. This is definitely not the case in the function itself, so the cause must be external. Unsynchronized access from multiple threads comes to mind. However, I've checked the code, and as far as I can tell all accesses to the keeper state associated with a given linda are properly protected by the keeper's mutex, so I'm somewhat baffled.
Have you tried running your tests against a debug build of Pallene's Lua? Maybe there are internal checks that you can activate, such as LUA_USE_APICHECK or similar?
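As a rough sketch of what enabling such checks could look like: stock Lua 5.4 reads extra compiler flags from the MYCFLAGS make variable, and defining LUA_USE_APICHECK and LUAI_ASSERT turns on the C API checks and the internal lua_assert checks. The source directory below is a hypothetical placeholder, and whether Pallene's modified Lua tree uses the same Makefile layout is an assumption.

```shell
#!/bin/sh
# Hypothetical location of the Lua source tree (assumption; adjust as needed).
LUA_SRC_DIR="${LUA_SRC_DIR:-./lua-5.4.4}"

# Keep debug symbols, disable optimization, and enable both kinds of checks.
DEBUG_CFLAGS="-g -O0 -DLUA_USE_APICHECK -DLUAI_ASSERT"

if [ -d "$LUA_SRC_DIR" ]; then
    # Rebuild from scratch so every object file is compiled with the checks.
    make -C "$LUA_SRC_DIR" clean
    make -C "$LUA_SRC_DIR" MYCFLAGS="$DEBUG_CFLAGS"
else
    # Source tree not found: only show the command that would be run.
    echo "would run: make -C $LUA_SRC_DIR MYCFLAGS=\"$DEBUG_CFLAGS\""
fi
```

With such a build, a misuse of the C API tends to abort at the offending call rather than corrupting state and crashing later.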
Thank you for the response. I still don't know for sure if LuaLanes is the problem or not. You are right that it could be Pallene or something else. (I'm currently also deeply scrutinizing a C-module I'm using for possible thread-unsafe things. Although I can't explain why I haven't seen it crash under stock Lua 5.4, except for bad luck.)
FYI, last year we did use LUAI_ASSERT and caught a pretty devious Pallene bug with its code-generator that only happened in certain edge cases. But right now, it isn't catching anything.
Anyway, I appreciate you checking the code.
I sometimes experience segmentation faults when using LuaLanes. I suspect the cause is that I'm using the in-development version of Pallene, which uses a slightly modified version of Lua 5.4. https://github.com/pallene-lang/pallene https://www.youtube.com/watch?v=pGF2UFG7n6Y
Pallene has the ability to compile Pallene to regular Lua scripts (for testing/debugging/benchmarking), so as an experiment, I did this, and I still encountered crashes sometimes. So I believe it is not the generated Pallene/native code side of things that is the problem, but perhaps simply some code changes they made in the Lua interpreter code base.
To try to confirm this further, I installed the regular Lua 5.4.4 and ran my program against that. I have not been able to produce any crashes so far. (But as I said, the crashes only happen sometimes, so it is hard to be sure.)
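Because the crash is intermittent, one way to compare the two interpreters is to run the same program many times and count abnormal exits. The following is a minimal sketch; count_failures is a hypothetical helper and main.lua is a placeholder for the real program.

```shell
#!/bin/sh
# count_failures RUNS CMD [ARGS...]
# Runs CMD the given number of times and prints how many runs exited non-zero.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1
        status=$?
        # A status above 128 conventionally means "killed by signal (status - 128)":
        # on Linux, 139 is SIGSEGV (11) and 134 is SIGABRT (6, e.g. a failed assert).
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
            echo "run $i: exit status $status" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example usage (main.lua is a placeholder):
#   count_failures 200 lua main.lua
#   count_failures 200 lua5.4 main.lua
```

A count of zero after a few hundred runs on one interpreter, next to a non-zero count on the other, is stronger evidence than a handful of manual runs, although it still cannot prove the absence of a race.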
My code that uses Lanes is pretty basic. I don't use Lindas. My problem set is embarrassingly parallel, so I use Lanes to run each task in parallel. The results of each lane are saved into a common array for serial processing at the end.
Also, I noticed that if I add calls to lanes.sleep, the chance of a crash goes down as the sleep duration goes up.
Also, exactly once, I got a termination that printed this error message: lua: src/keeper.c:251: keepercall_clear: Assertion `FALSE' failed.
I used LuaRocks to install Lanes, version 3.16.0-0. I am running Linux Ubuntu 22.04.1 LTS.
For some context, here is an excerpt of my main LuaLanes body/loop:
I also manually ran the tests I found in the Lanes repo against both versions of Lua that I had.
for f in ./*.lua; do lua5.4 "$f" >> testrun54.txt 2>&1; done
for f in ./*.lua; do lua "$f" >> testrunpal.txt 2>&1; done
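One caveat with loops like these: when a process is killed by a signal, the "Segmentation fault" message is typically printed by the shell rather than by the Lua process, so it may not end up in the redirected log. A variant that also records each test's exit status could look like the sketch below; run_tests is a hypothetical helper name.

```shell
#!/bin/sh
# run_tests INTERPRETER LOGFILE [DIR]
# Runs every .lua file in DIR (default: current directory) and appends its
# output, followed by its exit status, to LOGFILE.
run_tests() {
    interp=$1
    log=$2
    dir=${3:-.}
    for f in "$dir"/*.lua; do
        # Skip the unexpanded pattern when no .lua files exist.
        [ -e "$f" ] || continue
        echo "=== $f" >> "$log"
        "$interp" "$f" >> "$log" 2>&1
        # 139 here would indicate SIGSEGV on Linux, 134 a failed assert (SIGABRT).
        echo "=== exit status: $?" >> "$log"
    done
}

# Example usage:
#   run_tests lua5.4 testrun54.txt
#   run_tests lua testrunpal.txt
```

Searching the resulting logs for "exit status: 139" or "exit status: 134" then shows which test files, if any, crashed.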
I visually compared the outputs with a diff tool, and didn't see anything that stood out (or any seg faults). But I'm attaching the results of both runs: testrun54.txt testrunpal.txt
Do you have any recommendations on how I can help isolate the problem so it can be fixed? Since my only synchronization point is where I check whether a Lane has finished and put the results into an array, I'm hoping there are some obvious places to look in the C code base that need some kind of lock, perhaps because some Lua internal that Lanes depends on got moved.
Thank you