bastibe / lunatic-python

A two-way bridge between Python and Lua
http://labix.org/lunatic-python
GNU Lesser General Public License v2.1
303 stars 83 forks source link

Modified function error handling after calling python function in py_… #87

Closed sudheerhebbale closed 4 months ago

sudheerhebbale commented 4 months ago

Modified function error handling after calling python function in py_object_call. Captured any error string from the called function and added it to the lua error being thrown.

Tested the code with test_py.lua in tests directory.

Also tested with a test case for the specific change

eg.py

def throw_exception():
    raise Exception("OOPS SORRY!!")

def print_hello():
    return "Hello from eg.py"

a.lua

local python = require('python');

python = require("python")
local eg = python.import("eg")
local status, s = pcall(eg.print_hello)
print(status, s)
local status, s = pcall(eg.throw_exception)
print(status, s)

Call lua file

$ lua a.lua
true    Hello from eg.py
false   error calling python function [OOPS SORRY!!]
bastibe commented 4 months ago

This looks good to me!

But I currently don't have time to test it thoroughly. Did you run both test suits to check that this doesn't break anything?

sudheerhebbale commented 4 months ago

This looks good to me!

But I currently don't have time to test it thoroughly. Did you run both test suits to check that this doesn't break anything?

bastibe commented 4 months ago

Thank you!

Since you already debugged it, could you add the #define PY_SSIZE_T_CLEAN?

sudheerhebbale commented 4 months ago

Core dump still persists after adding #define on ubuntu.

Trying to figure that out, if there are any pointers here, they will help.

Below is the gdb stack trace

Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lua
>>> lua.execute("a = 3")

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff74af567 in luaZ_fill (z=0x7fffffffd8a0) at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/lzio.c:34
34  /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/lzio.c: No such file or directory.
(gdb) where
#0  0x00007ffff74af567 in luaZ_fill (z=0x7fffffffd8a0) at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/lzio.c:34
#1  0x00007ffff74a3fa5 in f_parser (L=L@entry=0x555555c786e8, ud=ud@entry=0x7fffffffd8d0)
    at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/ldo.c:769
#2  0x00007ffff749b747 in luaD_rawrunprotected (L=L@entry=0x555555c786e8, f=f@entry=0x7ffff74a3aa0 <f_parser>, 
    ud=ud@entry=0x7fffffffd8d0) at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/ldo.c:142
#3  0x00007ffff749dfdb in luaD_pcall (L=L@entry=0x555555c786e8, func=func@entry=0x7ffff74a3aa0 <f_parser>, u=u@entry=0x7fffffffd8d0, 
    old_top=16, ef=<optimized out>) at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/ldo.c:729
#4  0x00007ffff749e11a in luaD_protectedparser (mode=<optimized out>, name=<optimized out>, z=0x7fffffffd8a0, L=0x555555c786e8)
    at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/ldo.c:793
#5  lua_load (L=0x555555c786e8, reader=<optimized out>, data=<optimized out>, chunkname=<optimized out>, mode=<optimized out>)
    at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/lapi.c:999
#6  0x00007ffff74b5250 in luaL_loadbufferx (L=<optimized out>, buff=<optimized out>, size=<optimized out>, name=<optimized out>, 
    mode=<optimized out>) at /build/lua5.3-5AFsFS/lua5.3-5.3.6/src/lauxlib.c:760
#7  0x00007ffff7b08cee in Lua_run () from /home/ubuntu/open-source/original/lunatic-python/build/bin/lua.so
#8  0x00005555556ae138 in ?? ()
#9  0x00005555556a4a7b in _PyObject_MakeTpCall ()
#10 0x000055555569d629 in _PyEval_EvalFrameDefault ()
#11 0x00005555556939c6 in ?? ()
#12 0x0000555555789256 in PyEval_EvalCode ()
#13 0x00005555557b4108 in ?? ()
#14 0x00005555557ad9cb in ?? ()
#15 0x00005555556083b4 in ?? ()
#16 0x0000555555607ef3 in _PyRun_InteractiveLoopObject ()
#17 0x00005555557b2fa8 in _PyRun_AnyFileObject ()
#18 0x00005555555f3d54 in PyRun_AnyFileExFlags ()
#19 0x00005555555e95de in ?? ()
#20 0x000055555577c02d in Py_BytesMain ()
#21 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x55555577bff0, argc=argc@entry=1, argv=argv@entry=0x7fffffffe088)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#22 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x55555577bff0, argc=1, argv=0x7fffffffe088, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe078) at ../csu/libc-start.c:392
#23 0x000055555577bf25 in _start ()
(gdb) 
greatwolf commented 4 months ago

I had some time to take a look at the segfault issue. After some lengthy debugging the issue seems to stem from the change to using Py_ssize_t while some some of lunatic's code still assumed int.

The segfault was happening on this line https://github.com/bastibe/lunatic-python/blob/master/src/luainpython.c#L491

But the actual root cause is a couple lines above that, https://github.com/bastibe/lunatic-python/blob/master/src/luainpython.c#L478.

Changing it from:

// int len;

if (!PyArg_ParseTuple(args, "s#", &s, &len))

to:

Py_ssize_t len;

if (!PyArg_ParseTuple(args, "s#", &s, &len))

appears to fix the core dumping issue.

So basically, if PY_SSIZE_T_CLEAN is defined as suggested PyArg_ParseTuple(args, "s#", &s, &len) is going to give back const char * and Py_ssize_t. In this case len needs to be a Py_ssize_t, not an int. There could be still be other places in the code that needs this fix -- I haven't checked but at least we now know what's up.

Tested this on Lua 5.4.4 and Python 3.8.0.

The fix will probably look something like:

#ifdef PY_SSIZE_T_CLEAN
  Py_ssize_t len;
#else
  int len;
#endif

Couple of quick ideas for future debugging:

You can set cmake to do a debug build by changing https://github.com/bastibe/lunatic-python/blob/master/CMakeLists.txt#L3 to set(CMAKE_BUILD_TYPE_INIT "Debug"). Alternatively passing -DCMAKE_BUILD_TYPE=Debug to cmake should also work. eg.

cmake -DCMAKE_BUILD_TYPE=Debug -B ./build
cmake --build ./build -v

this will help make gdb's backtrace much more comprehensible.

sudheerhebbale commented 4 months ago

Made the changes taking care of (I think) impact due to #define PY_SSIZE_T_CLEAN

Also changed initialization of PyTypeObject LuaObject_Type to look like below. This is because one of the members tp_print has been renamed in Python 3.8 to tp_vectorcall_offset. I think the below will be more explicit than the earlier way to initialize.

PyTypeObject LuaObject_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "lua.custom",
    .tp_basicsize = sizeof(LuaObject),
    .tp_dealloc = (destructor)LuaObject_dealloc,
    .tp_repr = LuaObject_str,
    .tp_as_mapping = &LuaObject_as_mapping,
    .tp_call = (ternaryfunc)LuaObject_call,
    .tp_str = LuaObject_str,
    .tp_getattro = LuaObject_getattr,
    .tp_setattro = LuaObject_setattr,
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
    .tp_doc = "custom lua object",
    .tp_richcompare = LuaObject_richcmp,
    .tp_iter = PyObject_SelfIter,
    .tp_iternext = (iternextfunc)LuaObject_iternext,
    .tp_alloc = PyType_GenericAlloc,
    .tp_new = PyType_GenericNew,
    .tp_free = PyObject_Del,
};

One issue has emerged running of test_lua.py results in an error.

$ python test_lua.py
free(): invalid pointer
Aborted (core dumped)

Have been trying to debug this for the past few hours.

Tried (unsuccessfully) to use clang and addresss/memory sanitizers, because the whole executable is not linked with correct flags and running of python is not becoming possible.

This is occurring with very simple python file, named uu.py as well

import lua

lg = lua.globals()
lg == lg._G

The gdb stacktrace (even with CMAKE flags set is not useful) since the error is coming from within Python binary and not lua.so Suspicion: Memory is getting freed by twice (lua and python libraries)

Reading symbols from python...
(No debugging symbols found in python)
(gdb) run uu.py 
Starting program: /usr/bin/python uu.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
free(): invalid pointer

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352557184) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352557184) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352557184) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352557184, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7c89676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ddbb77 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7ca0cfc in malloc_printerr (str=str@entry=0x7ffff7dd9744 "free(): invalid pointer") at ./malloc/malloc.c:5664
#7  0x00007ffff7ca2a44 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at ./malloc/malloc.c:4439
#8  0x00007ffff7ca5453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00005555556b1613 in ?? ()
#10 0x0000555555689c52 in ?? ()
#11 0x00005555556b10ec in ?? ()
#12 0x00005555556abdab in ?? ()
#13 0x000055555571ea87 in ?? ()
#14 0x000055555567debc in ?? ()
#15 0x00005555557b8970 in ?? ()
#16 0x00005555557c04e3 in ?? ()
#17 0x00005555557bda89 in ?? ()
#18 0x00005555557b4537 in Py_FinalizeEx ()
#19 0x00005555557a5913 in Py_RunMain ()
#20 0x000055555577c02d in Py_BytesMain ()
#21 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x55555577bff0, argc=argc@entry=2, argv=argv@entry=0x7fffffffe098)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#22 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x55555577bff0, argc=2, argv=0x7fffffffe098, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe088) at ../csu/libc-start.c:392
#23 0x000055555577bf25 in _start ()

Any thoughts/pointers?

sudheerhebbale commented 4 months ago

All the changes to source code hitherto are committed

sudheerhebbale commented 4 months ago

Found the issue after some debugging.

It is in the function LuaObject_richcmp. The function is returning a boolean value of python (Py_True or Py_False) without incrementing reference, thus causing the error at exit.

Have pushed the fix in the latest commit.

sudheerhebbale commented 4 months ago

Now both the test cases pass smootly

bastibe commented 4 months ago

Good changes!

Using the designated initializers is a great improvement! We can get rid of the comment containing the old version, too, if you ask me.

The function is returning a boolean value of python (Py_True or Py_False) without incrementing reference, thus causing the error at exit.

I remember making that error myself a few months ago, causing more than a few headaches. These things are hard to debug, as they tend to break far away from the implementation.

sudheerhebbale commented 4 months ago

We can get rid of the comment containing the old version, too, if you ask me.

Have removed the comments.

greatwolf commented 4 months ago

Good changes!

Using the designated initializers is a great improvement! We can get rid of the comment containing the old version, too, if you ask me.

The function is returning a boolean value of python (Py_True or Py_False) without incrementing reference, thus causing the error at exit.

I remember making that error myself a few months ago, causing more than a few headaches. These things are hard to debug, as they tend to break far away from the implementation.

yea, sometimes I wish we could use some kind of RAII mechanism like in C++. It would really clean the code up as well as being less error-prone.

sudheerhebbale commented 4 months ago

I suppose the pull request can be merged if there are no more comments.

bastibe commented 4 months ago

Looks good from my end

greatwolf commented 4 months ago

I suppose the pull request can be merged if there are no more comments.

It looks good enough for the most part. Some slight code-smell in pulling the exception information from python but we can refactor that later.

Something missing is maybe add a simple test in either test_lua.py or test_py.lua so future inquiring minds know what kind of error messages is expected when an exception is raised from the python side.

sudheerhebbale commented 4 months ago

Something missing is maybe add a simple test in either test_lua.py or test_py.lua so future inquiring minds know what kind of error messages is expected when an exception is raised from the python side.

This is complete now