dcwatson / deflate

Python extension wrapper for libdeflate.
MIT License

segmentation fault using deflate_decompress #41

Closed rlunaro closed 2 months ago

rlunaro commented 5 months ago

I've got a segmentation fault using the function deflate_decompress. Here is the test used:

(.env) @:~/wkpy/test$ python 
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import base64
>>> import deflate
>>> tt1 = "Sy1LzNFQt7dT10uvKs1Lzs8tKEotLtZIr8rMS8tJLEnVSEosTjUziU9JT\x635PSdUoLikqSi3TU\x43kuKTHQ\x42\x41Fr\x41\x41\x3d\x3d"
>>> tt1_decoded = base64.b64decode( tt1 )
>>> deflate.gzip_decompress( tt1_decoded )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
deflate.DeflateError: Invalid gzip data.
>>> deflate.deflate_decompress( tt1_decoded )
Segmentation fault (core dumped)

I was trying to decompress a file found in a php file, apparently compressed with the gzdeflate() function.
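
For context, PHP's gzdeflate() emits a raw DEFLATE stream with neither a zlib nor a gzip wrapper, which is consistent with gzip_decompress rejecting the data above. As a cross-check that does not touch this library at all, the standard-library zlib module can read such a stream when given a negative wbits value. A minimal sketch, where inflate_raw is a hypothetical helper name and the sample data is generated locally rather than taken from the PHP file:

```python
import zlib


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream (no zlib or gzip header)."""
    # A negative wbits value tells zlib to expect no header or trailer.
    return zlib.decompress(data, wbits=-zlib.MAX_WBITS)


# Build a raw DEFLATE stream locally, standing in for gzdeflate() output.
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw = compressor.compress(b"hello " * 10) + compressor.flush()

print(inflate_raw(raw))
```

Unlike deflate_decompress, this path does not need the decompressed size up front, so it may be a useful fallback while the crash is being investigated.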

mxmlnkn commented 2 months ago

I have the same problem with zlib_decompress. Using an older version is not an option either, because deflate_decompress and zlib_decompress seem to have been newly added in 0.5.0, the most recent version.

Can be reproduced very easily with:

python3 -c 'import zlib, deflate; deflate.zlib_decompress(zlib.compress(b"a"))'

Backtrace:

#0  deflate_zlib_decompress (self=<optimized out>, args=<optimized out>) at deflate.c:283
#1  0x00000000005127a0 in ?? ()
#2  0x00000000004e0bab in _PyObject_MakeTpCall ()
#3  0x00000000004f627a in _PyEval_EvalFrameDefault ()
#4  0x00000000005d99bf in PyEval_EvalCode ()
#5  0x00000000005f7fc7 in ?? ()
#6  0x00000000005f49c3 in ?? ()
#7  0x00000000005e9371 in PyRun_StringFlags ()
#8  0x00000000005e924a in PyRun_SimpleStringFlags ()
#9  0x0000000000606a05 in Py_RunMain ()
#10 0x00000000005cb9db in Py_BytesMain ()
#11 0x00007ffff7c28150 in __libc_start_call_main (main=main@entry=0x5cb940, argc=argc@entry=3, argv=argv@entry=0x7fffffffda18) at ../sysdeps/nptl/libc_start_call_main.h:58
#12 0x00007ffff7c28209 in __libc_start_main_impl (main=0x5cb940, argc=3, argv=0x7fffffffda18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffda08)
    at ../csu/libc-start.c:360
#13 0x00000000005cb875 in _start ()

Line 283 is the Py_DECREF here:

    if (result != LIBDEFLATE_SUCCESS) {
        Py_DECREF(output);
        PyErr_SetString(DeflateError, "Decompression failed.");
        return NULL;
    }

Note that the documentation for Py_DECREF carries this warning: "The object must not be NULL; if you aren’t sure that it isn’t NULL, use [Py_XDECREF()](https://docs.python.org/3/c-api/refcounting.html#c.Py_XDECREF)." I'm not sure whether that is what happens here, though.

It is already surprising that this is inside the non-success path. Looking further at the source code, it seems that two arguments are expected, a bytes object and the decompressed size! (Unhelpfully it simply is called "size" without any explanation, so you need to read the source.)

This works:

import zlib, deflate
uncompressed = b"a" * 100
compressed = zlib.compress(uncompressed)
decompressed = deflate.zlib_decompress(compressed, len(uncompressed))
print(decompressed)
assert decompressed == uncompressed

dcwatson commented 2 months ago

Which version of Python are you using, and are you using the bundled libdeflate, or a system-installed version?

dcwatson@alektra deflate % python3.11 -c 'import zlib, deflate; deflate.zlib_decompress(zlib.compress(b"a"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
deflate.DeflateError: Decompression failed.

I agree that the documentation and API could use improvement. As for the Py_DECREF warning, there is a NULL check above, but perhaps _PyBytes_Resize is setting it to NULL after that? I'd be happy to accept a PR with a failing test so I can reproduce/fix this. I'm also due to bump the bundled libdeflate version, so I'll probably do that soon as well.

mxmlnkn commented 2 months ago

Python 3.11.6

> are you using the bundled libdeflate, or a system-installed version?

No idea. Is there a way to query that? I simply installed it via pip install deflate, so I'd assume it is the bundled one? But, I also have libdeflate 1.18 installed.

dcwatson commented 2 months ago

I just pushed out 0.6.0 if you want to give it a try.

mxmlnkn commented 2 months ago

Ok, very interesting. I was trying to reproduce it on my other system and couldn't. I also only got the same non-segfault error you showed.

System with segfault:

System that only shows the error message:

I don't know what else could be different and could have an influence. I also tried listing all loaded shared libraries after importing deflate but it doesn't look as if it uses the system libdeflate.so.

import os, deflate
mappedFilesFolder = f"/proc/{os.getpid()}/map_files"
if os.path.isdir(mappedFilesFolder):
    libraries = set(
        os.readlink(os.path.join(mappedFilesFolder, link)) for link in os.listdir(mappedFilesFolder)
    )
    print(sorted(list(libraries)))

Output:

/home/user/.local/lib/python3.11/site-packages/deflate.cpython-311-x86_64-linux-gnu.so
/usr/bin/python3.11
/usr/lib/locale/locale-archive
/usr/lib/python3.11/lib-dynload/readline.cpython-311-x86_64-linux-gnu.so
/usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/usr/lib/x86_64-linux-gnu/libc.so.6
/usr/lib/x86_64-linux-gnu/libexpat.so.1.8.10
/usr/lib/x86_64-linux-gnu/libm.so.6
/usr/lib/x86_64-linux-gnu/libpthread.so.0
/usr/lib/x86_64-linux-gnu/libreadline.so.8.2
/usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
/usr/lib/x86_64-linux-gnu/libz.so.1.2.13

On the system where I can still reproduce the segfault, updating to 0.6.0 fixes it. I now get:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: function missing required argument 'originalsize' (pos 2)

dcwatson commented 2 months ago

Great! This was most likely caused by PyBytes_FromStringAndSize returning an immortal object for size 0, then using that as the decompression buffer and calling _PyBytes_Resize on it.
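
For anyone following along, here is a small sketch of the behaviour described above. It is not taken from the deflate source, and it assumes CPython, where ctypes.pythonapi exposes the C API. It calls PyBytes_FromStringAndSize twice with size 0 and checks whether both calls hand back the same shared object rather than two fresh buffers:

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
from_string_and_size = ctypes.pythonapi.PyBytes_FromStringAndSize
from_string_and_size.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
from_string_and_size.restype = ctypes.py_object

# Request two zero-length bytes objects, as a call without a size would.
first = from_string_and_size(None, 0)
second = from_string_and_size(None, 0)

# If the two names refer to one object, the empty bytes value is shared
# and must not be resized or written into as a private output buffer.
print(first is second)
```

If the two results are the same object, resizing or writing into it affects every other user of the empty bytes value, which would fit the explanation for why the crash showed up on only some systems.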