TeskaLabs / cysimdjson

Very fast Python JSON parsing library
Apache License 2.0

Accessing results outside of scope where parser was referenced leads to segfault? #50

Open ctheune opened 11 months ago

ctheune commented 11 months ago

I found this experimentally: is it assumed that the results may only be accessed as long as the parser object lives / is in scope?

I noticed that when I have a function that instantiates the parser and returns the parse result, I regularly get segfaults. This isn't very Pythonic and might need a bigger warning and/or some utilities to help manage the lifecycle ...

ctheune commented 11 months ago

My workaround for this situation:

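import asyncio

import cysimdjson


class CysimdResult(object):
    """Bundle the parser with its result so the parser outlives the result."""

    def __init__(self, parser, result):
        self.parser = parser  # held only to keep the parser alive
        self.result = result


async def rgwadmin(*args):
    proc = await asyncio.create_subprocess_exec(
        "radosgw-admin",
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    parser = cysimdjson.JSONParser()
    stdout, stderr = await proc.communicate()
    # Return the parser together with the parse result; returning only the
    # result lets the parser be collected and later access segfaults.
    return CysimdResult(parser, parser.parse(stdout))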

I could make the Result object a bit smarter by proxying the underlying attributes to the result ... but for understanding the issue, this should suffice.
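For illustration, the "smarter" proxying variant could look roughly like the sketch below. It is not part of cysimdjson: the class name `OwnedResult` and its attributes are hypothetical, and it is written generically so that it only assumes the wrapped result supports normal attribute access, indexing, `len()`, iteration and `in`.

```python
class OwnedResult:
    """Hypothetical proxy: forwards to `result` while keeping `owner` alive."""

    def __init__(self, owner, result):
        self._owner = owner    # e.g. the parser instance; never used, only referenced
        self._result = result  # e.g. the object returned by parser.parse(...)

    def __getattr__(self, name):
        # Only called when normal lookup fails. Refuse private names so a
        # half-initialised instance cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._result, name)

    # Special methods are looked up on the type, not via __getattr__,
    # so the common container protocols are forwarded explicitly.
    def __getitem__(self, key):
        return self._result[key]

    def __len__(self):
        return len(self._result)

    def __iter__(self):
        return iter(self._result)

    def __contains__(self, key):
        return key in self._result
```

With this, `return OwnedResult(parser, parser.parse(stdout))` would let callers index and iterate the return value directly. One caveat: any value obtained through the proxy that is itself backed by the parser would presumably need the same treatment, since the proxy only protects the object it wraps.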

ateska commented 5 months ago

This is correct. It is actually a simdjson requirement. But I agree, this is not safe in the way Python users would typically expect.