MatthieuDartiailh / bytecode

Python module to modify bytecode
https://bytecode.readthedocs.io/
MIT License

Possible "hysteresis" in bytecode recompilation with 3.12 #125

Open P403n1x87 opened 1 year ago

P403n1x87 commented 1 year ago

We've started investigating support for CPython 3.12 in our project, which makes use of bytecode, and we have observed a potential "hysteresis" in the following test:

https://github.com/DataDog/dd-trace-py/blob/db7372d249de118a48b78d64327b9a903a388068/tests/debugging/function/test_store.py#L183-L206

The test manipulates a bytecode object by adding extra instructions and then removing them, in different orders. We want to check that we get an equal, albeit not identical, code object. Up until CPython 3.11 the last equality assertion would pass, but with 3.12 it fails. Using the dis module we can confirm that the bytecode content of the two code objects being tested is essentially the same, so the equality check must be failing because of some other attribute(s) of the code object.

Disassembly of original code object:
  5           0 RESUME                   0

  6           2 LOAD_FAST                0 (snafu)
              4 RETURN_VALUE
Disassembly of new code object:
  5           0 RESUME                   0

  6           2 LOAD_FAST                0 (snafu)
              4 RETURN_VALUE

For completeness, the function is defined as

def modulestuff(snafu):
    return snafu
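
The dis comparison above can also be done programmatically instead of by eye. This is only a minimal sketch using the standard library: `same_instructions` is a hypothetical helper name, `other` is an illustrative function that is not part of the test, and `code.replace()` merely stands in for the rebuilt code object (it is not the bytecode round trip itself).

```python
import dis


def same_instructions(a, b):
    """Return True if two code objects disassemble to the same instructions."""
    def summary(code):
        # Compare opcode name, raw argument, rendered argument and offset.
        return [
            (ins.opname, ins.arg, ins.argrepr, ins.offset)
            for ins in dis.get_instructions(code)
        ]
    return summary(a) == summary(b)


def modulestuff(snafu):
    return snafu


def other(snafu):
    # Illustrative only: a function with a different body.
    return snafu + 1


original = modulestuff.__code__
rebuilt = original.replace()  # stand-in for the round-tripped code object

print(same_instructions(original, rebuilt))
print(same_instructions(original, other.__code__))
```

This only covers the instruction stream, so it cannot explain on its own why the equality check fails.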

P403n1x87 commented 1 year ago

With

for k in dir(stuff.modulestuff.__code__):
    if k.startswith("co_"):
        print(k, getattr(stuff.modulestuff.__code__, k) == getattr(code, k))

we get

co_argcount True
co_cellvars True
co_code True
co_consts True
co_exceptiontable True
co_filename True
co_firstlineno True
co_flags True
co_freevars True
co_kwonlyargcount True
co_lines False
co_linetable True
co_lnotab True
co_name True
co_names True
co_nlocals True
co_positions False
co_posonlyargcount True
co_qualname True
co_stacksize True
co_varnames True

which might narrow it down to just co_lines and co_positions. However, these only differ because they are bound methods, and bound methods of two different objects do not compare equal. Their return values seem to coincide too.
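
To avoid the false negatives from the bound methods, the loop can call any callable attribute and compare the results instead. The following is a rough sketch only: `diff_code_attrs` is a hypothetical helper name, and `code.replace()` is used as a stand-in for the rebuilt code object rather than the actual bytecode round trip.

```python
import types
import warnings


def diff_code_attrs(a: types.CodeType, b: types.CodeType) -> dict:
    """Map each differing co_* attribute name to its (a, b) pair of values."""
    diffs = {}
    with warnings.catch_warnings():
        # Some attributes (e.g. co_lnotab) are deprecated on recent versions.
        warnings.simplefilter("ignore", DeprecationWarning)
        for name in sorted(dir(a)):
            if not name.startswith("co_"):
                continue
            va, vb = getattr(a, name), getattr(b, name)
            if callable(va):
                # co_lines / co_positions are methods: compare what they yield.
                va, vb = list(va()), list(vb())
            if va != vb:
                diffs[name] = (va, vb)
    return diffs


def modulestuff(snafu):
    return snafu


original = modulestuff.__code__
rebuilt = original.replace()  # stand-in for the round-tripped code object
shifted = original.replace(co_firstlineno=original.co_firstlineno + 1)

print(diff_code_attrs(original, rebuilt))
print(sorted(diff_code_attrs(original, shifted)))
```

An empty result for the real pair of code objects would suggest that whatever makes the equality check fail is not visible through the public co_* attributes.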

MatthieuDartiailh commented 1 year ago

Can you confirm whether this is a real issue or not?

P403n1x87 commented 1 year ago

It depends on what we mean by "issue". We have round-trip tests that take a CodeType object from a function, add some extra bytecode in different orders and places, then take it off again and rebuild a CodeType object. Up until CPython 3.11 we get equal CodeType objects (but not identical, because they are indeed different objects). With CPython 3.12, the equality checks fail, which is a bit of a concern. There is no evidence, however, that the new CodeType object is "functionally" different from the original.

MatthieuDartiailh commented 1 year ago

Ok. It is just that, from the list you made in your previous comment, the code objects did look equal, and that got me confused. I guess one would have to check the implementation of __eq__ on CodeType to know why it changed between 3.11 and 3.12.

MatthieuDartiailh commented 11 months ago

@P403n1x87 is this still an actual issue?

MatthieuDartiailh commented 11 months ago

ping @P403n1x87

P403n1x87 commented 11 months ago

Sorry for the late reply. Because I have disabled the "round-trip" tests, I don't know whether the issue has been solved. I don't think I'll be able to look at this again any time soon, unfortunately 🙁.