Closed IPSW1 closed 4 years ago
I will try to have a look either this week-end or next Wednesday night. Feel free to ping me if you do not hear from me.
Alright, thank you. I tried to do a fix in this commit, which is meant to be as general as possible. It resolves the extended arguments and temporarily changes the instruction argument to the resolved value. It seems that the dis
module is doing it in this way, which is why the stack effect is computed correctly for the above example.
I did not test any possible side effects. But I also don't think this will make the problem worse, since it does nothing if there are not extended arguments. However, there still remain problems when including EXTENDED_ARG
instructions, for example when there are more than 256 names in a code object or if there are long jumps.
Can you open a PR with this commit ? Also if you can come up with a bunch of failing unit test it would be helpful to help improve support for EXTENDED_ARG.
I did here #64. Was just worried about my code quality, that's why I hesitated :)
There are also three added test cases, but only one of them is expected to fail. It seems like this has something to do with a split of huge block and a wrong start size for one block, but I am not entirely sure.
EDIT: Looks like this test case only fails for versions from 3.6.
I finally found some time to look at this in details. My conclusion is that the issue is that EXTENDED_ARG are propagated to higher level representation of the bytecode namely Bytecode
and ControlFlowGraph
which should not happen. EXTENDED_ARG is a low-level detail and should be ignored in those high level representation.
I will work to fix the ConcreteBytecode.to_bytecode
method to eliminate all EXTENDED_ARG if any is present which should address all issues related to them. For symmetry I will try to add an extended_args
to Bytecode.to_concrete_bytecode
to allow to generate a ConcreteBytecode
with EXTENDED_ARG.
If you have any example that uses EXTENDED_ARG and that fails when doing Bytecode.from_code(code).to_code()
please let me know because that would be a different kind of bug and not fixed by the above.
I just opened #65 that addresses your issues. All the tests you included in #64 now pass.
@IPSW1 I noticed you closed #64. Can you confirm that #65 addresses all your issues ?
TL;DR: Yes, it does.
The thing is I just realized that I wanted to have the benefits of both the concrete and the abstract world, i.e. the EXTENDED_ARG and also the convenience with abstraction. As you described in #65, this would lead to possibly a lot of problems regarding stack size computation (for which my bad hack in #64 is a good example).
So I think your proposed solution is completely reasonable for the issues.
@IPSW1 Do you need a new release now that #65 has been merged ?
There is no need for a specific release for me.
Thank you for the fix!
Hi, I stumbled upon a problem with the stack size computation in combination with the
EXTENDED_ARG
instruction. Take for example this code snipped:The disassembly looks like this:
Now generating the bytecode from this code and converting back to a code object fails due to wrong stack size computation, when the bytecode is generated with the parameter to include extended arguments here set to True (I specifically need this behavior for the bytecode to match the disassembly).
The problem seems to be the use of the
EXTENDED_ARG
instruction, which is not handled correctly. The stack size computation treatsUNPACK_EX
as a single instruction and takes into account only the lower byte (corresponding to the number of variables before the list value, i.e. variablesq
,r
), but not the higher byte provided byEXTENDED_ARG
(corresponding to the number of values after the list value, i.e. variablet
). This leads to a negative stack size.Is this a known problem and is there a solution or a workaround for this, except just excluding the
EXTENDED_ARG
instruction?(For reference:
UNPACK_EX
documentation, stack effect ofUNPACK_EX
)EDIT: Ok, an easy workaround would be to generate two bytecode objects, one without and one with the parameter for extended arguments. I can generate a code object from the one without
EXTENDED_ARG
and keep the other one for reference and comparison with the disassembly. Nevertheless, I don't think that is the nicest option here.