MatthieuDartiailh / bytecode

Python module to modify bytecode
https://bytecode.readthedocs.io/
MIT License
302 stars 38 forks source link

EXTENDED_ARG 0 retained from input raises ValueError #25

Closed SnoopyLane closed 6 years ago

SnoopyLane commented 6 years ago

EXTENDED_ARG 0 shouldn't be in code, but sometimes they are. Trying to convert a Bytecode back to code ends up raising ValueError("invalid opcode or oparg") because PyCompile_OpcodeStackEffect doesn't handle EXTENDED_ARG.

They are retained when disassembling code to Bytecode due to commit 63f1ec9. That commit should NOT be undone because the disassembly depends on physical instruction sizes, and removing the EXTENDED_ARG 0 then wreaks havoc on all the jump offsets.

A somewhat simple fix is to remove them later in ConcreteBytecode.to_bytecode after the physical sizes are no longer needed.

An alternative would be to modify the stack_effect calls to deal with this. But IMO its better to just remove them so that people building code analyzers or optimizers are forced to deal with it.

MatthieuDartiailh commented 6 years ago

Thanks for pointing this issue. I will try to think about it. I would prefer a solution that would allow to treat all EXTENDED_ARG on the same footing. Ping me if I do not answer in a week.

serhiy-storchaka commented 6 years ago

Not all EXTENDED_ARG 0 are useless. For example LOAD_CONST referring to the constant number 0x10000 has two EXTENDED_ARG prefixes, one of them with argument 0.

MatthieuDartiailh commented 6 years ago

Thanks for pointing this out @serhiy-storchaka. I will keep this in mind.

MatthieuDartiailh commented 6 years ago

@serhiy-storchaka I noticed that in some cases CPython can emit EXTENDED_ARG 0 for arguments that actually are not too large to fit in the following opcode argument. Could you point me at where this might be happening in CPython source ? To me it looks like a bug.

serhiy-storchaka commented 6 years ago

The only place I have found in peephole.c. And it is even mentioned in a comment. This is not a bug, but just the lack of optimization of this rare case.

MatthieuDartiailh commented 6 years ago

OK, I wish I had known this could happen. Thanks for looking this up @serhiy-storchaka.

SnoopyLane commented 6 years ago

@serhiy-storchaka yes but in this case it is a leading EXTENDED_ARG 0. The example you point out of EXTENDED_ARG 0 following an initial non-0 is already property handled.