Closed SnoopyLane closed 6 years ago
Thanks for pointing this issue. I will try to think about it. I would prefer a solution that would allow to treat all EXTENDED_ARG on the same footing. Ping me if I do not answer in a week.
Not all EXTENDED_ARG 0
are useless. For example LOAD_CONST
referring to the constant number 0x10000 has two EXTENDED_ARG
prefixes, one of them with argument 0.
Thanks for pointing this out @serhiy-storchaka. I will keep this in mind.
@serhiy-storchaka I noticed that in some cases CPython can emit EXTENDED_ARG 0 for arguments that actually are not too large to fit in the following opcode argument. Could you point me at where this might be happening in CPython source ? To me it looks like a bug.
The only place I have found in peephole.c. And it is even mentioned in a comment. This is not a bug, but just the lack of optimization of this rare case.
OK, I wish I had known this could happen. Thanks for looking this up @serhiy-storchaka.
@serhiy-storchaka yes but in this case it is a leading EXTENDED_ARG 0. The example you point out of EXTENDED_ARG 0 following an initial non-0 is already property handled.
EXTENDED_ARG 0 shouldn't be in code, but sometimes they are. Trying to convert a Bytecode back to code ends up raising ValueError("invalid opcode or oparg") because PyCompile_OpcodeStackEffect doesn't handle EXTENDED_ARG.
They are retained when disassembling code to Bytecode due to commit 63f1ec9. That commit should NOT be undone because the disassembly depends on physical instruction sizes, and removing the EXTENDED_ARG 0 then wreaks havoc on all the jump offsets.
A somewhat simple fix is to remove them later in ConcreteBytecode.to_bytecode after the physical sizes are no longer needed.
An alternative would be to modify the stack_effect calls to deal with this. But IMO its better to just remove them so that people building code analyzers or optimizers are forced to deal with it.