colnotab / cpython

The Python programming language
https://www.python.org/

Column numbers can go out-of-sync when opcodes have extended arguments #8

Open ammaraskar opened 3 years ago

ammaraskar commented 3 years ago

Currently, compile.c assumes that each instruction object turns into exactly one bytecode instruction:

https://github.com/colnotab/cpython/blob/44138dcca2b3d391c154c2fd5ec89cd1e2d0e9fc/Python/compile.c#L7012-L7014

However, this is not necessarily true: when an opcode needs an extended argument, the column table goes out of sync with the linetable and the actual instructions. One example that showed up in test_traceback.py is JUMP_IF_NOT_EXC_MATCH, which is preceded by EXTENDED_ARG if the target/oparg is bigger than 255.
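A quick way to see an EXTENDED_ARG being emitted is sketched below. This is only an illustration, not the code path from the issue: it forces a large oparg through the constant index of LOAD_CONST rather than through a jump target, and the module source and the file name `<extended-arg-demo>` are made up for the demo. The exact instruction counts vary between CPython versions.

```python
import dis

# Hypothetical demo source: 300 assignments of distinct integers, so the
# module ends up with more than 256 constants and the later LOAD_CONST
# instructions need an oparg above 255.
source = "\n".join(f"a = {1000 + i}" for i in range(300))
code = compile(source, "<extended-arg-demo>", "exec")

instructions = list(dis.get_instructions(code))
# dis reports each EXTENDED_ARG prefix as its own instruction.
extended = [ins for ins in instructions if ins.opname == "EXTENDED_ARG"]

print(f"{len(instructions)} instructions, {len(extended)} of them EXTENDED_ARG")
```

Every EXTENDED_ARG in that list is an extra code unit that a one-entry-per-instruction table would not account for.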

For example in this program:

# A bunch of instructions here to push the jump targets beyond a byte-range
# so they get turned into an EXTENDED_ARG.
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
print(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

try:
    x = 1
except TypeError:
    print('TypeError')

try:
    x = 1
except TypeError:
    print('TypeError')

def f():
    pass

1 / 0
f()

The error printed is:

Traceback (most recent call last):
  File "C:\Users\ammar\junk\test.py", line 33, in <module>
    1 / 0
    ^
ZeroDivisionError: division by zero

This seems to correspond to the LOAD_CONST of 1 instead of the full division operation. If we look at the disassembly:

 33         564 LOAD_CONST               0 (1)
            566 LOAD_CONST              15 (0)
            568 BINARY_TRUE_DIVIDE
            570 POP_TOP

After adding a printout to the assemble_cnotab function, we find:

Instruction idx 560 -- line 33, i_col_offset=0, i_end_col_offset=1, opcode=LOAD_CONST
Instruction idx 562 -- line 33, i_col_offset=4, i_end_col_offset=5, opcode=LOAD_CONST
Instruction idx 564 -- line 33, i_col_offset=0, i_end_col_offset=5, opcode=BINARY_TRUE_DIVIDE
Instruction idx 566 -- line 33, i_col_offset=0, i_end_col_offset=5, opcode=POP_TOP

Notice that the true divide, which is actually at instruction idx 568, has become desynced from the column table, which thinks it is instruction 564. We need to take instrsize into account when emitting the column table.
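The desync can be modelled in a few lines of Python. This is a simplified sketch, not the compile.c code: `instr_size`, `naive_offsets`, `real_offsets`, and the sample instruction list are hypothetical names and data for illustration. It assumes 2-byte code units and one EXTENDED_ARG prefix per extra oparg byte, and the jump targets 300 and 310 are invented values above 255.

```python
def instr_size(oparg):
    """Code units needed for one instruction: the opcode itself plus one
    EXTENDED_ARG prefix for every oparg byte beyond the first (assumed model)."""
    if oparg <= 0xFF:
        return 1
    if oparg <= 0xFFFF:
        return 2
    if oparg <= 0xFFFFFF:
        return 3
    return 4


def naive_offsets(instrs):
    """Byte offsets if every instruction object were exactly one code unit."""
    return [2 * i for i in range(len(instrs))]


def real_offsets(instrs):
    """Byte offset of each real opcode, skipping its EXTENDED_ARG prefixes."""
    offsets = []
    offset = 0
    for _name, oparg in instrs:
        size = instr_size(oparg)
        offsets.append(offset + 2 * (size - 1))
        offset += 2 * size
    return offsets


# Made-up sequence: two jumps with targets above 255, then the division.
instrs = [
    ("JUMP_IF_NOT_EXC_MATCH", 300),
    ("JUMP_IF_NOT_EXC_MATCH", 310),
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 15),
    ("BINARY_TRUE_DIVIDE", 0),
]

for (name, _), naive, real in zip(instrs, naive_offsets(instrs), real_offsets(instrs)):
    print(f"{name:<22} naive={naive:<3} real={real}")
```

In this model the division lands 4 bytes later than the naive count predicts, the same gap as between 564 and 568 above, assuming each of the two earlier jumps needed a single EXTENDED_ARG.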

pablogsal commented 3 years ago

Excellent catch @ammaraskar! Thanks a lot for the detailed description. We should make sure we cover this in the test suite!

CC: @isidentical