MatthieuDartiailh / bytecode

Python module to modify bytecode
https://bytecode.readthedocs.io/
MIT License

Question: how stable is it to round-trip code? #66

Closed fabioz closed 4 years ago

fabioz commented 4 years ago

I'm experimenting with using bytecode to add programmatic breakpoints in pydevd (https://github.com/fabioz/PyDev.Debugger/).

The use case is getting the existing bytecode, adding some code to activate the pydevd breakpoint, and then saving it back (I'm trying to migrate from the existing code, which does that but fails on some corner cases).

i.e., something like:

b = bytecode.Bytecode.from_code(code_to_modify)
# modify to add new instructions at breakpoints, something like:
b.insert(i, Instr("LOAD_GLOBAL", '_pydev_stop_at_break'))
b.insert(i + 1, Instr("LOAD_CONST", stop_at_line))
b.insert(i + 2, Instr("CALL_FUNCTION", 1))
b.insert(i + 3, Instr("POP_TOP"))
new_code = b.to_code()

In my experiments it seems to be working well, but I was wondering if you know of any corner case where doing so would not be safe, or if something else would need to be taken into account for such a round-trip to work.
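One way to gain some confidence in such a round-trip is to rebuild a function from the transformed code object and compare its behavior with the original. The following is only a minimal sketch: `rebuild_function`, `roundtrip_matches` and `sample` are hypothetical helper names, and the transform used here is a plain standard-library copy (`code.replace()`, Python 3.8+) standing in for a real bytecode-based transform.

```python
import types


def rebuild_function(func, new_code):
    """Create a copy of func that runs new_code instead of func.__code__."""
    return types.FunctionType(
        new_code,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )


def roundtrip_matches(func, transform, *args, **kwargs):
    """Return True if func gives the same result after transform(func.__code__)."""
    rebuilt = rebuild_function(func, transform(func.__code__))
    return func(*args, **kwargs) == rebuilt(*args, **kwargs)


def sample(a, b=2):
    return [a * i for i in range(b)]


# Identity transform using only the standard library.
# With bytecode installed, the transform could instead be something like
#   lambda code: Bytecode.from_code(code).to_code()
ok = roundtrip_matches(sample, lambda code: code.replace(), 3, b=4)
```

Comparing return values only covers the inputs you try, so this kind of check complements, rather than replaces, running a real test suite against the modified code.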

p.s.: Sorry for using the tracker to ask a question, I wasn't sure what the appropriate channel was here.

MatthieuDartiailh commented 4 years ago

This is a perfectly legitimate use case of bytecode, and any issue you encounter doing this would be a bug.

@thautwarm if I remember correctly, you use bytecode to modify constants and probably do a lot of round-tripping as a consequence. Can you comment on this too?

thautwarm commented 4 years ago

@MatthieuDartiailh Yes, I have done a lot of this sort of thing.

I don't know of any corner case that would break. If there is one, it must be a bug and should be fixed.

@fabioz For your use case, there are a few things to take care of.

  1. If you're using Python < 3.6, be careful with the lineno of each Instr.

    Python 3.6 added support for negative line number deltas, so in 3.5 and earlier versions, when inserting new instructions, you must guarantee that the linenos never decrease from one instruction to the next.

  2. Besides, I recommend generating the new instruction sequence with a generator instead of performing multiple insertions (repeated list insertions are inefficient).

    def gen(instrs):
      # instructions before the insertion point i
      yield from (instrs[j] for j in range(i))
      yield Instr("LOAD_GLOBAL", '_pydev_stop_at_break')
      yield Instr("LOAD_CONST", stop_at_line)
      yield Instr("CALL_FUNCTION", 1)
      yield Instr("POP_TOP")
      # remaining instructions, starting with the original instrs[i]
      yield from (instrs[j] for j in range(i, len(instrs)))

  3. The global name _pydev_stop_at_break is probably rare enough that users will not define it themselves, but it is still not well encapsulated or hidden from users.

    If the bytecode modified by the debugger will not be dumped into a .pyc file, you might consider inserting the function as a constant instead of loading it from globals (loading from globals also requires you to initialize the variable in each module).

      Instr("LOAD_GLOBAL", '_pydev_stop_at_break')

    ->

      def pydev_stop_at_break(...):
          ...
      Instr("LOAD_CONST", pydev_stop_at_break)

    I had this need myself, so I asked for this feature. Thanks to @MatthieuDartiailh for kindly reviewing and accepting it.
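To illustrate the .pyc caveat from point 3 with the standard library only, here is a rough sketch. It does not use bytecode at all: it swaps a placeholder constant for a function object via `code.replace()` (Python 3.8+), which is an assumption-laden stand-in for `Instr("LOAD_CONST", pydev_stop_at_break)`. The names `stop_hook`, `template` and `can_marshal` are made up for the example.

```python
import marshal
import types


def stop_hook(line):
    """Hypothetical stand-in for the debugger's stop function."""
    return ("stopped", line)


def template():
    return "placeholder"


# Swap the string constant for the function object itself.
consts = tuple(
    stop_hook if c == "placeholder" else c
    for c in template.__code__.co_consts
)
patched_code = template.__code__.replace(co_consts=consts)
patched = types.FunctionType(patched_code, {})

# The patched function now loads the function object as a constant,
# with no global lookup involved.
loaded = patched()


def can_marshal(code):
    """Return True if the code object could be written to a .pyc file."""
    try:
        marshal.dumps(code)
    except ValueError:
        return False
    return True
```

In CPython, `can_marshal(template.__code__)` should be true while `can_marshal(patched_code)` should be false, because marshal cannot serialize function objects. That is why this trick only suits code that stays in memory.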

fabioz commented 4 years ago

Thank you for the feedback.

@thautwarm I'm actually only using that with Python 3.5 onwards, and I'm converting the list to a doubly linked list (so that multiple insertions are efficient, and to have a nicer API, since iterating and changing items at the same time is a bit more straightforward). So 1 and 2 are already OK. Thanks for the tip on LOAD_CONST; I'll take a look into using that.
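For reference, either approach (linked list or generator) avoids repeated list insertions. Below is a minimal sketch of the single-pass generator variant extended to several breakpoints. It uses plain `(opname, arg, lineno)` tuples as a simplified stand-in for bytecode's Instr objects, and `with_breakpoints` is a hypothetical name, not part of pydevd or bytecode.

```python
def with_breakpoints(instrs, break_lines):
    """Yield instrs, emitting a stop sequence before the first
    instruction of each line listed in break_lines.

    instrs is an iterable of (opname, arg, lineno) tuples.
    """
    seen = set()
    for opname, arg, lineno in instrs:
        if lineno in break_lines and lineno not in seen:
            seen.add(lineno)
            # Inserted instructions reuse the breakpoint's line number,
            # so line numbers never decrease along the sequence.
            yield ("LOAD_GLOBAL", "_pydev_stop_at_break", lineno)
            yield ("LOAD_CONST", lineno, lineno)
            yield ("CALL_FUNCTION", 1, lineno)
            yield ("POP_TOP", None, lineno)
        yield (opname, arg, lineno)


original = [
    ("LOAD_CONST", 1, 10),
    ("STORE_FAST", "x", 10),
    ("LOAD_FAST", "x", 11),
    ("RETURN_VALUE", None, 11),
]
patched = list(with_breakpoints(original, {11}))
```

The whole sequence is walked once, so the cost is linear in the number of instructions regardless of how many breakpoints are added.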