Discussion : Improving handling of co_flags

MatthieuDartiailh commented 7 years ago

Currently the flags for a code object need to be specified manually as a single integer. This approach offers the maximum flexibility but is also error prone as the flags and could I believe be improved. Follow are the 'official' flags (ie excluding future flags such as CO_FUTURE_BARRY_AS_BDFL and CO_FUTURE_GENERATOR_STOP).

From dis.COMPILER_FLAG_NAMES

 1 OPTIMIZED
  2 NEWLOCALS
  4 VARARGS
  8 VARKEYWORDS
 16 NESTED
 32 GENERATOR
 64 NOFREE
128 COROUTINE
256 ITERABLE_COROUTINE

Among those we can identify kind of three families (please correct me if I got this wrong):

the flags completely independent of the underlying code : NEWLOCALS, VARARGS, VARKEYWORDS, ITERABLE_COROUTINE
the flags completely dependent on the underlying code: OPTIMIZED, NO_FREE, GENERATOR
and in between flags:
- NESTED: apply only to function like code defined in another function (and honestly I have difficulty understanding what it does...)
- COROUTINE: can be obvious if GET_AWAITABLE is used but it is not always so

Furthermore COROUTINE and ITERABLE_COROUTINE are incompatible.

I hence believe that it would be profitable to be able to:

specify manually the value for the first kind through a higher level construct
have the proper value be computed in an automatic fashion for the second kind
be able to specify if the code is nested (byteplay does this by passing a keyword arg to to_code) and if the code is from a function (byteplay only guess here).
be able to force the coroutine behavior or have it inferred.

Looking at how byteplay handles this, there is a number of attribute on the code object itself allowing to specify the flags (and the from_function keyword arg in to_code). Moreover generator behavior can be forced.

I do not have a specific implementation in mind but I think that keeping the flag logic in a separate class differentiating between default values (from the original code or guessed) and forced user value may help. The conversion to an int would obsiously require the code.

What do people think ?

vstinner commented 7 years ago

I propose to experiment creating a new bytecode.Flags class.

Maybe bytecode could accept both Flags and int types for flags (convert int to Flags implicitly)?

serhiy-storchaka commented 7 years ago

Why not use enum.IntFlags rather of enum.IntEnum?

MatthieuDartiailh commented 7 years ago

Because IntFlags is new in 3.6

serhiy-storchaka commented 7 years ago

Maybe import external enum implementation in 3.5 and earlier?

MatthieuDartiailh commented 7 years ago

I would like @haypo point of view before adding an external dependency and one that would be version dependent.

vstinner commented 7 years ago

Sorry, I don't know well enum.IntFlag. Can you please show some examples compared to Matthieu's class?

serhiy-storchaka commented 7 years ago

It is similar to IntEnum, but supports combinations. It is just a bit mask with fanny representation.

>>> import enum
>>> class Flags(enum.IntFlag):
...     CO_OPTIMIZED             = 0x00001  # noqa
...     CO_NEWLOCALS             = 0x00002  # noqa
...     CO_VARARGS               = 0x00004  # noqa
... 
>>> Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS
<Flags.CO_NEWLOCALS|CO_OPTIMIZED: 3>
>>> print(Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS)
Flags.CO_NEWLOCALS|CO_OPTIMIZED
>>> int(Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS)
3
>>> Flags(3)
<Flags.CO_NEWLOCALS|CO_OPTIMIZED: 3>
>>> (Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS) & Flags.CO_NEWLOCALS
<Flags.CO_NEWLOCALS: 2>
>>> (Flags.CO_OPTIMIZED|Flags.CO_NEWLOCALS) & ~Flags.CO_NEWLOCALS
<Flags.CO_OPTIMIZED: 1>

vstinner commented 7 years ago

If we use enum.IntFlag, we need to write Matthieu logic to compute "implicit flags" someone else, near ConcreteBytecode.to_code().

How do you represent "no flag" (flags=0) using IntFlag?

serhiy-storchaka commented 7 years ago

You can just use Flags(0). Or add explicit name for value 0.

>>> class Flags(enum.IntFlag):
...     NO_FLAGS = 0
...     CO_OPTIMIZED = 1
...     CO_NEWLOCALS = 2
... 
>>> Flags(0)
<Flags.NO_FLAGS: 0>
>>> print(Flags(0))
Flags.NO_FLAGS
>>> Flags.NO_FLAGS|Flags.CO_OPTIMIZED
<Flags.CO_OPTIMIZED: 1>

MatthieuDartiailh commented 7 years ago

The issue I see is that there will be no way to let the user override the flag inference in such a case (what the _forced dict is currently used for).

MatthieuDartiailh commented 7 years ago

I addressed some of the comments in #20 but we need to decide on a representation of the flags before I complete. If we go with IntFlags we need a third party library to provide it for python < 3.6 (or rewrite it), one possible way to handle the inference would be a helper function returning a new flag from a flag and a bytecode (can generalize to non concrete I think) and it is then up to the user to update the flags after modifying the bytecode. Otherwise we keep the two classes we have and I just need to address the last comments. After thinking a bit about it, I think using IntFlags would probably lead to the less confusing situation (the None/True/False vales of inferable flags is not really obvious). Once we settle on a provider of IntFlags I can update my patch.

vstinner commented 7 years ago

Sorry, I don't have a strong opinion on the flags API. I let you decide with @serhiy-storchaka :-)

MatthieuDartiailh commented 7 years ago

@serhiy-storchaka are you aware of any backport of python 3.6 enum ? Otherwise I fear I will have to copy-paste it from CPython, which is not ideal.

MatthieuDartiailh commented 7 years ago

Closed by #23

MatthieuDartiailh / bytecode

Discussion : Improving handling of co_flags #19