MatthieuDartiailh / bytecode

Python module to modify bytecode
https://bytecode.readthedocs.io/
MIT License
302 stars 38 forks source link

Support general constants #32

Closed thautwarm closed 6 years ago

thautwarm commented 6 years ago

Hello, I've found that the code object could not take unhashable object as its co_consts, for you use a dictionary to store the constants. I think that we may not follow the implementation of _PyCode_ConstantKey for further usage. Actually my project needs making more kinds of constants than CPython already has. https://github.com/vstinner/bytecode/blob/a7cc7a52ca10e58a8aee48052edd069fac3c5a01/bytecode/concrete.py#L439

    self.consts = {}
    ...
    def add_const(self, value):
        key = const_key(value)
        if key in self.consts:
            return self.consts[key]
        index = len(self.consts)
        self.consts[key] = index
        return index

Could we add a compiler flag to support unhashable constants? Just consider using associate list (List[Tuple[K, V]]) to store unhashable ones.

MatthieuDartiailh commented 6 years ago

The issue here is not so much unhashable constants but more non-constant constants. I don't really like this. Could you detail your use case for what looks like an abuse of the constants to me ?

thautwarm commented 6 years ago

@MatthieuDartiailh Sure.

Actually we're making a transpiler from an extension of SQL to python, which requires loading tables when applying nested table join (like SELECT xxx from a inner join b on cond(a,b) left outer join c on cond(ab, c)). We want to avoid the overhead of function call so we use nested for-loop to implement it, like

for record_tb1 in tb1:
   for record_tb2 in tb2:
      ...

For tb1 and tb2 is actually constant, we don't want to load it from global to slow down program heavily.

AFAICS, there're at least following three reasons why we should supply the way to extend python constants:

People may want to use a singleton class with readonly properties to imitate constants, however the overhead just bursts, while it's verbose and the name binding of the class could be rewrited, too.

MatthieuDartiailh commented 6 years ago

Could you clarify one point ? How do you use the bytecode ? Reading your previous message I would guess you need to generate valid Python code at the end, no ?

thautwarm commented 6 years ago

Yes, I do generate valid Python code but I need following semantics:

x = <anything defined before function `f`>
def f():
   v: const = x
   # do some stuff with v, where v is loaded as a constant, and cannot be put at LHS.

f is a permanent function which would be called frequently. To achieve this I perform ast transformation to change the Name node in f's ast whose id is v to a mangled one(like const.1), and then compile the transformed ast and get a code object.

Next, I use bytecode to change the load instructions whose operand is the mangled one, juts like

LOAD_GLOBAL const.1 => LOAD_CONST (x)

It does work and speed up the programs a lot, unless the constant is a unhashable one. Since then I made some change based on your code to support general constants [1].

Thanks for taking time with my problem.

SnoopyLane commented 6 years ago

I have mixed feelings about the proposed change.

I agree that it's abusing the const-ness of things. If anything, in optimizers that I've built I have sometimes been unsure of what would be well-behaved as far as CPython is concerned. In other words, "Is this optimization actually legitimate, does it maintain 100% of the subtle details and contracts of CPython and not introduce any new ones?" I've got some rewrite-rules for folding constants and I actually call hash(c) on a value: if it isn't hashable then I fail my "is it a foldable constant?" test.

On the other hand, bytecode isn't used by normal people. Its use can be analogous to dropping down to assembler when you just cannot express something in C/C++/whatever. It would make no sense for assembler to fail with an error message "That wouldn't be valid C behavior", since that's the whole reason you're using assembler in the first place.

I think what I've just talked myself into the following proposal: Let the bytecode library support construction of valid bytecodes, even if that means the result is not valid Python.

Edit: If not clear, outside of code review issues, I'm for thautwarm's proposal, even if his/her immediate problem could be solved in other ways.

MatthieuDartiailh commented 6 years ago

Sorry for delay.

@SnoopyLane I must say I am quite convinced by your argument. So I believe too now that this should be allowed. However, since we generate the keys in the dictionary ourselves, I believe it would be simpler (ie less maintenance) to simply generate valid keys and keep a simple dictionary. In particular, when comparing the key generation algorithm to the current CPython trunk it seems to me it changed a bit since 3.6 (but I may be wrong).

I must also admit that at first it was not clear that this would even generate valid CPython, but now that @thautwarm implementation proves it works, I am good with it.

thautwarm commented 6 years ago

@MatthieuDartiailh current implementation const_key function seems to be redundant for the key contains its type, its own and a generated key. If you just want to make sure to distinguish one of the consts from others, just use

token = object()
def const_key(const):
   try:
       hash(obj)
   except TypeError:  # unhashable
      return id(obj), token
   return obj

This function just works if you didn't take advantage of consts's keys in other scenes.

serhiy-storchaka commented 6 years ago

This wouldn't distinguish False, 0, 0.0, -0.0, 0j, etc.

MatthieuDartiailh commented 6 years ago

Thanks @serhiy-storchaka to confirm my intuition that we need to keep following CPython implementation and introduce only a bit more generality to support non-hashable object. I believe the way to go is to first update the algorithm used to match the one in Python 3.7 and then add the little bit of extra freedom to allow non-hashable objects.

thautwarm commented 6 years ago

@serhiy-storchaka yes, all the hashale objects that could equal to others would break it...