Closed thautwarm closed 6 years ago
The issue here is not so much unhashable constants but more non-constant constants. I don't really like this. Could you detail your use case for what looks like an abuse of the constants to me ?
@MatthieuDartiailh Sure.
Actually we're making a transpiler from an extension of SQL to Python, which requires loading tables when applying nested table joins (like `SELECT xxx FROM a INNER JOIN b ON cond(a, b) LEFT OUTER JOIN c ON cond(ab, c)`). We want to avoid the overhead of function calls, so we use nested for-loops to implement it, like
for record_tb1 in tb1:
    for record_tb2 in tb2:
        ...
Since `tb1` and `tb2` are actually constant, we don't want loading them from globals to slow the program down heavily.
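The cost difference being described can be inspected with the stdlib `dis` module. This is just an illustration (the function and variable names are made up, not from the thread):

```python
import dis

TB1 = [1, 2, 3]

def loads_global():
    # TB1 is looked up in the globals dict on every call
    return TB1

def loads_const():
    # a literal tuple lives in co_consts and is pushed directly
    t = (1, 2, 3)
    return t

def opnames(func):
    """Return the set of opcode names used by a function."""
    return {i.opname for i in dis.get_instructions(func)}
```

Disassembling the two functions shows `LOAD_GLOBAL` in the first and `LOAD_CONST` in the second, which is exactly the substitution the transpiler wants to perform.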
AFAICS, there are at least the following three reasons why we should supply a way to extend Python constants:
People may want to use a singleton class with read-only properties to imitate constants, but the overhead explodes; it is also verbose, and the name binding of the class could be rebound as well.
Could you clarify one point ? How do you use the bytecode ? Reading your previous message I would guess you need to generate valid Python code at the end, no ?
Yes, I do generate valid Python code, but I need the following semantics:

x = <anything defined before function `f`>

def f():
    v: const = x
    # do some stuff with v, where v is loaded as a constant and cannot appear on the LHS
`f` is a long-lived function which will be called frequently. To achieve this I perform an AST transformation to change the `Name` nodes in `f`'s AST whose id is `v` into a mangled one (like `const.1`), and then compile the transformed AST to get a code object. Next, I use bytecode to change the load instructions whose operand is the mangled name, just like

LOAD_GLOBAL const.1 => LOAD_CONST (x)
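The mangling step can be sketched with the stdlib `ast` module. This is only an illustration of the first half of the pipeline (I use a plain identifier `__const_1` instead of `const.1` to keep the sketch valid Python); the second half, swapping `LOAD_GLOBAL` for `LOAD_CONST`, is what the bytecode library is then used for:

```python
import ast
import dis

SRC = """
def f():
    return v + 1
"""

class Mangle(ast.NodeTransformer):
    """Rename references to `v` so a later bytecode pass can target them."""
    def visit_Name(self, node):
        if node.id == "v":
            node.id = "__const_1"
        return node

tree = ast.fix_missing_locations(Mangle().visit(ast.parse(SRC)))
ns = {}
exec(compile(tree, "<mangled>", "exec"), ns)

# the mangled name is now read with LOAD_GLOBAL; a bytecode pass would
# then rewrite that instruction into LOAD_CONST <value of x>
globals_read = [i.argval for i in dis.get_instructions(ns["f"])
                if i.opname == "LOAD_GLOBAL"]
```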
It does work and speeds the programs up a lot, unless the constant is an unhashable one. Since then I have made some changes based on your code to support general constants [1].
Thanks for taking time with my problem.
I have mixed feelings about the proposed change.
I agree that it's abusing the const-ness of things. If anything, in optimizers that I've built I have sometimes been unsure of what would be well-behaved as far as CPython is concerned. In other words, "Is this optimization actually legitimate, does it maintain 100% of the subtle details and contracts of CPython and not introduce any new ones?" I've got some rewrite-rules for folding constants and I actually call hash(c) on a value: if it isn't hashable then I fail my "is it a foldable constant?" test.
On the other hand, bytecode isn't used by normal people. Its use can be analogous to dropping down to assembler when you just cannot express something in C/C++/whatever. It would make no sense for assembler to fail with an error message "That wouldn't be valid C behavior", since that's the whole reason you're using assembler in the first place.
I think I've just talked myself into the following proposal: let the bytecode library support construction of valid bytecode, even if that means the result is not valid Python.
Edit: If not clear, outside of code review issues, I'm for thautwarm's proposal, even if his/her immediate problem could be solved in other ways.
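For readers wondering about those "other ways": a classic pure-Python workaround is to snapshot the value as a default argument, so the function reads it with a fast local load instead of a global lookup. This is only a partial substitute for the `LOAD_CONST` rewrite discussed in this thread (names here are illustrative):

```python
table = [1, 2, 3]

def total(_table=table):
    # `_table` is bound once at definition time and read via LOAD_FAST;
    # rebinding the global `table` later does not affect this function
    return sum(_table)
```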
Sorry for the delay.
@SnoopyLane I must say I am quite convinced by your argument. So I believe too now that this should be allowed. However, since we generate the keys in the dictionary ourselves, I believe it would be simpler (ie less maintenance) to simply generate valid keys and keep a simple dictionary. In particular, when comparing the key generation algorithm to the current CPython trunk it seems to me it changed a bit since 3.6 (but I may be wrong).
I must also admit that at first it was not clear to me that this would even generate valid CPython bytecode, but now that @thautwarm's implementation proves it works, I am good with it.
@MatthieuDartiailh The current `const_key` function seems redundant, since the key contains the value's type, the value itself, and a generated key. If you just want to distinguish each of the consts from the others, just use

token = object()

def const_key(const):
    try:
        hash(const)
    except TypeError:  # unhashable
        return id(const), token
    return const

This function works as long as you don't rely on the keys of `consts` anywhere else.
This wouldn't distinguish `False`, `0`, `0.0`, `-0.0`, `0j`, etc.
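A quick illustration of the collision being pointed out here: `False`, `0`, `0.0`, and `0j` all compare equal (and hash equal), so with bare values as keys they collapse into a single dictionary entry. Including the type in the key, as CPython's `_PyCode_ConstantKey` does, keeps them apart, though `-0.0` vs `0.0` still needs extra care:

```python
equal_consts = (False, 0, 0.0, 0j)

# bare values as keys: everything collapses onto one slot
by_value = {c: c for c in equal_consts}

# (type, value) keys keep these four distinct; note that -0.0 == 0.0
# would still collide, which is why CPython special-cases it
by_typed_key = {(type(c), c): c for c in equal_consts}
```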
Thanks @serhiy-storchaka for confirming my intuition that we need to keep following the CPython implementation and introduce only a bit more generality to support non-hashable objects. I believe the way to go is to first update the algorithm to match the one in Python 3.7 and then add the little bit of extra freedom needed to allow non-hashable objects.
@serhiy-storchaka Yes, any hashable objects that compare equal to others would break it...
Hello, I've found that a code object cannot take an unhashable object in its co_consts, because you use a dictionary to store the constants. I think we need not follow the implementation of `_PyCode_ConstantKey` for further usage; actually my project needs more kinds of constants than CPython already has. https://github.com/vstinner/bytecode/blob/a7cc7a52ca10e58a8aee48052edd069fac3c5a01/bytecode/concrete.py#L439 Could we add a compiler flag to support unhashable constants? Just consider using an association list (`List[Tuple[K, V]]`) to store the unhashable ones.
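A minimal sketch of that association-list idea (the class and method names are hypothetical, not the library's API): hashable constants are deduplicated through a dict keyed on `(type, value)`, while unhashable ones fall back to an identity scan over the final constants list:

```python
class ConstTable:
    """Collects constants for co_consts, tolerating unhashable values."""

    def __init__(self):
        self.consts = []   # final co_consts order
        self._index = {}   # (type, value) -> position, hashable constants only

    def add(self, const):
        # note: (type, value) still conflates 0.0 and -0.0; CPython's
        # _PyCode_ConstantKey special-cases that, omitted here for brevity
        key = (type(const), const)
        try:
            if key in self._index:
                return self._index[key]
        except TypeError:
            # unhashable constant: linear identity scan instead of a dict lookup
            for i, c in enumerate(self.consts):
                if c is const:
                    return i
            self.consts.append(const)
            return len(self.consts) - 1
        self.consts.append(const)
        self._index[key] = len(self.consts) - 1
        return self._index[key]
```

The dict path keeps the common case fast; only genuinely unhashable constants pay for the linear scan, which matches the spirit of the `List[Tuple[K, V]]` fallback proposed above.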