apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
6.34k stars 2.19k forks source link

Python: Fix repr of the Literals #5924

Closed Fokko closed 1 year ago

Fokko commented 2 years ago

Feature Request / Improvement

We instantiate the literals using the literal constructor, but the repr is the specific literal. I think we just want to use the literal constructor in the repr as well:

def test_double_literal():
>       assert repr(literal(12.34)) == 'literal(12.34)'
E       AssertionError: assert 'DoubleLiteral(12.34)' == 'literal(12.34)'
E         - literal(12.34)
E         + DoubleLiteral(12.34)
E         ? ++++ ++

Would be great if we can override the __repr__ of each of the literals, and add tests to make sure that they stay correct.

Query engine

No response

hendrikmakait commented 2 years ago

I'm concerned about the introduction of ambiguity to the __repr__ with this change. Namely, the __repr__ of both a DoubleLiteral(1.0) and a FloatLiteral(1.0) would become literal(1.0), which in turn evaluates to a DoubleLiteral(1.0). Hence, the __repr__ becomes ambiguous with respect to the class of the original object and it will not recreate the same object for the case of the FloatLiteral. A similar situation occurs for BinaryLiteral and FixedLiteral.

rdblue commented 2 years ago

@hendrikmakait makes a good point, although I think double and float are the only cases where it may matter. And even then, binding should handle any problems caused by this. Isn't binding the only way to produce FloatLiteral anyway?

Fokko commented 2 years ago

That's a very good point indeed. I think we have multiple cases where this can happen.

hendrikmakait commented 2 years ago

@rdblue: My main point is that this change affects debugging, which __repr__ is intended for (https://docs.python.org/3/reference/datamodel.html#object.__repr__). This will make it more involved to figure out what exact object one is dealing with and to see whether code, e.g., conversions, worked as intended from just glancing at debugger output. For example, the conversion LongLiteral(1234) -> FloatLiteral(1234.0) will become a literal(1234) -> literal(1234.0). While we are technically able to see that some conversion is happening, we have to spot the change in .0 and we still don't know whether it's a FloatLiteral or DoubleLiteral.

Fokko commented 1 year ago

Closing this one since there is no real consensus.