Eventual-Inc / Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

comparison on `Decimal` dtypes does not work #2906

Open universalmind303 opened 4 days ago

universalmind303 commented 4 days ago

Describe the bug

Comparing a `Decimal` expression with an integer literal fails with a `TypeError` while the logical plan is being built; no implicit cast is applied.

To Reproduce

import daft
from daft import col

df = (daft.from_pydict({
        'floats': [328.00, 327.00]
    })
    .where(col('floats').cast(daft.DataType.decimal128(15, 2)) > 300)
    .collect()
)

---------------------------------------------------------------------------
DaftCoreException                         Traceback (most recent call last)
Cell In[36], line 5
      1 # %%
      2 df = (daft.from_pydict({
      3         'floats': [328.00, 327.00]
      4     })
----> 5     .where(col('floats').cast(daft.DataType.decimal128(15, 2)) > 300)
      6     .collect()
      7 )
      8 df
File ~/Development/Daft/daft/api_annotations.py:26, in DataframePublicAPI.<locals>._wrap(*args, **kwargs)
     24 type_check_function(func, *args, **kwargs)
     25 timed_method = time_df_method(func)
---> 26 return timed_method(*args, **kwargs)
File ~/Development/Daft/daft/analytics.py:198, in time_df_method.<locals>.tracked_method(*args, **kwargs)
    195 @functools.wraps(method)
    196 def tracked_method(*args, **kwargs):
    197     if _ANALYTICS_CLIENT is None:
--> 198         return method(*args, **kwargs)
    200     start = time.time()
    201     try:
File ~/Development/Daft/daft/dataframe/dataframe.py:1379, in DataFrame.where(self, predicate)
   1376     from daft.sql.sql import sql_expr
   1378     predicate = sql_expr(predicate)
-> 1379 builder = self._builder.filter(predicate)
   1380 return DataFrame(builder)
File ~/Development/Daft/daft/logical/builder.py:163, in LogicalPlanBuilder.filter(self, predicate)
    162 def filter(self, predicate: Expression) -> LogicalPlanBuilder:
--> 163     builder = self._builder.filter(predicate._expr)
    164     return LogicalPlanBuilder(builder)
DaftCoreException: DaftError::External Unable to create logical plan node.
Due to: DaftError::TypeError Cannot perform comparison on types: 15.2, Int32
Details:
DaftError::TypeError Cannot perform comparison on types: 15.2, Int32

Expected behavior

The `where` clause should succeed with an implicit cast of the integer literal. It should be equivalent to:

dtype = daft.DataType.decimal128(15, 2)

df = (daft.from_pydict({
        'floats': [328.00, 327.00]
    })
    .where(col('floats').cast(dtype) > daft.lit(300).cast(dtype))
    .collect()
)
df

samster25 commented 4 days ago

Good catch, tagging it as p1 to mark as a priority

universalmind303 commented 4 days ago

Unassigning myself. I thought it'd be a quick fix of just adding Decimal128 to get_supertype, but it looks like there are some issues when casting values to the supertype of Int128.
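
To make the supertype idea concrete, here is a minimal pure-Python sketch of one common way to pick a result type when a decimal is compared with an integer: keep the decimal's scale and widen the precision just enough to hold every value of the integer type. This is not Daft's get_supertype; the function name `decimal_int_supertype`, the `INT_DIGITS` table, and the rule itself are assumptions made for illustration.

```python
# Hypothetical sketch, not Daft's implementation.
# Decimal digits needed to hold any value of each signed integer type.
INT_DIGITS = {"Int8": 3, "Int16": 5, "Int32": 10, "Int64": 19}

# A 128-bit decimal can store at most 38 significant digits.
MAX_PRECISION = 38


def decimal_int_supertype(precision, scale, int_type):
    """Return (precision, scale) of a decimal wide enough for both operands."""
    # Digits to the left of the decimal point that each side needs.
    whole_digits = max(precision - scale, INT_DIGITS[int_type])
    result_precision = whole_digits + scale
    if result_precision > MAX_PRECISION:
        raise ValueError("no 128-bit decimal can hold both operands exactly")
    return (result_precision, scale)
```

Under this rule, comparing Decimal(15, 2) with Int32 needs no widening, because the decimal already has 13 whole digits and Int32 needs only 10. A narrower decimal such as Decimal(5, 2) would be widened to Decimal(12, 2).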

universalmind303 commented 3 days ago

For more context, I tried updating get_supertype to cast (Decimal128(_,_), <numeric type>) => Int128, but for some reason the physical Arrow type for Int128 is ArrowType::Decimal(32,32), and casting 300 to Decimal(32,32) results in None.

So there's definitely some funky stuff happening in our cast logic here. Pinging @colin-ho since you recently did some work on cleaning up the casting logic.
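
One plausible reading of the None result is plain arithmetic: a decimal with precision 32 and scale 32 spends all of its digits on the fractional part, so it can only represent magnitudes below 1, and 300 does not fit. The sketch below checks this with Python's standard `decimal` module. It does not call Daft or Arrow, the helper name `fits_decimal` is hypothetical, and whether Daft's cast returns a null for exactly this reason is an assumption.

```python
from decimal import Decimal


def fits_decimal(value, precision, scale):
    """Return True if `value` is exactly representable as DECIMAL(precision, scale)."""
    _sign, digits, exponent = Decimal(str(value)).as_tuple()
    coefficient = int("".join(map(str, digits)))

    # A DECIMAL(p, s) stores the integer value * 10**s using at most p digits.
    shift = exponent + scale
    if shift >= 0:
        unscaled = coefficient * 10 ** shift
    else:
        unscaled, remainder = divmod(coefficient, 10 ** -shift)
        if remainder:
            return False  # needs more fractional digits than `scale` allows

    return unscaled < 10 ** precision


print(fits_decimal(300, 32, 32))  # no room for whole digits
print(fits_decimal(300, 15, 2))   # the dtype used in the report
print(fits_decimal(0.5, 32, 32))  # purely fractional values do fit
```

If that reading is right, the cast itself behaves consistently, and the surprising part is that Int128 maps to a physical type with no whole digits.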