apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.87k stars 1.11k forks source link

Support to convert durations with arbitrary unit into durations with a pre-defined unit #10082

Open chmp opened 4 months ago

chmp commented 4 months ago

Is your feature request related to a problem or challenge?

I would like to get the number of seconds elapsed of a duration, that is the result of the subtraction of two Timestamp columns. The resulting unit is controlled by datafusion and hence unknown to me.

Describe the solution you'd like

I would would like to cast between durations of different types. This feature does not seem to be implemented:

SELECT arrow_cast(arrow_cast(1, 'Duration(Microsecond)'), 'Duration(Second)') as value;
-- This feature is not implemented: Unsupported CAST from Duration(Microsecond) to Duration(Second)

I would expect the same behavior as casting between timestamps:

SELECT arrow_cast(arrow_cast(arrow_cast(24 * 60 * 60, 'Timestamp(Second, None)'), 'Timestamp(Millisecond, None)'), 'Int64') as value;
-- +----------+
-- | value    |
-- +----------+
-- | 86400000 |
-- +----------+
-- 1 row in set. Query took 0.003 seconds.

Describe alternatives you've considered

My initial attempt was to divide the resulting duration by a known reference duration. However this resulted in a planning error:

SELECT arrow_cast(1, 'Duration(Microsecond)') /  arrow_cast(1, 'Duration(Microsecond)') as value;
-- Error during planning: Cannot coerce arithmetic expression Duration(Microsecond) / Duration(Microsecond) to valid types

An alternative would be to simply cast the duration to an Int64. However, this operation discards the time unit of the duration. If the duration unit is unknown, this operation is potentially dangerous.

Additional context

All examples are using DataFusion CLI v37.0.0. If you point me in the right direction, I would be happy to implement a PR.

chmp commented 4 months ago

After looking a bit through the code, I guess, adding the (Duration(unit1), Duration(unit2)) arms to this match statement in arrow-cast should do the trick, shouldn't it?