databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

Feature: Add Support for INTERVAL Data Type - Already Supported in Parquet & Arrow #16677

Open inviscid opened 3 weeks ago

inviscid commented 3 weeks ago

Summary

Interval is a value type that Databend already understands, since it is used in date addition. However, there is currently no way to store an Interval value the way Postgres allows.

While Snowflake and MySQL also do not support the Interval type, Postgres does, and it makes life much easier because storing duration information is quite common. Both Parquet and Arrow support an Interval/Duration data type.

The Parquet standard does support an Interval data type as defined here: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval
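For reference, the Parquet spec linked above describes INTERVAL as a 12-byte FIXED_LEN_BYTE_ARRAY holding three little-endian unsigned 32-bit integers: months, days, and milliseconds. A minimal sketch of that layout in plain Python follows. The function names are hypothetical and only for illustration; they are not Databend, Parquet, or Arrow APIs.

```python
import struct
from typing import Tuple

# Parquet INTERVAL layout: three little-endian unsigned 32-bit integers
# (months, days, milliseconds) packed into a 12-byte fixed-length value.
_INTERVAL_LAYOUT = struct.Struct("<III")


def encode_parquet_interval(months: int, days: int, millis: int) -> bytes:
    """Pack an interval into the 12-byte Parquet INTERVAL representation.

    Hypothetical helper. The components are unsigned, so negative values
    raise struct.error instead of being stored.
    """
    return _INTERVAL_LAYOUT.pack(months, days, millis)


def decode_parquet_interval(raw: bytes) -> Tuple[int, int, int]:
    """Unpack 12 bytes back into (months, days, milliseconds)."""
    if len(raw) != _INTERVAL_LAYOUT.size:
        raise ValueError("Parquet INTERVAL values must be exactly 12 bytes")
    return _INTERVAL_LAYOUT.unpack(raw)


# Round trip: 1 month, 2 days, 3 seconds.
raw = encode_parquet_interval(1, 2, 3000)
months, days, millis = decode_parquet_interval(raw)
```

One thing this makes visible is that the Parquet components are unsigned, so a Postgres-style negative interval would need extra handling on the Databend side.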

Arrow also supports a Duration type with varying levels of resolution. It would likely be safe to pick a reasonable default resolution for Arrow usage: https://arrow.apache.org/docs/python/generated/pyarrow.duration.html#pyarrow.duration. This conversion function seems to suggest that the resolution might be milliseconds: https://arrow.apache.org/rust/parquet/arrow/arrow_writer/fn.get_interval_dt_array_slice.html
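To illustrate what picking a resolution means: an Arrow Duration is a signed 64-bit count of a fixed unit (seconds, milliseconds, microseconds, or nanoseconds). The sketch below uses only the Python standard library. The helper names are hypothetical, and the millisecond default is just the guess made above, not something confirmed for Databend.

```python
from datetime import timedelta

# Number of each Arrow Duration unit in one second.
_UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def timedelta_to_duration(value: timedelta, unit: str = "ms") -> int:
    """Convert a timedelta to an integer count of `unit` (hypothetical helper).

    Uses integer arithmetic on microseconds; precision finer than `unit`
    is dropped by floor division.
    """
    per_second = _UNITS_PER_SECOND[unit]
    total_us = (value.days * 86_400 + value.seconds) * 1_000_000 + value.microseconds
    return total_us * per_second // 1_000_000


def duration_to_timedelta(count: int, unit: str = "ms") -> timedelta:
    """Convert an integer count of `unit` back to a timedelta.

    timedelta only has microsecond precision, so nanosecond counts lose
    their sub-microsecond part.
    """
    per_second = _UNITS_PER_SECOND[unit]
    return timedelta(microseconds=count * 1_000_000 // per_second)


stored = timedelta_to_duration(timedelta(hours=1, milliseconds=250))
restored = duration_to_timedelta(stored)
```

A coarser unit is cheaper to reason about but silently drops precision, which is why the default resolution matters for round-tripping.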

The ideal approach would be one where an Interval value could be marshalled and unmarshalled from Parquet using native Parquet and Arrow types.

sundy-li commented 3 weeks ago

New data types can be supported after #16610; we are still in the middle of a big refactoring.

BohuTANG commented 2 days ago

#16610 has been merged, and this feature is now ready to be added to the work queue.

sundy-li commented 2 days ago

Better to do it after #16814.