Open westonpace opened 1 year ago
Hello I'm new to arrow-datafusion project and I would like to contribute 😄 Is there any chance to help with this issue?
Is it possible to alias this feature to arrow_trunc, since they share similarities 🤔?
Hello, would someone invite me to datafusion slack channel?
@fernandocast -- please let me know what email you would like (either email me at alamb@influxdata.com or join the discord channel https://arrow.apache.org/datafusion/contributor-guide/communication.html#slack-and-discord and ask there)
This sounds like a nice feature to me, FWIW
It looks like sqlparser already supports the feature https://docs.rs/sqlparser/0.36.1/sqlparser/ast/enum.DataType.html#variant.Timestamp
So it would be a matter of hooking up DataFusion to it
Postgres supports an optional precision specifier in timestamp literals (e.g. timestamp (3) '2021-01-01 00:00:00.123').
I don't think this should be necessary.
From a literal precision inference perspective, TIMESTAMP literals are no different from DECIMAL or varchar: the literal's precision should be reflected in the literal's type. TIMESTAMP '2021-01-01 00:00:00.123' clearly has millisecond precision, so requiring the user to add the (3) part is redundant.
If a user wants the literal to be parsed with a specific precision, they can use CAST('2021-01-01 00:00:00.123' AS timestamp(p)) or the shorter '2021-01-01 00:00:00.123'::timestamp(p).
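To make the inference argument above concrete, here is a minimal sketch of deriving a precision from the literal text alone. The helper name `inferred_precision` and the regular expression are hypothetical, chosen for illustration; this is not DataFusion or sqlparser code.

```python
import re

# Hypothetical pattern: an HH:MM:SS time component with an optional fraction.
_TIME_WITH_FRACTION = re.compile(r"\d{2}:\d{2}:\d{2}(?:\.(\d+))?")


def inferred_precision(literal: str) -> int:
    """Return the number of fractional-second digits written in the literal."""
    match = _TIME_WITH_FRACTION.search(literal)
    if match is None:
        raise ValueError(f"no time component found in {literal!r}")
    fraction = match.group(1)  # None when the literal has no fractional part
    return len(fraction) if fraction else 0
```

Under this sketch, '2021-01-01 00:00:00.123' would be typed with precision 3 without the user writing (3) explicitly.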
Is your feature request related to a problem or challenge?
Postgres supports an optional precision specifier in timestamp literals (e.g. timestamp (3) '2021-01-01 00:00:00.123'). The Postgres spec technically only allows 0-6, but given that Arrow timestamps support nanoseconds it would probably be best to support 0-9.
Describe the solution you'd like
For my purposes, it would be sufficient to only support precision values of 0, 3, 6, and 9 (seconds, milliseconds, microseconds, and nanoseconds), though it should be possible to support values that aren't a multiple of 3, since the expectation is that this value is only used for parsing the literal and is not a constraint on the type at all (e.g. a timestamp(5) could be stored at microsecond resolution as long as the string is parsed correctly).
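As a rough illustration of the idea that the precision only affects parsing, the sketch below maps a precision of 0-9 to the coarsest of the four Arrow time units that can hold it, then parses a literal into an integer tick count. The names `storage_unit` and `parse_timestamp` are hypothetical, the literal is assumed to be UTC in 'YYYY-MM-DD HH:MM:SS[.fraction]' form, and digits beyond the requested precision are simply truncated, which is a simplification rather than a claim about Postgres behavior.

```python
from datetime import datetime, timezone
from typing import Tuple

# Fractional digits representable by each Arrow timestamp unit.
_UNIT_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def storage_unit(precision: int) -> str:
    """Pick the coarsest unit that can hold `precision` fractional digits."""
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    for unit, digits in _UNIT_DIGITS.items():
        if precision <= digits:
            return unit
    raise AssertionError("unreachable")


def parse_timestamp(literal: str, precision: int) -> Tuple[int, str]:
    """Return (ticks since the Unix epoch, unit) for the given literal."""
    unit = storage_unit(precision)
    whole, _, fraction = literal.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    seconds = int(parsed.replace(tzinfo=timezone.utc).timestamp())
    # Assumption: truncate extra digits, then pad up to the unit's width.
    kept = fraction[:precision].ljust(_UNIT_DIGITS[unit], "0")
    ticks = seconds * 10 ** _UNIT_DIGITS[unit] + (int(kept) if kept else 0)
    return ticks, unit
```

With this mapping, a timestamp(5) literal lands in microsecond storage: the five written digits are kept and a trailing zero pads the value to six.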
Ideally, output would look like the following:
Example postgres output: https://www.db-fiddle.com/f/oiHdDy1v78mC1zKbCFvWdV/0
Describe alternatives you've considered
A pretty usable workaround at the moment is to cast:
Unfortunately, this requires DataFusion-specific functions (arrow_cast), and it would also break backwards compatibility with Lance's current SQL parsing.
Additional context
No response