apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.63k stars 804 forks source link

Make Interval Parsing More Permissive #6390

Open samuelcolvin opened 2 months ago

samuelcolvin commented 2 months ago

Describe the bug

While working on https://github.com/apache/datafusion/pull/12448 I noticed that the string 5 day hour is not interpreted correctly by parse_interval_month_day_nano_config, it should be considered the same as 5 day 0 hour, instead it return an error.

To Reproduce

run that function, or look at https://github.com/apache/datafusion/pull/12448

Expected behavior

parse_interval_month_day_nano_config should match postgres behaviour in general

Additional context

Related to #6211 where I improved interval parsing a lot, although apparently not enough.

alamb commented 2 months ago

I think 5 day hour is a pretty uncommon usecase . Thank you for filing this

samuelcolvin commented 2 months ago

The most likely scenario where it could come up is select interval '5 day' hour, but I agree even that seems like either allowing a common mistake, or a side effect of their very permissive parser.


While we're talking about weird things postgres supports, as per https://github.com/sqlparser-rs/sqlparser-rs/issues/1345#issuecomment-2334514705, the following are also supported:

select interval '1 month + 2 day'
select interval '1 minute + 2 s'  -- makes sense, all units are supported in arithmetic expressions
select interval '1 minute - 2 s'
select interval '1 minute + 2'  -- makes sense since raw numbers are interpreted as seconds
select interval '1 minute * 2s'  -- this is crazy! the "*" operator is interpretted as "+" so this evaluates as 0:01:02
select interval '1 minute * 2' -- same! 
select interval '1 minute minute' -- repeated units are ignored
select interval '1 month + 2 day' day -- same thing with the units just concatenated
ByteBaker commented 2 months ago

@alamb do we have a plan on integrating this inside Arrow?

alamb commented 2 months ago

@alamb do we have a plan on integrating this inside Arrow?

I am not sure what you mean by this question @ByteBaker

ByteBaker commented 2 months ago

I meant to ask if we plan to handle more cases in our parsing logic? To be closer to parsing abilities of Postgres? As @samuelcolvin pointed out above.

alamb commented 2 months ago

I meant to ask if we plan to handle more cases in our parsing logic? To be closer to parsing abilities of Postgres? As @samuelcolvin pointed out above.

I guess in general I think it would be driven by contributor needs -- if someone wants needs more exotic parsing support, we can definitely consider it, but if it gets too specific, it probably makes sense to keep the parsing in the application and not in arrow-rs

ByteBaker commented 2 months ago

In that case we need to decide if we wanna keep this one open or not.

alamb commented 2 months ago

Let's leave it open so it is easier to find if someone else hits the problem