cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.88k stars 3.77k forks source link

sql: the conversion rules for numeric types to interval are not the same as string to interval #57876

Open knz opened 3 years ago

knz commented 3 years ago

Found by @petermattis:

> SELECT 2592000::interval, '2592000'::interval;
  interval | interval
-----------+------------
  1 mon    | 720:00:00

What is happening here:

Or can we?


As the person who originally implemented this, I (@knz) 100% agree this was an oversight and should be corrected. The reason why this misdesign exists is that the casts appeared “easy” to implement: take the numeric value, convert it to a duration.Duration, then stick that in an interval.

The numeric to duration.Duration conversion functions were already there.

I had not appreciated that these conversion functions were expanding the seconds into days/months. Was that intentional? I don't know. (However, see below.)


How to fix this?

Before we go into details, we need to establish some clear goals:

Can we achieve this?

I think case (4) will benefit from the proposed adjustment, because arguably SHOW LAST QUERY STATISTICS wants an interval expressed in hours/minutes/seconds and doesn't care about days/months.

However, I will recommend making sure we have ample testing for cases (2) and (3), with values that exceed a month, and ensure we don't diverge from postgres for those. (Maybe they are already incompatible with pg, I don't know? If they are, then they will also benefit from the fix.)

Of course, we can't let duration.FromFloat64 do "something different" from duration.FromInt64 because then the rules are not homogeneous any more.

cc @otan for triage.

blathers-crl[bot] commented 3 years ago

Hi @knz, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

arctica commented 2 years ago

This bug just bit me. I was wondering why my timestamp values were way way off and nailed it down to some decimal -> interval casts giving completely wrong results. Not every month has the same amount of seconds in it so a numerical conversion to an interval does not make sense unless one for example keeps the highest time unit to hours like postgres does. Even that could be slightly off due to the fact that leap seconds being a thing but for me personally that's not an issue.

To illustrate what mayham this can cause:

SELECT
    1640995123::interval,
    '1640995123'::interval,
    0::timestamp + 1640995123::interval,
    0::timestamp + '1640995123'::interval
{"years":52,"months":9,"days":2,"hours":23,"minutes":58,"seconds":43}
{"hours":455831,"minutes":58,"seconds":43}
2022-10-03 23:58:43
2021-12-31 23:58:43

It would be great if someone could revisit the interval type and maybe expand test coverage. Dates and times are a surprisingly tricky subject and yet so important to most programs.

otan commented 2 years ago

thanks for letting us know arctica!

my 2c is that we should deprecate int+float/interval casts, since i actually think they don't quite map to each other as concepts. intervals in SQL standard are some Month+Day+Hour construct with confusing rules about mapping days -> months and the like (e.g. years can be treated as 365.25 days or 365 days depending on context - we have some complex logic and fallout from that a while ago, see #56667 as an example). PG indeed doesn't support this, i'm guessing for that reason and more.

cc @vy-ton do you think we should support this? if so, how do we think ints/floats map to intervals - unix seconds, or like the age builtin which has complex rules?