Open knz opened 3 years ago
Hi @knz, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.
While you're here, please consider adding an A- label to help keep our repository tidy.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
This bug just bit me. I was wondering why my timestamp
values were way way off and nailed it down to some decimal -> interval casts giving completely wrong results. Not every month has the same amount of seconds in it so a numerical conversion to an interval does not make sense unless one for example keeps the highest time unit to hours like postgres does. Even that could be slightly off due to the fact that leap seconds being a thing but for me personally that's not an issue.
To illustrate what mayham this can cause:
SELECT
1640995123::interval,
'1640995123'::interval,
0::timestamp + 1640995123::interval,
0::timestamp + '1640995123'::interval
{"years":52,"months":9,"days":2,"hours":23,"minutes":58,"seconds":43}
{"hours":455831,"minutes":58,"seconds":43}
2022-10-03 23:58:43
2021-12-31 23:58:43
It would be great if someone could revisit the interval type and maybe expand test coverage. Dates and times are a surprisingly tricky subject and yet so important to most programs.
thanks for letting us know arctica!
my 2c is that we should deprecate int+float/interval casts, since i actually think they don't quite map to each other as concepts. intervals in SQL standard are some Month+Day+Hour construct with confusing rules about mapping days -> months and the like (e.g. years can be treated as 365.25 days or 365 days depending on context - we have some complex logic and fallout from that a while ago, see #56667 as an example). PG indeed doesn't support this, i'm guessing for that reason and more.
cc @vy-ton do you think we should support this? if so, how do we think ints/floats map to intervals - unix seconds, or like the age
builtin which has complex rules?
Found by @petermattis:
What is happening here:
'2592000'::interval
uses the postgres string to interval conversion rules, which state that a number of seconds converted to interval should not expand to days / months and keep the interval to the hour unit.2592000::interval
usesint -> interval
cast, which is a crdb-specific extension, not recognized by postgres. In Peter's words “so we could fix it”Or can we?
As the person who originally implemented this, I (@knz) 100% agree this was an oversight and should be corrected. The reason why this misdesign exists is that the casts appeared “easy” to implement: take the numeric value, convert it to a
duration.Duration
, then stick that in an interval.The numeric to
duration.Duration
conversion functions were already there.I had not appreciated that these conversion functions were expanding the seconds into days/months. Was that intentional? I don't know. (However, see below.)
How to fix this?
Before we go into details, we need to establish some clear goals:
the conversion behavior must be homogeneous between all numeric types (
int
,decimal
,float
). For example, the same integer value like 2592000 should not yield different intervals depending on whether it's converted to decimal or float first.both conversion directions must remain consistent (they are today). So for example, converting 2592000 to interval and then back to int should yield the same value.
(note however that we're OK if a starting value in interval form, converted to a numeric type, then back to interval, yields a different interval. That is possible and acceptable because the interval type contains more information than a numeric type, so we can expect data loss.)
Can we achieve this?
duration.FromInt64
/AsInt64
are only used by this cast. So we can change that to achieve alignment without fear of side effects.duration.FromBigInt64
/AsBigInt64
is similar.however,
duration.FromFloat64
/AsFloat64
are odd.These are used in multiple places:
AsFloat64
is used when converting from interval withextract(epoch from <interval>)
(this is a postgres standard feature)percentile_cont
, an aggregate function that reports fractions between input values in an aggregation bucket (also a standard pg feature)FromFloat64
is used to generate the interval values forSHOW LAST QUERY STATISTICS
I think case (4) will benefit from the proposed adjustment, because arguably SHOW LAST QUERY STATISTICS wants an interval expressed in hours/minutes/seconds and doesn't care about days/months.
However, I will recommend making sure we have ample testing for cases (2) and (3), with values that exceed a month, and ensure we don't diverge from postgres for those. (Maybe they are already incompatible with pg, I don't know? If they are, then they will also benefit from the fix.)
Of course, we can't let
duration.FromFloat64
do "something different" fromduration.FromInt64
because then the rules are not homogeneous any more.cc @otan for triage.