Open solongordon opened 7 years ago
In general, I think we could fix this and other similar situations by introducing a way to normalize scalar builtin functions and their arguments.
We could have a way of defining a rule that looks at a scalar builtin and lets you normalize its arguments. But I'm not exactly sure whether this is going to be worth doing - how many circumstances of this nature really are there?
It turns out that TPCH query 7 actually uses extract
, so I'm going to run a test with that query and see if the extra allocations in this issue actually make a difference or not.
cc @otan @knz - why were we planning to remove extract_duration
in 20.2? I see a TODO to that effect that I don't understand.
Using the simpler query
select extract('YeAr' from l_commitdate) as f from lineitem group by f;
we see about 30% of the program's allocations in strings.ToLower :) Though we also see 30% just returning the new float that represents the year, so it's not totally clear that this is such a big deal still.
Still, it really would be nice to have a way to normalize the input arguments to builtins before running them once per every row... it feels like we should be able to somehow define that this function always lower()
's its first argument via the optimizer. Then, since lower()
is an immutable function, it'll always get constant folded via the pre-existing constant folding rules 🤔
@cockroachdb/sql-optimizer any thoughts on this? It's not urgent but just curious.
Since extract
is super weird, and completely handled by the parser, we could theoretically just tell the parser to add a lower()
to the first argument unconditionally, but that seems inflexible, because there are other functions like date_trunc
that have this same property that don't get special handling via the parser.
We could have a crdb_internal_extract_from_lower
variant and a normalization rule that converts extract(x)
to crdb_internal_extract_from_lower(lower(x))
.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
This is useful for extract
since there are some test cases with uppercase arguments. I'll implement this later and provide a performance report.
More general idea:
I'm also exploring the possibilities of making this a general improvement for all built-in functions. The idea is to fold constants during optimization. However, it seems complicated the issue:
extract('year', ...)
to extract(lower('year'), ...)
, it will result in significantly more CPU instructions than the original function, which is not good.extract
as lower
, it increases the code complexity a lot with little performance improvement.Issues with the function extract
:
By the way, I found that there are many test cases with unquoted arguments:
SELECT extract(hour from timetz '12:01:02.345678-07')
, where "hour" is a constant.SELECT extract(element, input::timestamp)
, where "element" is a column.So what are the rules for argument resolution? What if there is a column named "hour"?
See discussion at https://reviewable.io/reviews/cockroachdb/cockroach/19923#-KyS_gHiEzAfKPa0fg4X:-Ky_Yc13IQxe498kphDh:b708j2.
In our handling for the
extract
built-in, we do string normalization on the precision argument. @justinj and @knz pointed out that there are some unnecessary string allocations here which could be reduced.Also applies to
extract_duration
anddate_trunc
.Jira issue: CRDB-5946