Open mustafasrepo opened 4 months ago
I can do this one
Hello @mustafasrepo could you give me the actual table creation SQL for this issue?
Sure,
CREATE EXTERNAL TABLE multiple_ordered_table (
a0 INTEGER,
a INTEGER,
b INTEGER,
c INTEGER,
d INTEGER
)
STORED AS CSV
WITH HEADER ROW
WITH ORDER (a ASC, b ASC)
WITH ORDER (c ASC)
LOCATION '../core/tests/data/window_2.csv';
You can create the table used in the queries with the snippet above.
take
As @Lordworms points out, maybe we can try to implement this feature in a way that is general and not special-cased in the optimizer, aka https://github.com/apache/arrow-datafusion/issues/9289
This would look like making ARRAY_AGG an AggregateUDF (which probably would mean making a datafusion-aggregates crate).
This would certainly take more work, and thus more time, than just implementing a special case for the BuiltInAggregateFunction, so I don't think it is necessary.
However, if we think this is a reasonable approach, I can file some tickets with the basic ideas sketched out (I didn't want to sketch out too many things at once, and we already have a bunch of work related to pulling out scalar UDFs).
Yes, I truly want to add a general approach instead of writing some "if else", but I agree that we currently have more tickets than that. I could do this one later and work on some of the current tickets first...
Is your feature request related to a problem or challenge?
The query below
and
produce the same results. However, the first query generates the following plan
whereas the second query generates the following plan
Describe the solution you'd like
We can rewrite the first query as the second one, which executes faster and uses less memory because it no longer needs to keep all results in the array_agg.
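The queries themselves are not reproduced in this thread, so the following is only an illustrative sketch of the memory argument. It assumes the rewrite replaces "take the first element of an array_agg result" with an aggregate that tracks only the first value over already-ordered input. The class names, the `run` helper, and the sample rows are hypothetical and are not DataFusion APIs.

```python
class ArrayAggState:
    """Hypothetical accumulator that collects every input value.

    Its memory use grows with the number of rows in the group.
    """

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def first(self):
        # Equivalent to indexing the first element of the collected array.
        return self.values[0] if self.values else None

    def retained(self):
        return len(self.values)


class FirstValueState:
    """Hypothetical accumulator that keeps only the first value it sees."""

    def __init__(self):
        self.value = None
        self.is_set = False

    def update(self, value):
        if not self.is_set:
            self.value = value
            self.is_set = True

    def first(self):
        return self.value

    def retained(self):
        return 1 if self.is_set else 0


def run(state, rows):
    """Feed every row to the accumulator and return it."""
    for row in rows:
        state.update(row)
    return state


# Hypothetical input, assumed to arrive already in the required order.
rows = list(range(100, 0, -1))

array_state = run(ArrayAggState(), rows)
first_state = run(FirstValueState(), rows)

print(array_state.first(), array_state.retained())  # 100 100
print(first_state.first(), first_state.retained())  # 100 1
```

Both accumulators return the same answer, but the first one retains every row while the second retains a single value, which is the saving described above.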
Describe alternatives you've considered
No response
Additional context
No response