Open l1t1 opened 3 months ago
I think we could add this as a user defined table function in datafusion-cli quite easily
For example we could follow the model of parquet_metadata
(which also follows duckdb):
SELECT path_in_schema, row_group_id, row_group_num_rows, stats_min, stats_max, total_compressed_size
FROM parquet_metadata('hits.parquet')
WHERE path_in_schema = '"WatchID"'
LIMIT 3;
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| path_in_schema | row_group_id | row_group_num_rows | stats_min | stats_max | total_compressed_size |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
| "WatchID" | 0 | 450560 | 4611687214012840539 | 9223369186199968220 | 3883759 |
| "WatchID" | 1 | 612174 | 4611689135232456464 | 9223371478009085789 | 5176803 |
| "WatchID" | 2 | 344064 | 4611692774829951781 | 9223363791697310021 | 3031680 |
+----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
3 rows in set. Query took 0.053 seconds.
It is documented here https://arrow.apache.org/datafusion/user-guide/cli.html
The code for it is here; https://github.com/apache/arrow-datafusion/blob/637293580db0634a4efbd3f52e4700992ee3080d/datafusion-cli/src/functions.rs#L215-L442
Is your feature request related to a problem or challenge?
it is complex and slowly to use unnest().
Describe the solution you'd like
it can run
select from generate_series()
as duckdb does.Describe alternatives you've considered
No response
Additional context