datafusion-contrib / datafusion-dft

A batteries included data processing and DataFusion development app for the terminal
Apache License 2.0
114 stars 8 forks source link

Implement `select * from 'foo.parquet'` support #125

Open alamb opened 2 months ago

alamb commented 2 months ago

I want queries like this to work:

select * from '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet

This works great in datafusion-cli:

datafusion-cli -c "select * from '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet' limit 10"
DataFusion CLI v41.0.0
+---------------------+------------+---------------------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+--------------------+--------------+----+-----------+------------------------------------------------+--------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+-------------+------------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+----------------------+------+
| WatchID             | JavaEnable | Title                                                                                       | GoodEvent | EventTime  | EventDate | CounterID | ClientIP   | RegionID | UserID             | CounterClass | OS | UserAgent | URL                                            | Referer                              | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID        | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP   | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash          | URLHash              | CLID |
+---------------------+------------+---------------------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+--------------------+--------------+----+-----------+------------------------------------------------+--------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+-------------+------------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+----------------------+------+
| 8993561131065960301 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373344372 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373331008      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 551752182  | 0            | 0       | 0           | 0             | 0        | 5        | 1373318120     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 6635318252456702850 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373344834 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | 4�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373331459      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 490305020  | 0            | 0       | 0           | 0             | 0        | 5        | 1373318410     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 6072156798303264550 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373345083 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373331642      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 261581801  | 0            | 0       | 0           | 0             | 0        | 5        | 1373318651     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 6676230879155791167 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373345868 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373332322      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 124030311  | 0            | 0       | 0           | 0             | 0        | 5        | 1373319379     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 5817482884949783786 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346325 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373332643      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 70501440   | 0            | 0       | 0           | 0             | 0        | 5        | 1373319704     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | h1             |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 9086578174639119203 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346328 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznodolskiy.irr.ru/r/lyrics/snastree | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373332646      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 103556130  | 0            | 0       | 0           | 0             | 0        | 5        | 1373319706     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | 5420047383373475333  | 0    |
| 8538774668558945189 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346894 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-temno-bolshaybachkala | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | 4�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373333201      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 489021314  | 0            | 0       | 0           | 0             | 0        | 5        | 1373320366     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | 5530112846577889149  | -4713643203787022327 | 0    |
| 6450996617047587792 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346895 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznodolskiy.irr.ru/r/lyrics/snastree | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373333202      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 1062047245 | 0            | 0       | 0           | 0             | 0        | 5        | 1373320367     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | 5420047383373475333  | 0    |
| 8447568139699209362 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373347543 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-temno-bolshaybachkala | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373334118      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 916502768  | 0            | 0       | 0           | 0             | 0        | 5        | 1373321060     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | 5530112846577889149  | -4713643203787022327 | 0    |
| 8718490359481193433 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373347546 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznodolskiy.irr.ru/r/lyrics/snastree | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373334120      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 506841884  | 0            | 0       | 0           | 0             | 0        | 5        | 1373321062     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | h1             |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | 5420047383373475333  | 0    |
+---------------------+------------+---------------------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+--------------------+--------------+----+-----------+------------------------------------------------+--------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+-------------+------------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+----------------------+------+
10 row(s) fetched.
Elapsed 0.239 seconds.

It currently doesn't in dft

$ dft -c "select * from '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet'"
Error: Error during planning: table 'datafusion.public./Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet' not found

Location:
    /Users/andrewlamb/Software/datafusion-tui/src/app/execution.rs:127:20

Once datafusion 42.0.0 is released, we can likely use the DynamicFileProvider that @goldmedal added in https://github.com/apache/datafusion/pull/11035

matthewmturner commented 2 months ago

It would be really cool if you could inspect parquet metadata with this

devanbenz commented 2 months ago

This seems like a really fun thing to work on. If no one else is working on it I'd love to take it on. I really like the metadata idea.

matthewmturner commented 2 months ago

Contributions very welcome :)

IMO it would be best to first integrate the DynamicFileProvider in one PR and then do a follow on for metadata.

I also just realized that their may be some existing metadata functionality for datafusion although its not clear to me from the docs whether that is only in datafusion-cli or a built-in function to datafusion that we could also use. If it was specific to datafusion cli then it would be great if we could add that.

matthewmturner commented 2 months ago

Also you could potentially get some inspiration for additional metadata capabilities from duckdb

alamb commented 2 months ago

BTW we can take inspiration from / copy outright the parquet_metadata table function for parquet from datafusion-cli (also modeled on duckdb): https://datafusion.apache.org/user-guide/cli/usage.html#parquet-metadata

I would like to suggest creating those functions in their own crate (perhaps datafusion-functions-parquet?) -- it could be in the datafusion-dft repo initially for convenience, but I think eventually the goal should be that dft just be focused on integration rather than actually implementing such features.

In fact maybe once dft gets good enough we could remove the parquet_metadata function from datafusion-cli entirely 🤔

alamb commented 2 months ago

I also just realized that their may be some existing metadata functionality for datafusion although its not clear to me from the docs whether that is only in datafusion-cli or a built-in function to datafusion that we could also use. If it was specific to datafusion cli then it would be great if we could add that.

Sorry I missed this -- it is only in datafusion-cli

Implementation is here: https://github.com/apache/datafusion/blob/257e1409eca81cfff024ecc5e2567e9f67e6b5a3/datafusion-cli/src/functions.rs#L317-L459

alamb commented 2 months ago

I suggest we file a second ticket for implementing parquet_metadata and other duckdb metadata functions

matthewmturner commented 2 months ago

i agree with putting it in its own crate. like @alamb said i also think that dft could be used as an incubator of sorts. For example I have taken that approach in my WASM function factory PR. I have no intention of keeping that in this repo but its quite convenient for the time while it matures.

devanbenz commented 2 months ago

i agree with putting it in its own crate. like @alamb said i also think that dft could be used as an incubator of sorts. For example I have taken that approach in my WASM function factory PR. I have no intention of keeping that in this repo but its quite convenient for the time while it matures.

Sounds good -- I'll go ahead and get these both assigned to myself and then start cracking on it in the next few days :)

devanbenz commented 2 months ago

take

edit: looks like github actions is not set up to auto-assign like datafusion 😅

alamb commented 1 month ago

take

edit: looks like github actions is not set up to auto-assign like datafusion 😅

How about you take https://github.com/datafusion-contrib/datafusion-dft/issues/148 and I'll try this one? DataFusion 42 was just released and I very much want this particular feature in dft