Closed by egasimov 2 months ago
After analyzing the source code for the UNNEST operator, we found that the following pattern needs to be used with it:
{
"queryType": "SQL",
"query": "SELECT d.customer_id, d.purchased_items FROM dfs.root.`/datas3/customers/*` d WHERE EXISTS ( SELECT 1 FROM UNNEST(d.purchased_items) t2(ord) WHERE t2.ord.item_id in (2000001))"
}
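For anyone who wants to submit this body programmatically, here is a minimal sketch in Python. It assumes the Drill web server is reachable at `http://localhost:8047` (the default port; adjust for your setup) and that the body is posted to the REST endpoint `/query.json`. The helper names `build_exists_query`, `build_payload` and `run_query` are hypothetical and only serve the illustration.

```python
import json
import urllib.request


def build_exists_query(table_path, item_ids):
    """Build the EXISTS/UNNEST filter query from the comment above."""
    # Cast to int so that only numeric literals end up in the SQL text.
    id_list = ", ".join(str(int(i)) for i in item_ids)
    return (
        "SELECT d.customer_id, d.purchased_items "
        f"FROM dfs.root.`{table_path}` d "
        "WHERE EXISTS ( SELECT 1 FROM UNNEST(d.purchased_items) t2(ord) "
        f"WHERE t2.ord.item_id in ({id_list}))"
    )


def build_payload(table_path, item_ids):
    """Wrap the query in the JSON body shown above."""
    return {"queryType": "SQL", "query": build_exists_query(table_path, item_ids)}


def run_query(payload, base_url="http://localhost:8047"):
    """POST the payload to Drill's REST API (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + "/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build the same body as above; nothing is sent until run_query is called.
payload = build_payload("/datas3/customers/*", [2000001])
```

Calling `run_query(payload)` would send the request; it is left out of the sketch so the snippet can be run without a Drill server.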
@egasimov You beat me to it, but I don't think UNNEST is the operator you want. UNNEST performs a LATERAL JOIN automatically, even when not specified. I think FLATTEN is probably the operator you want to use.
Also, I'd recommend NOT using dot notation to reference inner fields in maps, i.e.:
Don't use
SELECT s.id
Instead use
SELECT s['id']
Hey @cgivre, thank you for the response. I really appreciate it.
The requirement for the SQL query is to filter for rows whose nested array field (purchased_items) contains at least one of the values provided in the IN clause, and to return the original rows (without duplicating entries).
When FLATTEN is used in the query, we will need a GROUP BY or DISTINCT after the filter to remove duplicate rows from the result.
Wdyt, is there any other way to accomplish the same goal in a more optimized way? :thinking:
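To make the duplicate-row concern concrete, here is a small Python sketch that models only the semantics of the two approaches, not how Drill executes them. The sample rows and the helper names are made up for illustration; the field names follow this thread.

```python
# Hypothetical sample data: customers with a nested array of purchased items.
customers = [
    {"customer_id": 1, "purchased_items": [{"item_id": 777}, {"item_id": 888}]},
    {"customer_id": 2, "purchased_items": [{"item_id": 999}]},
    {"customer_id": 3, "purchased_items": []},
]
wanted = {777, 888}


def filter_exists(rows, wanted_ids):
    """EXISTS-style filter: each row is kept at most once."""
    return [
        row for row in rows
        if any(item["item_id"] in wanted_ids for item in row["purchased_items"])
    ]


def filter_flatten(rows, wanted_ids):
    """FLATTEN-style filter: one output row per matching nested item."""
    return [
        row
        for row in rows
        for item in row["purchased_items"]
        if item["item_id"] in wanted_ids
    ]


def dedupe_by_customer(rows):
    """Stand-in for the extra DISTINCT / GROUP BY step after flattening."""
    seen = set()
    unique = []
    for row in rows:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            unique.append(row)
    return unique


exists_ids = [r["customer_id"] for r in filter_exists(customers, wanted)]
flatten_ids = [r["customer_id"] for r in filter_flatten(customers, wanted)]
deduped_ids = [r["customer_id"] for r in dedupe_by_customer(filter_flatten(customers, wanted))]
```

With this sample data, customer 1 matches on two items, so the flattened result contains that customer twice until it is deduplicated, whereas the EXISTS-style filter returns it once.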
Hello Drill community,
We recently discovered Apache Drill and started experimenting with reading files and running filter queries over a dataset.
So, my scenario is as follows:
I have a Parquet file whose contents are as follows (customers with the purchased items):
I am trying to find the customers who purchased items whose product_id is in the given list (777,888).
The following query is sent to the Drill server:
I get the following error: VALIDATION ERROR: From line 1, column 118 to line 1, column 124: Column 'item_id' not found in table 's'
Additional Context