wschung1113 opened 1 year ago
A good place to find data relevant to query performance is the query profile, which can be found on the Profiles page of the web UI. What format is your data in? What kind of query are you running: grouping, joining, filtering?
Thanks for the reply!
Most of the time is spent in query execution:
The data is just an ordinary row-based table from an RDBMS (Oracle-like).
The query looks like this: a fairly simple one with an outer join and a few GROUP BY clauses. However, the data is relatively large, so it takes about 4 minutes, and I want to try to bring that down.
WITH DEF_TABLE AS ( SELECT * FROM rdbms.hyperdata_ex.DO171 )
SELECT M_0
FROM (
  SELECT L_SHIPINSTRUCT AS C0_0, L_SHIPMODE AS C1_0, SUM(L_EXTENDEDPRICE) AS M_0
  FROM DEF_TABLE
  GROUP BY L_SHIPINSTRUCT, L_SHIPMODE
) TARGET
RIGHT OUTER JOIN (
  SELECT *
  FROM ( SELECT L_SHIPINSTRUCT AS C0_0, L_SHIPINSTRUCT AS K0_0 FROM DEF_TABLE GROUP BY L_SHIPINSTRUCT ),
       ( SELECT L_SHIPMODE AS C1_0, L_SHIPMODE AS K1_0 FROM DEF_TABLE GROUP BY L_SHIPMODE )
) BASE ON ( (TARGET.C0_0 = BASE.C0_0) AND (TARGET.C1_0 = BASE.C1_0) )
ORDER BY K0_0 ASC, K1_0 ASC
The "Direct Memory Usage" cap shows 0GB even though I configured it in drill-env.sh.
I'm not sure why, since the JVM heap memory does appear to be configured via drill-env.sh.
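One way to see the direct memory limit each drillbit is actually running with, independently of the web UI counter, is Drill's `sys.memory` system table, which includes a `direct_max` column. The sketch below posts that query to Drill's REST endpoint `/query.json`. The base URL (`http://localhost:8047`, Drill's default web port, reached for example through `kubectl port-forward`) is an assumption, and the helper names `drill_sql_payload` and `drill_query` are hypothetical.

```shell
# Sketch: run SQL against a drillbit through its REST API.
# DRILL_URL is an assumption (default web port 8047); override it
# to point at your drillbit pod or service.
DRILL_URL="${DRILL_URL:-http://localhost:8047}"

# Hypothetical helper: build the JSON body for /query.json.
# Note: the query text is not JSON-escaped, so avoid double quotes in it.
drill_sql_payload() {
  printf '{"queryType":"SQL","query":"%s"}' "$1"
}

# Hypothetical helper: POST a SQL statement and print the JSON response.
drill_query() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "$(drill_sql_payload "$1")" \
    "$DRILL_URL/query.json"
}
```

For example, `drill_query "SELECT hostname, heap_max, direct_current, direct_max FROM sys.memory"` returns one row per drillbit. A `direct_max` far below what drill-env.sh requests would suggest the setting is not reaching the JVM.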
Okay.
I'll report back here about the direct memory counter.
An update on the direct memory counter: it's not broken. Can we close this issue now that it has been explained that the direct memory usage figure is a percentage of peak usage? Feel free to share a JSON query profile on Slack if you'd like to discuss which operators account for most of the query execution time.
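To grab a JSON query profile for sharing, the web UI's profile pages have JSON counterparts. The sketch below assumes they are served at `/profiles/<query-id>.json` on the default web port. The query ID is a placeholder you copy from the Profiles page, and `profile_url`/`fetch_profile` are hypothetical helper names.

```shell
# Sketch: download a query's JSON profile from the Drill web server.
# DRILL_URL and the /profiles/<id>.json path are assumptions.
DRILL_URL="${DRILL_URL:-http://localhost:8047}"

# Hypothetical helper: URL of the JSON profile for a query ID.
profile_url() {
  printf '%s/profiles/%s.json' "$DRILL_URL" "$1"
}

# Hypothetical helper: save the profile as <query-id>.json locally.
fetch_profile() {
  curl -s -o "$1.json" "$(profile_url "$1")"
}
```

The saved file can be attached as-is. Its per-operator timings are what show whether the scan of the RDBMS table, the aggregations, or the join dominates the 4 minutes.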
Hello,
I am deploying drillbits as Kubernetes pods on my Kubernetes cluster. However, my drillbit pod doesn't seem to be using direct memory when querying quite a large data set (180 million rows, 27 GB in size), as shown below:
Following the official documentation, I configured drill-env.sh as follows:
And the pod configuration:
{{.Values.drill.memory}} is set to 4 GB at the moment.
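Because the pod's memory limit covers the whole JVM process, the drillbit's heap and direct memory have to fit inside it together with other JVM overhead. Otherwise the container risks being OOM-killed. The arithmetic check below is a minimal sketch. The helper names are hypothetical, and the default 512 MiB allowance for other JVM overhead is an assumption, not a Drill requirement.

```shell
# Hypothetical helper: convert a size like 4G, 512M or 1024K to MiB.
to_mib() {
  num=${1%[GgMmKk]}
  case "$1" in
    *[Gg]) echo $((num * 1024)) ;;
    *[Mm]) echo "$num" ;;
    *[Kk]) echo $((num / 1024)) ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed if heap + direct + overhead fits in the
# pod limit. Usage: fits_in_pod HEAP DIRECT POD_LIMIT [OVERHEAD]
# OVERHEAD defaults to an assumed 512M for metaspace, code cache, etc.
fits_in_pod() {
  heap=$(to_mib "$1") || return 1
  direct=$(to_mib "$2") || return 1
  limit=$(to_mib "$3") || return 1
  overhead=$(to_mib "${4:-512M}") || return 1
  total=$((heap + direct + overhead))
  echo "needed ${total} MiB of ${limit} MiB"
  [ "$total" -le "$limit" ]
}
```

For example, `fits_in_pod 1G 2G 4G` prints `needed 3584 MiB of 4096 MiB` and succeeds. A 4 GB heap on its own would already use the entire 4 GB limit, leaving nothing for direct memory.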
Any clues about where I should configure Drill to use more direct memory and improve query execution performance?
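Drill's documentation sizes drillbit memory in drill-env.sh through two environment variables: `DRILL_HEAP` for the JVM heap and `DRILL_MAX_DIRECT_MEMORY` for direct memory. The fragment below is a minimal sketch. The `1G`/`2G` values are illustrative assumptions for a pod limited to 4 GB, not recommendations.

```shell
# drill-env.sh (sketch): illustrative sizes for a 4 GB pod limit.
# The ${VAR:-default} form keeps any value already present in the
# container environment (for example, one injected by a Helm chart).
export DRILL_HEAP=${DRILL_HEAP:-"1G"}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"2G"}
```

Keeping the heap small and giving the remainder to direct memory reflects the fact that Drill's operators hold query data in direct memory. The pod's memory limit has to grow along with both values.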
Thanks for reading!