RunningJon / TPC-DS

Greenplum TPC-DS benchmark

imp option - clarification needed #10

Closed: dimon777 closed this issue 7 years ago

dimon777 commented 7 years ago

Does the imp option run the Impala queries against HAWQ, or does it run them against Impala?

I've run:

./rollout.sh 1 false imp true 5 true true true true true true true true true 1

and saw that it started the TPC-DS workflow against HAWQ. I was expecting it to run the TPC-DS test against Impala. Please clarify.

I have figured this out: all 3 types of queries are executed against HAWQ, which makes sense. Closing. Thanks.

RunningJon commented 7 years ago

From the readme:

  1. imp: This is the set of queries created by Cloudera for testing in Impala. The queries are available to provide an apples-to-apples comparison of run times with Impala. The queries were copied from this public repo: https://github.com/cloudera/impala-tpcds-kit

The license file for these queries is also included: impala_queries_license.txt

The Impala queries were changed for syntax and to remove partition cheating. Impala doesn't support concatenation with ||, so Cloudera changed the SQL to use Impala's concat() function. This was changed back. Intervals were changed from "interval 30 days" to "'30 days'::interval". Query hints were removed. There are 117 removals of explicit partition pruning in 63 unique queries. Queries 3, 4, 7, 9, 14, 24, 35, 46, and 51 were heavily modified by Cloudera, so they were reverted to the original TPC-DS versions.
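
To make the two syntax changes concrete, here is a minimal Python sketch of that kind of rewrite. It is only an illustration, not how the queries in this repo were actually converted. The function names and the sample SQL are made up, and the regexes assume simple forms: a flat concat() argument list with no nested calls or commas inside string literals, and intervals written as a number followed by a unit.

```python
import re

# Assumed shape of an Impala-style interval literal, e.g. "interval 30 days".
_INTERVAL = re.compile(r"\binterval\s+(\d+)\s+(days?|months?|years?)\b", re.IGNORECASE)

# Assumed shape of a flat concat() call: no nested parentheses inside the arguments.
_CONCAT = re.compile(r"\bconcat\(([^()]*)\)", re.IGNORECASE)


def rewrite_interval(sql: str) -> str:
    """Turn `interval 30 days` into the cast form `'30 days'::interval`."""
    return _INTERVAL.sub(lambda m: f"'{m.group(1)} {m.group(2)}'::interval", sql)


def rewrite_concat(sql: str) -> str:
    """Turn `concat(a, b)` into `a || b` (hypothetical helper, flat arguments only)."""
    def join_args(match: re.Match) -> str:
        args = [part.strip() for part in match.group(1).split(",")]
        return " || ".join(args)
    return _CONCAT.sub(join_args, sql)


if __name__ == "__main__":
    # Made-up sample statements, not taken from the benchmark queries.
    print(rewrite_interval("select d_date + interval 30 days from date_dim"))
    print(rewrite_concat("select concat(c_last_name, c_first_name) from customer"))
```

A rewrite like this would still need review by hand: `||` may need parentheses when the concat() call sits inside a larger expression, and the partition-pruning removals mentioned above are not something a regex of this kind covers.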