RunningJon / TPC-DS

Greenplum TPC-DS benchmark

What's the difference between Score and QphDS? #25

Closed: jinmojing closed this issue 3 years ago

jinmojing commented 3 years ago

When we run tpcds.sh on our Greenplum cluster, we get the results below:

    /data/pivotalguru/TPC-DS/09_score/rollout.sh
    Scale Factor 3000
    Load 1472.137763
    Analyze 170.170859
    1 User Queries 74.69139
    Concurrent Queries 3100.06968
    Q 1485
    TPT 373.45695
    TTT 3100.06968
    TLD 73.606888
    Score 670

We are confused by the score. Since we don't know what it means, we can't tell whether this result is good or bad.

The TPC-DS standard uses QphDS to measure a cluster's performance. How can I convert the score to QphDS?
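Several of the derived values in that output can be traced back to the raw timings. The sketch below is a minimal check that rests on one assumption the output never states: the concurrent test ran 5 streams. The formulas in `derived_operands` (a hypothetical helper) are inferred from the numbers rather than read from `rollout.sh`. They also say nothing about how the final Score of 670 is computed.

```python
import math

# Raw timings in seconds, copied from the rollout.sh output in this issue.
LOAD = 1472.137763          # "Load"
SINGLE_USER = 74.69139      # "1 User Queries"
CONCURRENT = 3100.06968     # "Concurrent Queries"


def derived_operands(streams: int) -> dict:
    """Recompute Q, TPT, TTT and TLD from the raw timings.

    These formulas are inferred from the reported numbers, not taken
    from the benchmark scripts.
    """
    return {
        "Q": 3 * streams * 99,          # 99 queries x stream count x 3 (factor inferred)
        "TPT": SINGLE_USER * streams,   # single-user time x stream count
        "TTT": CONCURRENT,              # concurrent time, unchanged
        "TLD": 0.01 * streams * LOAD,   # 1% of load time per stream
    }


REPORTED = {"Q": 1485, "TPT": 373.45695, "TTT": 3100.06968, "TLD": 73.606888}

# Assumption: 5 concurrent streams (not shown in the output).
ops = derived_operands(5)
for name, reported in REPORTED.items():
    match = math.isclose(ops[name], reported, abs_tol=1e-5)
    print(f"{name}: recomputed {ops[name]:.6f}, reported {reported}, match={match}")
```

With 5 streams, all four recomputed values agree with the reported ones to within rounding.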

RunningJon commented 3 years ago

First off, that isn't a realistic score. I believe you first ran the test with a scale factor of 1 and then re-ran it with a scale factor of 3000. By default, the variables are set so that the data is not regenerated and the tables are not recreated. If you change the scale factor, you need to set RUN_GEN_DATA to "true". After that test run completes, you can change it back to "false" so that later runs don't regenerate the data needed by the test.
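Toggling RUN_GEN_DATA comes down to rewriting one line in the benchmark's settings. Here is a minimal sketch that assumes a shell-style settings file with one `NAME="value"` assignment per line. The helper `set_variable` is hypothetical, and only RUN_GEN_DATA itself comes from this thread. The file's actual name and layout are not shown here.

```python
import re


def set_variable(text: str, name: str, value: str) -> str:
    """Rewrite the line NAME="..." in `text` so that it reads NAME="value".

    Assumes a shell-style settings file with one NAME="value" assignment
    per line; raises KeyError if the variable is not present.
    """
    pattern = re.compile(rf'^{re.escape(name)}="[^"]*"', re.MULTILINE)
    # A function replacement avoids backslash handling in `value`.
    new_text, count = pattern.subn(lambda _m: f'{name}="{value}"', text)
    if count == 0:
        raise KeyError(f"{name} not found")
    return new_text


# Hypothetical settings excerpt: regenerate data before a run at a new
# scale factor, then switch it back to "false" once that run completes.
sample = 'RUN_GEN_DATA="false"\n'
print(set_variable(sample, "RUN_GEN_DATA", "true"), end="")
```

Editing the file by hand works just as well. The point is that only this one value changes between the regeneration run and later runs.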

I suspect this because the 1 User Queries step ran in 74 seconds. That is impossible for Greenplum with 3 TB of data.
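The 74-second figure is easier to judge with some quick arithmetic. This rough calculation assumes the single-user step runs each of the 99 TPC-DS queries once, and it treats scale factor 3000 as roughly 3,000 GB of raw data.

```python
SINGLE_USER_SECONDS = 74.69139   # "1 User Queries" from the output in this issue
QUERIES = 99                     # assumption: one pass over the 99 TPC-DS queries
DATA_GB = 3000                   # assumption: scale factor 3000 ~ 3,000 GB of raw data

# Average elapsed time per query in the single-user step.
avg_query_seconds = SINGLE_USER_SECONDS / QUERIES
# Rate needed just to read the raw data once in that window, ignoring compression.
read_rate_gb_s = DATA_GB / SINGLE_USER_SECONDS

print(f"average per query: {avg_query_seconds:.2f} s")
print(f"reading {DATA_GB} GB once: {read_rate_gb_s:.1f} GB/s")
```

This works out to about 0.75 s per query and roughly 40 GB/s. It is only an order-of-magnitude check, since individual queries touch only part of the data.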

Second, the score is based on the official TPC score calculation, but it isn't the same. This test does not perform the update (data maintenance) portion of TPC-DS. That portion is always skipped for data warehouse products like Greenplum, and this score calculation is adjusted for that difference.
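For comparison, here is a sketch of the official primary metric as I understand it from the TPC-DS v2.x specification. Please check the definitions against the spec version you use. The formula is QphDS@SF = floor(SF × Q / (T_PT × T_TT × T_DM × T_LD)^(1/4)), with all times in hours. The terms are defined as follows:

- Q = Sq × 99
- T_PT = T_Power × Sq
- T_TT is the sum of the two throughput runs
- T_DM is the sum of the two data maintenance runs
- T_LD = 0.01 × Sq × T_Load

This is not the repository's Score formula.

```python
import math

SECONDS_PER_HOUR = 3600.0


def qphds(sf: float, streams: int, t_power_s: float, t_tt_s: float,
          t_dm_s: float, t_load_s: float) -> int:
    """QphDS@SF as defined in the TPC-DS v2.x specification (sketch).

    t_power_s: power test elapsed time
    t_tt_s:    throughput test 1 + throughput test 2 elapsed time
    t_dm_s:    data maintenance 1 + data maintenance 2 elapsed time
    t_load_s:  load test elapsed time
    Inputs are in seconds; the metric itself is defined in hours.
    """
    q = streams * 99                                     # Q = Sq * 99
    t_pt = t_power_s / SECONDS_PER_HOUR * streams        # T_PT = T_Power * Sq
    t_tt = t_tt_s / SECONDS_PER_HOUR                     # T_TT = T_TT1 + T_TT2
    t_dm = t_dm_s / SECONDS_PER_HOUR                     # T_DM = T_DM1 + T_DM2
    t_ld = streams * (t_load_s / SECONDS_PER_HOUR) / 100  # T_LD = 0.01 * Sq * T_Load
    # Floor of SF * Q over the geometric mean of the four time terms.
    return math.floor(sf * q / (t_pt * t_tt * t_dm * t_ld) ** 0.25)
```

The official metric needs a T_DM value from the data maintenance runs. Because this benchmark skips that phase, its output provides nothing to plug in for T_DM. So there is no direct way to convert the Score into QphDS.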

The score calculation was developed during my time at Pivotal, which is now part of VMware. To my knowledge, it is still in use today.

The score is best used to compare clusters at a macro level. For example, you can run the test in AWS and then in GCP to understand the performance difference between those two environments.

jinmojing commented 3 years ago

Thank you for the answer. I will rerun the test as you suggested.