TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0
542 stars 115 forks source link

Add database consistency checks #659

Closed sm-shaw closed 5 months ago

sm-shaw commented 5 months ago

Is your feature request related to a problem? Please describe. The TPC-C and TPC-H specifications that HammerDB is derived from includes schema consistency checks e.g. for TPROC-C there are 12 checks of which the first 4 should be demonstrated. HammerDB doesn't currently implement any consistency checks. This would be useful for HammerDB enhancements and also testing database modifications.

Describe the solution you'd like Implement consistency checks for both TPROC-C and TPROC-H for all databases. In the GUI this will be an option under Schema, after Build and before Delete. In the CLI this will be an additional command such as "checkconsistent" that will need to be added to both Tcl and Python CLI interfaces.

Describe alternatives you've considered

Additional context In the provided scripts run the consistency check after the run phase and before the results are reported and schema deleted.

sm-shaw commented 5 months ago

As example first commit for SQL Server TPROC-C https://github.com/TPC-Council/HammerDB/compare/master...sm-shaw:HammerDB:659 implements the following checks:

Check 1 Database Exist

Check 2 Tables Exist

Check 3 Warehouse count in schema is the same as dict setting

Check 4 Tables are indexed

Check 5 Tables are populated

Check 6 Stored Procedures Exist

Consistency check 1

Consistency check 2

Consistency check 3

Consistency check 4

check
sm-shaw commented 5 months ago

For reference the consistency checks implemented for TPROC-C and TPROC-H are as follows and all have been tested on all databases to ensure that the schema is consistent both after a load and running a test. (For TPROC-H after running a refresh function). These come after basic tests to make sure the schema has been built as expected such as tables exists, indexes exist, stored procedures exist etc to prevent consistency tests failing on an incomplete build.

TPROC-C The specification defines 12 consistency conditions, of which Consistency conditions 1 through 4 are:

  1. The sum of balances (d_ytd) for all Districts within a specific Warehouse is equal to the balance (w_ytd) of that Warehouse.
  2. For each District within a Warehouse, the next available Order ID (d_next_o_id) minus one is equal to the most recent Order ID [max(o_id)] for the ORDER table associated with the preceding District and Warehouse. Additionally, that same relationship exists for the most recent Order ID [max(o_id)] for the NEW-ORDER table associated with the same District and Warehouse. Those relationships can be illustrated as: d_next_o_id – 1 = max(o_id) = max(no_o_id) where (d_w_id = o_w_id = no_w_id) and (d_id = o_d_id = no_d_id)
  3. For each District within a Warehouse, the value of the most recent Order ID [max(no_o_id)] minus the first Order ID [min(no_o_id)] plus one, for the NEW-ORDER table associated with the District and Warehouse, equals the number of rows in that NEW-ORDER table. That relationship can be illustrated as: max(no_o_id) – min(no_o_id) + 1 = rows in NEW-ORDER where (o_w_id = no_w_id) and (o_d_id = no_d_id)
  4. For each District within a Warehouse, the sum of Order-Line counts [sum(o_ol_cnt)] for the Orders associated with the District equals the number of rows in the ORDER-LINE table associated with the same District. That relationship can be illustrated as: sum(o_ol_cnt) = rows in the ORDER-LINE table for the Warehouse and District

TPROC-H A consistent state for the TPC-H database is defined to exist when: O_TOTALPRICE = SUM(L_EXTENDEDPRICE(1-L_DISCOUNT)(1+L_TAX) for each ORDER and LINEITEM defined by (O_ORDERKEY=L_ORDERKEY) and can be checked by: SELECT DECIMAL(SUM(DECIMAL(INTEGER(INTEGER(DECIMAL (INTEGER(100DECIMAL(L_EXTENDEDPRICE,20,3)),20,3) (1-L_DISCOUNT)) * (1+L_TAX)),20,3)/100.0),20,3) FROM TPCD.LINEITEM WHERE L_ORDERKEY = okey

SELECT DECIMAL(SUM(O_TOTALPRICE, 20, 3)) from TPCD.ORDERS WHERE O_ORDERKEY = okey