TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0
545 stars 115 forks source link

Table partitioning in OLAP - TPROC-H workload #605

Closed Ora-DBA closed 10 months ago

Ora-DBA commented 10 months ago

Hi, I'm trying to build schema for data warehouse OLAP benchmarking using HammerDB. Being archival data, test database size would be around 1 TB. Whild doing so, in HammerDB, I'm not able to locate table partitioning option for TPROC-H workload. It is available in OLTP TPROC-C workload. Is this intentionally kept disabled or config changes would be required? please help!!

Environment details as below. HammerDB Version :- Version: 4.8 HammerDB Interface :- GUI Operating System: Server OS: Oracle Linux 8.6 Client OS: Windows 2019 Database Server : Database name: Oracle Database Release Version: 19.3 **Database Client : 19c

Thanks!

sm-shaw commented 10 months ago

Partitioning has not been implemented for the TPROC-H schema by default for any database as yet. Originally this was due to the additional Oracle licensing required for the option when first implemented.

We did recently look into partitioning TPROC-H recently, starting with PostgreSQL, however foreign key related issues with PostgreSQL meant that the POC with the partitioned schema was slower rather than faster.

It is certainly a viable option for other databases such as Oracle and if you wish to implement it and submit a pull request to add the feature you are welcome to do so.

If you wish to add partitioning for your own schema then it is straightforward (partition LINEITEM and ORDERS on L_SHIPDATE, L_ORDERDATE) and you will need to implement it manually by modifying the default schema.

Also note that partitioning only benefits traditional row-oriented parallel query and that all the supported databases (including Oracle) have columnstores that are designed to support workloads such as TPROC-H. These columnstores do not need partitioning and therefore implementing it recently has not been a priority given that many people prefer the columnstore approach for the TPROC-H workload.

However, if you do wish to submit a pull request to add a TPROC-H partitioning option for any database (It is likely that only Oracle, SQL Server and Db2 would benefit at present) it would be welcome and would not wish to deter you from contributing such a feature.

Ora-DBA commented 10 months ago

Thanks Steve for clarifying. I tried for partition table LINEITEM with small size and executed select query just on LINEITEM, time for query execution was more than non-partitioned table as you said.

sm-shaw commented 10 months ago

Hi, partitioning when correctly implemented will benefit the TPROC-H queries selecting by date. They use partition pruning to prevent scanning partitions they know do not contain the date range in the query because the data has already been partitioned by date. When correctly implemented on a database with full partitioning support (such as Oracle) then this can and does provide better performance for the TPROC-H queries with a row-oriented/parallel query based workload. As noted using a columnstore does not need partitioning.