bloomberg / comdb2

Bloomberg's distributed RDBMS

Optimizing comdb2 for running TPCC (OLTP) benchmark #573

Open Blade-Lee opened 6 years ago

Blade-Lee commented 6 years ago

Hello, we are currently running comdb2 on AWS and are trying to figure out how to optimize its performance on the TPCC (OLTP) benchmark. Our goal is to complete as many transactions as possible within 300 seconds.

Workload

  1. We are using TRANSACTION_READ_COMMITTED isolation. We are limited to no more than 15 worker threads (otherwise there are errors on our end, not the comdb2 end).

  2. The scale factor is 16 (Scale factor is the number of warehouses in TPCC).

  3. Each worker is able to send another transaction immediately after one transaction completes.

  4. Total Time for testing: 300 seconds.

  5. Percentage breakdown of transactions within those 300 seconds:

    1. 45% NewOrder
    2. 43% Payment
    3. 4% OrderStatus
    4. 4% Delivery
    5. 4% StockLevel

Cluster Config

One c4.4xlarge instance, so there is only one node in the cluster.

Initial thoughts

Here are some of our initial thoughts:

  1. increase cache size (64 MB -> 128 MB)
  2. increase # of threads
  3. increase log-sync-time
  4. enable init_with_bthash

Could you please provide some suggestions/hints on 1) which configs we should optimize and 2) what value range we should experiment with for a specific config?

Thank you very much!

markhannum commented 6 years ago

Hi Blade-Lee- I'll address your initial thoughts first:

  1. Increasing the cache size is always a great idea. The rule of thumb is that if you set it to the maximum size of the data footprint that you expect, you'll prevent paging to disk inline with requests.

  2. Increasing the number of write threads on the database side is a win if concurrent client requests do not overlap the same pages. However, we've seen plenty of workloads that are improved by decreasing the number of write threads (the maxwt tunable) from the default value to 4 or even 2. Most of the reason for this is the additional overhead of retrying write transactions on deadlock. Incidentally, we have a project in the works which will eliminate deadlocks entirely, and thus allow you to run with more writer threads. I expect to see this in production in early to mid 2018.

  3. Yes- absolutely, but know that there's a tradeoff here: if the database crashes before a log is sync'd to disk, you will lose commits. You are much more vulnerable to this in single-node mode than you would be if you were running a cluster.

  4. The bthash is a way to avoid a btree traversal on the data-file pages by keeping the record-genid -> page mappings in a hash. It makes things faster, but on our machines we've found that the difference is negligible for a well cached database.

As for configs to optimize, you'll want to enable statement caching for all queries ("enable_sql_stmt_caching all"), and make sure that compression is enabled for every table. Usually lz4 is the best, but you can use the "testcompr" message-trap to compare the different compression algorithms on a table: issue an "exec procedure sys.cmd.send('testcompr table <tablename>')", with <tablename> replaced by your table's name. The defaults should be okay otherwise.
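
Pulling those pieces together, a minimal comdb2.lrl sketch could look like the following (illustrative values, not a prescription; size the cache toward your expected data footprint, and note that sync log-sync-time trades crash durability for throughput):

enable_sql_stmt_caching all
init_with_bthash 1
sync log-sync-time 60
cache 1 gb
maxwt 4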

Thank you for using comdb2! We are very interested in your feedback!

akshatsikarwar commented 6 years ago

You should also run an optimized build. Configure with: cmake -DCMAKE_BUILD_TYPE=Release
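
For reference, an out-of-source Release build with a standard CMake workflow is roughly (a sketch; adjust paths and parallelism to your environment):

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j $(nproc)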

saatviks commented 6 years ago

A few quick followups:

  1. Running the testcompr procedure shows that different compression schemes work better for different TPCC tables. How do we set per-table compression levels? We couldn't find anything about this in the guide on the ComDB2 page.

  2. I'm guessing init_with_bthash 1 should be sufficient for bthash.

akshatsikarwar commented 6 years ago

Setting compression per table is easy. Before we do, can you post the output from testcompr? Compression is enabled by default and defaults might serve well anyway.
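
(For reference, per-table compression is normally chosen through the OPTIONS clause of comdb2's table DDL. Assuming ALTER TABLE accepts that clause for an existing table, a hedged sketch would be something like the line below, with the table and algorithm as examples only; check the table schema docs for the exact syntax.)

cdb2sql tpcc "ALTER TABLE history OPTIONS REC crle"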

saatviks commented 6 years ago

As an example:

  1. For the table Customer: cdb2sql tpcc "exec procedure sys.cmd.send('testcompr table customer')"
    (out='>Percentage of original size for: customer')
    (out='>Using CRLE: Data: 83.14% Blobs 100.00%')
    (out='>Using RLE8: Data: 83.62% Blobs 100.00%')
    (out='>Using zlib: Data: 64.33% Blobs 100.00%')
    (out='>Using  LZ4: Data: 85.50% Blobs 100.00%')
    (out='Current tunables:')
    (out='Compress 10% of the records, upto a max of 300000')

    Here LZ4 performs best.

  2. For table history: cdb2sql tpcc "exec procedure sys.cmd.send('testcompr table history')"
    (out='>Percentage of original size for: history')
    (out='>Using CRLE: Data: 82.22% Blobs 100.00%')
    (out='>Using RLE8: Data: 88.59% Blobs 100.00%')
    (out='>Using zlib: Data: 77.36% Blobs 100.00%')
    (out='>Using  LZ4: Data: 84.00% Blobs 100.00%')
    (out='Current tunables:')
    (out='Compress 10% of the records, upto a max of 300000')

    Here RLE8 performs best.

akshatsikarwar commented 6 years ago

Also, let's confirm compression is already enabled: sys.cmd.send('stat compr')
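
Using the same cdb2sql pattern as the testcompr runs above, that check looks like:

cdb2sql tpcc "exec procedure sys.cmd.send('stat compr')"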

akshatsikarwar commented 6 years ago

zlib does compress better but uses a lot more CPU. I would just go with defaults.

saatviks commented 6 years ago

Oops - I've written it incorrectly above, I guess. zlib performs best in both.

akshatsikarwar commented 6 years ago

Use CRLE :)

akshatsikarwar commented 6 years ago

> We are limited to no more than 15 worker threads (otherwise there are errors on our end, not the comdb2 end).

Is that because of selectv or because the workers update/delete the same rows?

saatviks commented 6 years ago

It seems to be because the workers update/delete the same rows - the procedures which don't use any SelectV are also throwing errors.

akshatsikarwar commented 6 years ago

Thanks. Can you explain when you use selectv?

saatviks commented 6 years ago

The NewOrder procedure in TPCC (link) uses FOR UPDATE statements, which I've changed to SELECTV in the dialects file.
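
To illustrate the kind of rewrite involved (abbreviated and illustrative - the exact statements live in the benchmark driver's dialect file), a FOR UPDATE read of a stock row becomes a SELECTV:

-- typical TPCC NewOrder dialect entry
SELECT S_QUANTITY, S_DATA FROM STOCK WHERE S_W_ID = ? AND S_I_ID = ? FOR UPDATE
-- rewritten for comdb2; SELECTV verifies the rows it read at commit time
SELECTV S_QUANTITY, S_DATA FROM STOCK WHERE S_W_ID = ? AND S_I_ID = ?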

saatviks commented 6 years ago

This is my current comdb2.lrl file:

enable_sql_stmt_caching all
init_with_bthash 1
log-sync-time 50
cache 1 gb
maxwt 4

I'm getting the following warnings/errors: Non-registered tunable 'cache'., Non-registered tunable 'log-sync-time'., and [ERROR] unknown opcode 'log-sync-time' in lrl /home/ubuntu/comdb2/etc/cdb2/config/comdb2.lrl.

akshatsikarwar commented 6 years ago

Use sync log-sync-time 60. The rest are fine (the warnings come from work in progress to consolidate the various tunables). I wouldn't change maxwt unless you are getting better numbers with a lower value.

saatviks commented 6 years ago

So I don't need to worry about the warnings Non-registered tunable 'cache' and Non-registered tunable 'sync'?

Blade-Lee commented 6 years ago

As a follow-up to saatviks' reply, we tried the configuration below (in comdb2.lrl):

enable_sql_stmt_caching all
init_with_bthash 1
sync log-sync-time 60
cache 1 gb
maxwt 4

But we still get the warnings: Non-registered tunable 'cache' and Non-registered tunable 'sync'.

riverszhang89 commented 6 years ago

You don't need to worry about those warnings - those settings do take effect.

saatviks commented 6 years ago

With the given configuration settings, after creating a database, we get a segfault when attempting to create tables.

riverszhang89 commented 6 years ago

I can reproduce the crash. Seems that DDL does not work with enable_sql_stmt_caching 😟 .

saatviks commented 6 years ago

Should we disable it then to run our benchmarks, or wait for a fix?

riverszhang89 commented 6 years ago

You can disable it.

adizaimi commented 6 years ago

enable_sql_stmt_caching should be disabled until we fix it -- https://github.com/bloomberg/comdb2/pull/514 will fix most issues I encountered, and possibly the one you saw above.

riverszhang89 commented 6 years ago

Adi, can you test whether DDL works with your patch? It should be real quick - right now on master, a simple CREATE TABLE will crash the database if enable_sql_stmt_caching is set to ALL.

fire commented 6 years ago

Any updates on removing the write locks? It seems like database writes are limited to 4 threads regardless of the number of machines.