TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0

When running HammerDB against SQL Server, UI freezes and does not keep up with progress #715

Closed: Alex-Zarenin closed this issue 3 months ago

Alex-Zarenin commented 3 months ago

Describe the bug In past releases of HammerDB, when using Autopilot to run tests against SQL Server (possibly with other platforms as well, but I did not test them), the "Autopilot" tab was updated synchronously with the creation of VUs. After all VUs were created, the screen was updated with every passing minute of ramp-up time.

With release 4.10, I noticed that the updates of the "Autopilot" tab freeze after the creation of a few VUs and the whole UI becomes unresponsive until the end of the ramp-up time, at which point it unfreezes, shows the history of the ramp-up, and becomes responsive again:

Autopilot Sequence 128 91 91 91 128 128 128 181 181 181 256 256 256 362 362 362 started at 00:08:35_07/02/2024
128 Active Virtual User Test started at 00:08:35_07/02/2024 with Monitor VU
Beginning rampup time of 10 minutes
Rampup 1 minutes complete ...
Rampup 2 minutes complete ...
Rampup 3 minutes complete ...
Rampup 4 minutes complete ...
Rampup 5 minutes complete ...
Rampup 6 minutes complete ...
Rampup 7 minutes complete ...
Rampup 8 minutes complete ...
Rampup 9 minutes complete ...
Rampup 10 minutes complete ...
Rampup complete, Taking start Transaction Count.
Timing test period of 5 in minutes 1 ...

To Reproduce Steps to reproduce the behavior:

  1. Configure Driver script options to connect to database
  2. Enable Autopilot with, let's say, 100 VUs
  3. Start Autopilot
  4. Notice that the Autopilot tab stops updating after creating Virtual User 1 - Monitor and Virtual User 2, and the screen stays frozen until the ramp-up time ends.

Expected behavior I expect the screen to be updated with the creation of every VU and then to update as each minute of ramp-up time passes.

sm-shaw commented 3 months ago

Many thanks for the issue, I think I know what is happening here. Firstly, nothing has changed with autopilot at v4.10, and it has recently been tested a lot for v4.11 to add the performance profiles feature https://github.com/TPC-Council/HammerDB/pull/707. To troubleshoot, try running a single test without autopilot first to see if the problem persists, so we can rule out autopilot.

If it does, then I think the issue is this: from the screenshot it looks like use all warehouses is set, and we are printing out thousands of lines of output due to the high warehouse count. With HammerDB jobs, ALL of the output is stored in a SQLite database, so when jobs are enabled every line printed results in a SQLite insert. If a number of virtual users are all doing this at the same time and the local disk where HammerDB is installed is not particularly fast, this can become a bottleneck and produce the behaviour you see: once every line has been inserted and SQLite has caught up, the UI unfreezes and all works again.
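
As a rough standalone illustration of why this matters (this is not HammerDB's actual jobs code - just a sketch using the Tcl sqlite3 package with a made-up demo table), thousands of single-row autocommitted inserts are far more sensitive to disk speed than one consolidated insert per VU:

package require sqlite3

# Throwaway demo database and table - names here are illustrative only.
sqlite3 db demo_jobs.db
db eval {CREATE TABLE IF NOT EXISTS output (vu INTEGER, line TEXT)}

# Pattern 1: one insert per printed line, as when each warehouse
# assignment message becomes its own jobs entry.
set t0 [clock milliseconds]
for {set i 1} {$i <= 5000} {incr i} {
    db eval {INSERT INTO output VALUES (1, $i)}
}
puts "per-line inserts: [expr {[clock milliseconds] - $t0}] ms"

# Pattern 2: one consolidated insert holding the whole warehouse list for a VU.
set wids {}
for {set i 1} {$i <= 5000} {incr i} { lappend wids $i }
set t0 [clock milliseconds]
db eval {INSERT INTO output VALUES (2, $wids)}
puts "consolidated insert: [expr {[clock milliseconds] - $t0}] ms"

db close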

To test this, try disabling jobs as we do for test workloads here https://www.hammerdb.com/docs/ch02s03.html
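
If you prefer the CLI, something along these lines should disable jobs there as well - I'm quoting the syntax from memory, so treat it as a sketch and check the linked docs for the exact steps:

# In hammerdbcli (syntax from memory - see the docs linked above):
jobs disable 1
# ...run the workload, then re-enable afterwards if you want job history again:
jobs disable 0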

If it runs without problems but you do want jobs enabled, then as a workaround try commenting out the following puts line in the driver script in the GUI before you run it:

#puts "VU $myposition : Assigning WID=$wh based on VU count $loadUserCount, Warehouses = $w_id_input ([expr $addMore + 1] out of [ expr int($whRequiredCount)])"
lappend myWarehouses $wh
set addMore [expr $addMore + 1]

If this works, then it looks like we need to consolidate the output of useallwarehouses: it works fine for a smaller number of warehouses, but a larger number puts too much pressure on the jobs storage.

The proposed solution: we already append $wh to a list, so instead of printing a line for each warehouse we can print all the warehouses assigned to a particular VU in one line once they are all allocated. This way we get the same information but in only 1 (longer) line.

So, for example, a proposed fix is as follows:

        } else {
                           # puts "VU $myposition : Assigning WID=$wh based on VU count $loadUserCount, Warehouses = $w_id_input ([expr $addMore + 1] out of [ expr int($whRequiredCount)])"
                            lappend myWarehouses $wh
                            set addMore [expr $addMore + 1]
                        }
                    }
                    set myWhCount [llength $myWarehouses]
                    puts "Assigned $myWhCount WIDs = $myWarehouses based on VU count $loadUserCount, Warehouses = [ expr int($whRequiredCount) ] out of $w_id_input"
                }

Then we get something like this - the same information, but only 1 SQLite insert per VU, which should unfreeze the display.

(screenshot: allwarehouse output)
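
For reference (with made-up numbers - the real warehouse IDs per VU depend on the allocation), the consolidated line from the puts above would look something like:

Assigned 8 WIDs = 1 2 3 4 5 6 7 8 based on VU count 4, Warehouses = 8 out of 32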

Alex-Zarenin commented 3 months ago

Hi Steve,

Thank you very much for the instructions on how to address my current issue! I disabled jobs as per the link that you shared and now everything works fine - the screen does not freeze anymore when I run a series of tests using Autopilot! I did not test the other options as I am not very comfortable commenting out lines in the driver script.

Anyway, disabling jobs helped. My final concern is that the link states: "If you leave Jobs enabled it is expected that the performance of the test workload is slower as all of the output is stored in SQLite." Does this mean that the performance results I got in my previous tests with "Jobs Enabled" cannot now be directly compared to my new results with "Jobs Disabled"?

Thank you, --Alex

sm-shaw commented 3 months ago

No, you don't have to worry about the performance being slower. This is the reason why the HammerDB workloads suppress output during a timed workload. Also note that with both the CLI and the GUI all output is passed to the main thread to be printed, so there are single-threaded points for both printing and recording the output - again, this is why the timed part of a test doesn't print anything out. HammerDB has been designed this way to be completely multi-threaded, e.g. https://www.hammerdb.com/blog/uncategorized/why-tcl-is-700-faster-than-python-for-database-benchmarking/
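
To sketch that pattern in isolation (this is just a minimal Tcl Thread example, not HammerDB's source), worker "virtual user" threads do their work independently and funnel every output line back to the main thread, which is the single place where it is printed and, with jobs enabled, recorded:

package require Thread

# The main thread owns the single point where VU output is handled.
set main [thread::id]
proc record_line {vu msg} {
    # In a jobs-enabled run this is also where the line would be stored.
    puts "VU $vu : $msg"
}

# Each worker thread hands its lines back to the main thread asynchronously.
set tids {}
for {set vu 1} {$vu <= 3} {incr vu} {
    set tid [thread::create {thread::wait}]
    thread::send -async $tid [list apply {{main vu} {
        for {set i 1} {$i <= 3} {incr i} {
            thread::send -async $main [list record_line $vu "line $i"]
        }
    }} $main $vu]
    lappend tids $tid
}

# Run the event loop briefly so the queued lines are delivered, then clean up.
after 1000 {set ::done 1}
vwait ::done
foreach tid $tids { thread::release $tid }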

It is only in this case, where the number of warehouses is very high and the useallwarehouses option is used, that we have a lot of output at the start during rampup, which slows things down while the messages showing which warehouses are being used are printed. As above, we just need to limit this output so it doesn't become a bottleneck.

sm-shaw commented 3 months ago

PR #717 adds the fix described in this issue and has been tested on PostgreSQL and MariaDB. The use all warehouses option is common to all databases, so the fix will also be available for SQL Server as per this issue.