Kitware / CDash

An open source, web-based software testing server
http://www.cdash.org/
Other
218 stars 76 forks source link

Speed up long-running migrations #2552

Closed stefankaufmann closed 1 week ago

stefankaufmann commented 2 weeks ago

Boost performance for migrations scripts

bilke commented 2 weeks ago

I applied your patch and it seems I got through 2024_07_09_025240_find_test_measurements_by_testid but now got another error in 2024_08_24_160326_label2test_relationship_refactor:

cdash  | Running migrations...
cdash  |
cdash  |    INFO  Running migrations.
cdash  |
cdash  |   2024_07_09_025240_find_test_measurements_by_testid    INFO  No scheduled commands are ready to run.
cdash  |
cdash  | 172.18.0.1 - - [07/Nov/2024:10:23:26 +0000] "GET / HTTP/1.1" 503 382
cdash  | ........... 66,765ms DONE
cdash  |   2024_08_24_160326_label2test_relationship_refactor ............ 7,436ms FAIL
cdash  |
cdash  | In Connection.php line 829:
cdash  |
cdash  |   SQLSTATE[22012]: Division by zero: 7 ERROR:  division by zero (Connection:
cdash  |   pgsql, SQL:
cdash  |                       UPDATE label2test
cdash  |                       SET testid = build2test.id
cdash  |                       FROM build2test
cdash  |                       WHERE
cdash  |                           build2test.buildid = label2test.buildid
cdash  |                           AND build2test.outputid = label2test.outputid
cdash  |                           AND label2test.testid IS NULL
cdash  |                           AND build2test.buildid % 0 = 0
cdash  |                   )
cdash  |
cdash  |
cdash  | In Connection.php line 612:
cdash  |
cdash  |   SQLSTATE[22012]: Division by zero: 7 ERROR:  division by zero
bilke commented 2 weeks ago

I also applied the following change:

diff --git a/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php b/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php
index 67994ad43..64b7745db 100644
--- a/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php
+++ b/database/migrations/2024_08_24_160326_label2test_relationship_refactor.php
@@ -22,7 +22,7 @@ return new class extends Migration {

         // Execute at most 10k batches of buildwise updates
         if (config('database.default') === 'pgsql') {
-            for ($i = 0; $i < ceil($count / 10000); $i++) {
+            for ($i = 1; $i < ceil($count / 10000); $i++) {
                 DB::update('
                     UPDATE label2test
                     SET testid = build2test.id

Then the migrations completed.

stefankaufmann commented 2 weeks ago

For me the label2test table was empty, so we did not run into this problem

@bilke : I am not sure about your fix, but if the count is less than 10000, then the loop will not execute at all, probably it needs to be changed to $i <= ceil($count / 10000) to run at least once

Feedback from Kitware would be nice here. But for me it looks like the current implementation will cause the division by zero problem and should be changed

williamjallen commented 2 weeks ago

It looks like there are two issues here:

  1. The slowness reported in #2545, with a fix proposed in this PR

    • I am not convinced that this is the right solution. Even though it seems to be faster on Postgres, this wasn't found to be substantially faster on MySQL. A better approach would probably to use batching, similar to 2024_08_24_160326_label2test_relationship_refactor.php.
  2. A separate divide-by-zero issue in 2024_08_24_160326_label2test_relationship_refactor.php

    • At first glance, this definitely seems like an issue. It's unclear to me how this succeeded on some production systems. I'll open a PR with a fix.
stefankaufmann commented 2 weeks ago

@williamjallen

I am not convinced that this is the right solution. Even though it seems to be faster on Postgres, this wasn't found to be substantially faster on MySQL. A better approach would probably to use batching, similar to 2024_08_24_160326_label2test_relationship_refactor.php.

I am not sure if batching would really help here. For us 1.5 million entries of build2test are compare to another 1.5 million entries in testoutput. Even in batches this might take very long using NOT IN

Here are just a few references suggesting to replace NOT IN by NOT EXISTS for postgres

If you want to keep the current implementation for MySQL, I could adjust the code thus that NOT EXISTS is only used for postgres

williamjallen commented 2 weeks ago

@stefankaufmann Thanks for this contribution! After playing around with this locally a bit, I didn't see any negative performance impacts of this change when using MySQL, and it clearly seems to help Postgres.

I also created #2555 which fixes the other divide-by-zero error reported above.

Both of these changes will be included in a patch release: CDash 3.6.1.

stefankaufmann commented 1 week ago

@williamjallen Thanks a lot. I really appreciate how quick my pull requests have been processed