DOMjudge / domjudge

DOMjudge programming contest jury system
https://www.domjudge.org
GNU General Public License v2.0
712 stars 250 forks source link

DOMjudge has been running a very slow SQL #1965

Closed Dup4 closed 5 months ago

Dup4 commented 1 year ago

Description of the problem

Replace this line with a short description.

The day after a contest, we suddenly found out that DOMjudge could not be accessed.

Initially, we found that the database load was very high, using up almost all the cpu cores on the machine.

image

By troubleshooting the database's slow SQL, we found that DOMjudge had been running a SQL statement like this

SELECT DISTINCT 
j0_.judgingid AS judgingid_0, 
j0_.starttime AS starttime_1, 
j0_.endtime AS endtime_2, 
j0_.result AS result_3, 
j0_.verified AS verified_4, 
j0_.jury_member AS jury_member_5, 
j0_.verify_comment AS verify_comment_6, 
j0_.valid AS valid_7, 
j0_.output_compile AS output_compile_8, 
j0_.metadata AS metadata_9, 
j0_.seen AS seen_10, 
j0_.judge_completely AS judge_completely_11, 
j0_.uuid AS uuid_12, 
j0_.cid AS cid_13, 
j0_.submitid AS submitid_14, 
j0_.rejudgingid AS rejudgingid_15, 
j0_.prevjudgingid AS prevjudgingid_16, 
j0_.errorid AS errorid_17 
FROM judging j0_ 
LEFT JOIN rejudging r1_ ON j0_.rejudgingid = r1_.rejudgingid 
INNER JOIN judging_run j2_ ON j0_.judgingid = j2_.judgingid 
INNER JOIN judgetask j3_ ON j2_.judgetaskid = j3_.judgetaskid 
INNER JOIN judgehost j4_ ON j3_.judgehostid = j4_.judgehostid 
WHERE j4_.hostname = 'judgedaemon-802-2' 
AND j0_.judgingid = j3_.jobid 
AND j2_.runresult IS NULL 
AND (j0_.valid = 1 OR r1_.valid = 1) 
AND j0_.result <> 'compiler-error';
image

I tried running this SQL and it was very slow.

show full processlist;

By running the above command on MariaDB, we do see that a similar SQL statement is being run all the time, which I guess is the reason for the high load on the DB.

image

So what can I do about this situation.

Your environment

Include details about your installation here.

  • DOMjudge version (e.g. 7.0.0 or a github commit hash)
  • Operating system / Linux distribution and version (e.g. Ubuntu 18.04)
  • Webserver (e.g. Apache or nginx)

Steps to reproduce

Replace this with a description how we can reproduce your bug.

  • Step 1
  • Step 2
  • Step 3

Expected behaviour

Replace this line with what you would expect to happen.

Actual behaviour

Replace this line with what happens instead.

Any other information that you want to share?

Please include webserver, symfony and judgedaemon log snippets here as appropriate. Screenshots may help in case of UI bugs.

nickygerritsen commented 1 year ago

Seems to be this query: https://github.com/domjudge/domjudge/blob/main/webapp/src/Controller/API/JudgehostController.php#L176-L192

nickygerritsen commented 1 year ago

Doing an explain gives:

1   SIMPLE  j4_ const   PRIMARY,hostname    hostname    258 const   1   Using index; Using temporary
1   SIMPLE  j2_ ALL testcaseid,judgingid,IDX_29A6E6E13CBA64F2               92  Using where
1   SIMPLE  j0_ eq_ref  PRIMARY PRIMARY 4   domjudge.j2_.judgingid  1   Using where
1   SIMPLE  r1_ eq_ref  PRIMARY PRIMARY 4   domjudge.j0_.rejudgingid    1   Using where; Distinct
1   SIMPLE  j3_ eq_ref  PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type    PRIMARY 4   domjudge.j2_.judgetaskid    1   Using where; Distinct

Dropping the DISTINCT would give:

1   SIMPLE  j4_ const   PRIMARY,hostname    hostname    258 const   1   Using index
1   SIMPLE  j2_ ALL testcaseid,judgingid,IDX_29A6E6E13CBA64F2               92  Using where
1   SIMPLE  j0_ eq_ref  PRIMARY PRIMARY 4   domjudge.j2_.judgingid  1   Using where
1   SIMPLE  r1_ eq_ref  PRIMARY PRIMARY 4   domjudge.j0_.rejudgingid    1   Using where
1   SIMPLE  j3_ eq_ref  PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type    PRIMARY 4   domjudge.j2_.judgetaskid    1   Using where

I'm wondering if we would do the distinct check in PHP whether that would make it faster. @Dup4 could you try running the query without the DISTINCT on your database and see how fast it is?

Dup4 commented 1 year ago

Doing an explain gives:

1 SIMPLE  j4_ const   PRIMARY,hostname    hostname    258 const   1   Using index; Using temporary
1 SIMPLE  j2_ ALL testcaseid,judgingid,IDX_29A6E6E13CBA64F2               92  Using where
1 SIMPLE  j0_ eq_ref  PRIMARY PRIMARY 4   domjudge.j2_.judgingid  1   Using where
1 SIMPLE  r1_ eq_ref  PRIMARY PRIMARY 4   domjudge.j0_.rejudgingid    1   Using where; Distinct
1 SIMPLE  j3_ eq_ref  PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type    PRIMARY 4   domjudge.j2_.judgetaskid    1   Using where; Distinct

Dropping the DISTINCT would give:

1 SIMPLE  j4_ const   PRIMARY,hostname    hostname    258 const   1   Using index
1 SIMPLE  j2_ ALL testcaseid,judgingid,IDX_29A6E6E13CBA64F2               92  Using where
1 SIMPLE  j0_ eq_ref  PRIMARY PRIMARY 4   domjudge.j2_.judgingid  1   Using where
1 SIMPLE  r1_ eq_ref  PRIMARY PRIMARY 4   domjudge.j0_.rejudgingid    1   Using where
1 SIMPLE  j3_ eq_ref  PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type    PRIMARY 4   domjudge.j2_.judgetaskid    1   Using where

I'm wondering if we would do the distinct check in PHP whether that would make it faster. @Dup4 could you try running the query without the DISTINCT on your database and see how fast it is?

image
nickygerritsen commented 1 year ago

So still slow but less slow? How many judgings, rejudgings, judging_runs and judgetasks do you have?

And could you try the same query but then twice:

Dup4 commented 1 year ago

So still slow but less slow? How many judgings, rejudgings, judging_runs and judgetasks do you have?

And could you try the same query but then twice:

  • Once with only j0_.valid =1
  • Once with only r1_.valid = 1 ? So that we get rid of the OR
image image image
Dup4 commented 1 year ago

I've checked again today and it may not be a SQL statement problem.

image

I found out through EXPLAIN that the submission table is not indexed.

image image image image

Then by SHOW INDEX I suddenly found that many tables in the database have a Cardinality of 0 in their indexes.

When I executed ANALYZE TABLE judgeing, the following SQL statement was executed very fast.

SELECT DISTINCT 
j0_.judgingid AS judgingid_0, 
j0_.starttime AS starttime_1, 
j0_.endtime AS endtime_2, 
j0_.result AS result_3, 
j0_.verified AS verified_4, 
j0_.jury_member AS jury_member_5, 
j0_.verify_comment AS verify_comment_6, 
j0_.valid AS valid_7, 
j0_.output_compile AS output_compile_8, 
j0_.metadata AS metadata_9, 
j0_.seen AS seen_10, 
j0_.judge_completely AS judge_completely_11, 
j0_.uuid AS uuid_12, 
j0_.cid AS cid_13, 
j0_.submitid AS submitid_14, 
j0_.rejudgingid AS rejudgingid_15, 
j0_.prevjudgingid AS prevjudgingid_16, 
j0_.errorid AS errorid_17 
FROM judging j0_ 
LEFT JOIN rejudging r1_ ON j0_.rejudgingid = r1_.rejudgingid 
INNER JOIN judging_run j2_ ON j0_.judgingid = j2_.judgingid 
INNER JOIN judgetask j3_ ON j2_.judgetaskid = j3_.judgetaskid 
INNER JOIN judgehost j4_ ON j3_.judgehostid = j4_.judgehostid 
WHERE j4_.hostname = 'judgedaemon-802-2' 
AND j0_.judgingid = j3_.jobid 
AND j2_.runresult IS NULL 
AND (j0_.valid = 1 OR r1_.valid = 1) 
AND j0_.result <> 'compiler-error';
image

However, some tables with a large number of rows of data have a Cardinality value.

image image

As I am not very familiar with MariaDB, I would like to ask what would cause indexes to not be built for some tables, i.e. Cardinality = 0.

Is it because there are so few rows in the table that the value of Cardinality is not generated, but some SQL statements INNER JOIN or LEFT JOIN some tables with more rows such as judging or judgetask that make the query slow?

nickygerritsen commented 1 year ago

@Dup4 could you somehow give us a dump of your database? Doesn't have to be public if you don't want (on Slack DM or something)

Dup4 commented 1 year ago

@Dup4 could you somehow give us a dump of your database? Doesn't have to be public if you don't want (on Slack DM or something)

This problem was solved when I executed ANALYZE TABLE ${table_name}, does this database still make sense to reproduce?

If so, I can provide the dump of my database.

vmcj commented 10 months ago

@Dup4 could you somehow give us a dump of your database? Doesn't have to be public if you don't want (on Slack DM or something)

This problem was solved when I executed ANALYZE TABLE ${table_name}, does this database still make sense to reproduce?

If so, I can provide the dump of my database.

Yes, if/when we fix this we want to be sure that applying the fix to your database does indeed fix your issue. Also as other people might have that issue and that way we make sure that we actually fix their database with a proper migration.

nickygerritsen commented 5 months ago

Closing due to inactivity. Feel free to reopen if there is more info.