Closed danielb987 closed 2 days ago
Thank you for your investigation. I'll take a look at it.
The solution is probably to add one or more indexes to one or more tables.
That is true. However, if you check the table definitions, almost all (relevant) columns are already indexed.
I compared all of the mentioned queries with the table definitions. Literally every column involved in any JOIN or WHERE clause is indexed, so there is nothing we can do on this front.
In your measurement log, three queries stand out as very slow. The numbers come from your forum, and a measurement in another forum will give different numbers, but I expect the ratios between them to be at least similar.
COUNT()
(two affected queries) The first one is the count of spam entries in the selection of accessible categories (788ms). The second one is the corresponding count of non-spam entries with the further constraint of being in the list of accessible categories (966ms). Even the first count query, for threads in the current user's category selection (which I do not want to take into account here), takes 222ms in your measurement.
All my investigations led to one conclusion: with the InnoDB storage engine, count queries take time because of the checks for transaction safety, even when no other transaction is running while the count query executes (InnoDB keeps no stored row count, because concurrent transactions may see different numbers of rows). The MyISAM engine would perform this task much faster, but at the cost of missing features elsewhere.
All sources recommend avoiding count queries during normal operation and instead using a dedicated table that is kept up to date by INSERT and DELETE triggers on the entries table. We would additionally need UPDATE triggers for spam or ham (re)classification, because the number of relevant entries/threads could change. As a consequence, the numbers of threads and entries for the different user ranks would be counted every time a thread or entry is created, (re)classified or deleted, but not every time a main view or thread is requested.
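A minimal sketch of such a trigger-maintained count table, using Python's built-in sqlite3 module so it runs without a MySQL server. Everything here is an assumption for illustration: the table entry_counts and the function count_entries are hypothetical, the spam flag is simplified into a column of the entries table (in MyLittleForum it lives in separate rating tables), only entries (not threads) are counted, and MySQL's CREATE TRIGGER syntax differs in details (for example, it requires FOR EACH ROW).

```python
import sqlite3

# Simplified, hypothetical schema: the spam flag is a column of `entries` here,
# whereas MyLittleForum keeps it in separate rating tables.
SCHEMA = """
CREATE TABLE entries (
    id       INTEGER PRIMARY KEY,
    category INTEGER NOT NULL,
    spam     INTEGER NOT NULL DEFAULT 0
);
-- One row per (category, spam flag) holding the current number of entries.
CREATE TABLE entry_counts (
    category INTEGER NOT NULL,
    spam     INTEGER NOT NULL,
    n        INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (category, spam)
);
CREATE TRIGGER entries_ai AFTER INSERT ON entries BEGIN
    INSERT OR IGNORE INTO entry_counts (category, spam) VALUES (NEW.category, NEW.spam);
    UPDATE entry_counts SET n = n + 1 WHERE category = NEW.category AND spam = NEW.spam;
END;
CREATE TRIGGER entries_ad AFTER DELETE ON entries BEGIN
    UPDATE entry_counts SET n = n - 1 WHERE category = OLD.category AND spam = OLD.spam;
END;
-- Spam/ham (re)classification or moving an entry: move one unit between counters.
CREATE TRIGGER entries_au AFTER UPDATE OF category, spam ON entries BEGIN
    UPDATE entry_counts SET n = n - 1 WHERE category = OLD.category AND spam = OLD.spam;
    INSERT OR IGNORE INTO entry_counts (category, spam) VALUES (NEW.category, NEW.spam);
    UPDATE entry_counts SET n = n + 1 WHERE category = NEW.category AND spam = NEW.spam;
END;
"""


def count_entries(conn, categories, spam):
    """Sum the stored counters for a category selection instead of running COUNT(*) on entries."""
    placeholders = ", ".join("?" for _ in categories)
    row = conn.execute(
        "SELECT COALESCE(SUM(n), 0) FROM entry_counts "
        f"WHERE spam = ? AND category IN ({placeholders})",
        (spam, *categories),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO entries (category, spam) VALUES (?, ?)",
                 [(1, 0), (1, 0), (2, 0), (3, 1)])
conn.execute("UPDATE entries SET spam = 1 WHERE id = 1")  # reclassified as spam
conn.execute("DELETE FROM entries WHERE id = 3")          # ham entry in category 2 removed
print(count_entries(conn, [1, 2], 0))  # 1
```

Reading a count then touches only a few counter rows per category selection, however large the entries table grows; per-rank totals would be sums over the categories each rank may see.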
Additionally, there are reports about OR being time-consuming. This is, for example, the case in the 788ms query, which restricts the rows to be counted to those with the value 1 in one table or the value 1 in another (WHERE (mlf2_akismet_rating.spam = 1 OR mlf2_b8_rating.spam = 1)).
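One commonly suggested way to avoid an OR across two joined tables is to split the query into two branches combined with UNION, so that each branch filters on a single table and can use that table's own index. The sketch below only checks, on a tiny SQLite data set with simplified, made-up rows, that both forms return the same count; it says nothing about which form is faster on MySQL, which would have to be measured with EXPLAIN on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mlf2_entries (id INTEGER PRIMARY KEY, category INTEGER NOT NULL);
CREATE TABLE mlf2_akismet_rating (eid INTEGER PRIMARY KEY, spam INTEGER NOT NULL);
CREATE TABLE mlf2_b8_rating (eid INTEGER PRIMARY KEY, spam INTEGER NOT NULL);
-- Entry 1: spam for both filters, 2: Akismet only, 3: b8 only, 4: ham,
-- 5: spam but in a category outside the selection.
INSERT INTO mlf2_entries VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 9);
INSERT INTO mlf2_akismet_rating VALUES (1, 1), (2, 1), (3, 0), (4, 0), (5, 1);
INSERT INTO mlf2_b8_rating VALUES (1, 1), (2, 0), (3, 1), (4, 0), (5, 1);
""")

OR_QUERY = """
SELECT COUNT(*) FROM mlf2_entries AS ft
LEFT JOIN mlf2_akismet_rating AS ak ON ak.eid = ft.id
LEFT JOIN mlf2_b8_rating AS b8 ON b8.eid = ft.id
WHERE ft.category IN (1, 2) AND (ak.spam = 1 OR b8.spam = 1)
"""

# Each branch filters on one rating table only; UNION (not UNION ALL)
# removes entries flagged by both filters, so they are counted once.
UNION_QUERY = """
SELECT COUNT(*) FROM (
    SELECT ft.id FROM mlf2_entries AS ft
    JOIN mlf2_akismet_rating AS ak ON ak.eid = ft.id
    WHERE ft.category IN (1, 2) AND ak.spam = 1
    UNION
    SELECT ft.id FROM mlf2_entries AS ft
    JOIN mlf2_b8_rating AS b8 ON b8.eid = ft.id
    WHERE ft.category IN (1, 2) AND b8.spam = 1
)
"""

or_count = conn.execute(OR_QUERY).fetchone()[0]
union_count = conn.execute(UNION_QUERY).fetchone()[0]
print(or_count, union_count)  # 3 3
```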
The complex part here would be the different user ranks (unregistered visitors, registered users (possibly with different selections of displayed categories), moderators, administrators and (possibly in the future) user groups with access to different category selections) with their different views. These are too many differing preconditions for simple rules if the counts are to be exact. It would be possible to simplify this by counting the items for unregistered visitors, for registered users, and for moderators and administrators, without taking the users' own category selections into account.
Triggers were introduced in MySQL 5.0, with a limited feature set. I have never worked with triggers in MySQL so far (only in MS SQL), so at the moment I have no experience with the syntax and performance of MySQL triggers.
At 3356ms in your measurement, nearly half of the total execution time for database queries, this one is obviously the elephant in the room. Even in this query, all involved columns are indexed. I would expect the slowness to be, at least in part, a result of the four LEFT JOINs. I currently have no idea how to optimise this case. 😟
@auge8472 Thank you for your investigation! It seems that there is not much to do about it.
We'll see. Other software products do not show such blatant performance issues, so we have to look for better solutions than the ones we have now.
I checked the longest-running query in phpMyAdmin (not the most convenient tool for this purpose) with EXPLAIN, putting EXPLAIN FORMAT=JSON before the keyword SELECT and STRAIGHT_JOIN after it (STRAIGHT_JOIN makes the optimizer join the tables in the order in which they are listed). I only have my testing and development forums, the largest with only 87 entries, so the performance in itself is satisfactory.
EXPLAIN FORMAT=JSON SELECT STRAIGHT_JOIN id, pid, tid …
I had to change the category selection and adapt the table names to mine, but apart from these modifications the query remained the same. Even with satisfactory performance, I found a serious issue.
With only three category IDs (two exist in my forum, the third does not), the EXPLAIN output was unremarkable. The query selected 17 out of the 87 rows from mlf2_entries and joined 17 rows from mlf2_userdata, mlf2_read_entries, mlf2_b8_rating and mlf2_akismet_rating. Per scan (selected main row), only one row of each joined table was examined. That is what I would expect: the indexes worked, and the query found the matching rows in the other tables via the existing indexes. The block for the Akismet table:
{
"table": {
"table_name": "mlf24_akismet_rating",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"eid"
],
"key_length": "4",
"ref": [
"db53644.ft.id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "10.00",
"cost_info": {
"read_cost": "17.00",
"eval_cost": "0.17",
"prefix_cost": "69.80",
"data_read_per_join": "13"
},
"used_columns": [
"eid",
"spam"
],
"attached_condition": "(`db53644`.`mlf24_akismet_rating`.`spam` = 0)"
}
}
Changing the category selection to all categories of my forum, by adding all IDs to the IN() list, changed the situation completely. Now the query selected 86 out of 87 rows from mlf2_entries (there is one row without a category) and joined 86 rows from mlf2_userdata, mlf2_read_entries and mlf2_b8_rating with only one examined row per scan. But now the query was not able to use the index on eid in the table mlf2_akismet_rating. Because of that, the query performed a slow full table scan (access type ALL) instead.
{
"table": {
"table_name": "mlf24_akismet_rating",
"access_type": "ALL",
"possible_keys": [
"PRIMARY"
],
"rows_examined_per_scan": 86,
"rows_produced_per_join": 85,
"filtered": "1.16",
"using_join_buffer": "hash join",
"cost_info": {
"read_cost": "16.20",
"eval_cost": "8.60",
"prefix_cost": "293.06",
"data_read_per_join": "687"
},
"used_columns": [
"eid",
"spam"
],
"attached_condition": "((`db_name`.`mlf24_akismet_rating`.`eid` = `db_name`.`ft`.`id`) and (`db_name`.`mlf24_akismet_rating`.`spam` = 0))"
}
}
For comparison, here is the corresponding block for the join with the table mlf24_b8_rating in the query with 86 resulting rows:
{
"table": {
"table_name": "mlf24_b8_rating",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY",
"B8_spam"
],
"key": "PRIMARY",
"used_key_parts": [
"eid"
],
"key_length": "4",
"ref": [
"db53644.ft.id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 85,
"filtered": "100.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "8.60",
"prefix_cost": "302.66",
"data_read_per_join": "687"
},
"used_columns": [
"eid",
"spam"
],
"attached_condition": "(`db53644`.`mlf24_b8_rating`.`spam` = 0)"
}
}
The big differences are "rows_examined_per_scan": 1 in the table mlf24_b8_rating versus "rows_examined_per_scan": 86 in the table mlf24_akismet_rating and, presumably as a result of this difference, the read cost of 1.00 for the table mlf24_b8_rating versus 16.20 for the table mlf24_akismet_rating. Additionally, there is a difference between the first run with WHERE categories IN(1, 2, 3) (category 2 does not exist) and the second run with all categories in the IN clause. In the first case we see "attached_condition": "(db_name.mlf24_akismet_rating.spam = 0)", in the second, slow case we see "attached_condition": "((db_name.mlf24_akismet_rating.eid = db_name.ft.id) and (db_name.mlf24_akismet_rating.spam = 0))" (I removed the backticks around the table and column names for readability). In the first case the join condition is resolved by the index lookup ("ref": ["db53644.ft.id"]), so only the spam filter remains; in the second case the join condition itself has to be checked as a filter.
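When comparing EXPLAIN FORMAT=JSON output by hand, a single table read with access_type ALL is easy to miss. The following Python sketch (the helper names table_accesses and full_scans are hypothetical) walks an EXPLAIN document of any nesting depth and lists the tables read by a full scan. It assumes the layout shown in the blocks above, where each joined table appears as an object under a "table" key.

```python
import json


def table_accesses(plan):
    """Yield every "table" object found anywhere in a parsed EXPLAIN FORMAT=JSON document."""
    if isinstance(plan, dict):
        for key, value in plan.items():
            if key == "table" and isinstance(value, dict):
                yield value
            # Recurse so nested blocks (nested_loop, subqueries) are found too.
            yield from table_accesses(value)
    elif isinstance(plan, list):
        for item in plan:
            yield from table_accesses(item)


def full_scans(explain_json):
    """Return (table_name, rows_examined_per_scan) for every table read with access_type ALL."""
    plan = json.loads(explain_json)
    return [
        (t.get("table_name"), t.get("rows_examined_per_scan"))
        for t in table_accesses(plan)
        if t.get("access_type") == "ALL"
    ]


# Trimmed versions of the slow Akismet block and the fast b8 block.
slow = '{"table": {"table_name": "mlf24_akismet_rating", "access_type": "ALL", "rows_examined_per_scan": 86}}'
fast = '{"table": {"table_name": "mlf24_b8_rating", "access_type": "eq_ref", "key": "PRIMARY", "rows_examined_per_scan": 1}}'
print(full_scans(slow))  # [('mlf24_akismet_rating', 86)]
print(full_scans(fast))  # []
```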
Further observations can be read in the project forum.
Can anyone confirm what I have observed? Or does anyone see a completely different outcome?
Hi,
no, I cannot confirm this behavior. Did you check the keys of both tables? If I interchange the JOIN statements b8 <--> akismet, the result changes.
What happens if we put the restrictions into the JOIN conditions instead of the WHERE clause, i.e.,
SELECT id, pid, tid, name, user_name, ft.user_id, UNIX_TIMESTAMP(ft.time) AS time, UNIX_TIMESTAMP(ft.time + INTERVAL 0 MINUTE) AS timestamp,
UNIX_TIMESTAMP(last_reply) AS last_reply, subject, category, rst.user_id AS req_user FROM mlf2_entries AS ft
LEFT JOIN mlf2_userdata ON mlf2_userdata.user_id = ft.user_id
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND rst.user_id = 0
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.`eid` = `ft`.`id` AND mlf2_b8_rating.spam = 0
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.`eid` = `ft`.`id` AND mlf2_akismet_rating.spam = 0
WHERE category IN
(0, 4, 5, 6, 7, 8, 9, 25, 14, 23, 19, 22, 24, 10, 21, 11, 12, 13, 15, 16, 26, 18, 20, 27) ORDER BY ft.time
Is there a noticeable effect?
/MIcha
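Moving spam = 0 from the WHERE clause into the LEFT JOIN condition changes the result, not only the execution plan: a condition in ON only decides whether a matching rating row is attached, so entries rated as spam (or without any rating row) are still returned, just with NULL rating columns. A timing comparison of the two variants would have to take that into account. A minimal SQLite sketch with made-up rows and simplified tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mlf2_entries (id INTEGER PRIMARY KEY, category INTEGER NOT NULL);
CREATE TABLE mlf2_akismet_rating (eid INTEGER PRIMARY KEY, spam INTEGER NOT NULL);
INSERT INTO mlf2_entries VALUES (1, 4), (2, 4), (3, 4);
-- Entry 1 is ham, entry 2 is spam, entry 3 has no rating row at all.
INSERT INTO mlf2_akismet_rating VALUES (1, 0), (2, 1);
""")

# Filter in WHERE: rows whose rating is not 0 (including NULL for unmatched rows) are dropped.
in_where = conn.execute("""
    SELECT ft.id FROM mlf2_entries AS ft
    LEFT JOIN mlf2_akismet_rating AS ak ON ak.eid = ft.id
    WHERE ft.category IN (4) AND ak.spam = 0
    ORDER BY ft.id
""").fetchall()

# Filter in ON: every entry is kept; the condition only decides whether a rating row is attached.
in_on = conn.execute("""
    SELECT ft.id, ak.spam FROM mlf2_entries AS ft
    LEFT JOIN mlf2_akismet_rating AS ak ON ak.eid = ft.id AND ak.spam = 0
    WHERE ft.category IN (4)
    ORDER BY ft.id
""").fetchall()

print(in_where)  # [(1,)]
print(in_on)     # [(1, 0), (2, None), (3, None)]
```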
Long time ago ...
no, I cannot confirm this behavior. Did you check the keys of both tables? If I interchange the JOIN statements b8 <--> akismet, the result changes.
Thanks to the tests of the reorganised upgrade script, we became aware that the indexes of the table mlf2_akismet_rating are missing after an upgrade. The indexes exist in a forum that was installed with the current stable release. I expect the missing indexes to be the cause of the observation I made.
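The effect of a missing index on the joined table can be reproduced on a small scale. The sketch below uses SQLite's EXPLAIN QUERY PLAN, not MySQL's EXPLAIN FORMAT=JSON, and simplified two-column tables (the function akismet_plan is hypothetical); it only illustrates the same difference as the eq_ref versus ALL blocks above. With eid declared as primary key, SQLite looks up each Akismet row by key; without it, the plan falls back to a scan or a temporary automatic index, with wording that depends on the SQLite version. On MySQL itself, SHOW INDEX FROM mlf2_akismet_rating lists the indexes actually present in a given installation.

```python
import sqlite3

# The Akismet column is selected on purpose: SQLite may drop a LEFT JOIN
# entirely if the joined table contributes nothing to the result.
QUERY = """
SELECT ft.id, mlf2_akismet_rating.spam FROM mlf2_entries AS ft
LEFT JOIN mlf2_akismet_rating
       ON mlf2_akismet_rating.eid = ft.id AND mlf2_akismet_rating.spam = 0
WHERE ft.category IN (1, 2, 3)
"""


def akismet_plan(with_primary_key):
    """Return the query-plan lines that mention the Akismet table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mlf2_entries (id INTEGER PRIMARY KEY, category INTEGER)")
    if with_primary_key:
        # Like a fresh install: eid is indexed (here as SQLite's rowid alias).
        conn.execute("CREATE TABLE mlf2_akismet_rating (eid INTEGER PRIMARY KEY, spam INTEGER)")
    else:
        # Like an upgraded forum with the missing index: nothing indexes eid.
        conn.execute("CREATE TABLE mlf2_akismet_rating (eid INTEGER, spam INTEGER)")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows if "akismet" in row[-1]]


for indexed in (True, False):
    print(indexed, akismet_plan(indexed))
```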
The reorganisation of the upgrade process[^1] and the corrections to it[^2], which were made possible to a large extent by @joeiacoponi1's tests[^3] (without them I might not have noticed some things), together with @joeiacoponi1's work on improving SQL performance[^4], should at least partially eliminate the cause of this report.
The corrections have not been merged yet. I am therefore leaving the issue open, but I would like to state that we are close to a solution.
[^1]: in #680
[^2]: in #707
[^3]: in the project forum and in #708 and #709
[^4]: in #713
@danielb987 Even though you have decided to move your forum to phpBB after we were not able to solve the issue within a reasonable time, I would like to ask you to upload the new release to the archived MLF instance. It contains noticeable performance improvements. I am curious whether the changes solve your performance problem or whether it is related to another part of the various queries.
There were several changes to the way the spam-prevention tables are joined. All of the queries you listed in your report join these tables, so it looks as if there could be improvements. Even if it only benefits those who want to look into the archive, it could be worth the work.
I'm closing here because I no longer expect a response.
Thank you for your work on MyLittleForum!
We run MyLittleForum version 20220803.1 at https://www.jvmv2.se/forum/index.php
The forum is very slow, so I did a couple of things to find out why:
I added this code at the beginning of index.php:
I added this code at the end of index.php:
I added this code to the end of config/db_settings.php:
And I did a search and replace of all occurrences of @mysqli_query with daniel_mysqli_query.
The result is this. The times are given in milliseconds, and only queries that take 10 milliseconds or more are shown.
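The code of daniel_mysqli_query itself is not shown here. As a rough illustration of the same idea in Python, a wrapper can time each query and keep only those above a threshold. The names timed_query, report and SLOW_QUERY_LOG are hypothetical, sqlite3 stands in for mysqli, and the default threshold of 10 ms is an assumption.

```python
import sqlite3
import time

SLOW_QUERY_LOG = []  # (elapsed_ms, sql) pairs collected by timed_query()


def timed_query(conn, sql, params=(), threshold_ms=10.0):
    """Run a query like conn.execute(), but record it if it takes at least threshold_ms."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()  # fetching inside the timer counts reading rows too
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms >= threshold_ms:
        SLOW_QUERY_LOG.append((elapsed_ms, sql))
    return rows


def report():
    """Format the recorded queries, slowest first, e.g. for output at the bottom of a page."""
    return "\n".join(f"{ms:8.1f} ms  {sql}" for ms, sql in sorted(SLOW_QUERY_LOG, reverse=True))


conn = sqlite3.connect(":memory:")
timed_query(conn, "SELECT 1", threshold_ms=0.0)  # threshold 0: every query is recorded
print(report())
```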
You can see the result on this forum: https://www.jvmv2.se/daniel/forum/ (the result is at the bottom of the page).
The conclusion is that several queries take 100 milliseconds or more and that one query takes several seconds. The solution is probably to add one or more indexes to one or more tables.