fossar / selfoss

multipurpose rss reader, live stream, mashup, aggregation web application
https://selfoss.aditu.de
GNU General Public License v3.0
2.35k stars 344 forks source link

Slow MySQL queries when reading new items #1282

Open albancrommer opened 2 years ago

albancrommer commented 2 years ago

My SELFOSS database has ~500 sources and ~330k items, the oldest from Jan 2016.

Love Selfoss thanks a lot for it !

Database Version : mariadb 10.3.27
PHP Version : PHP7.0  (yes, I know)
OS : Debian 11 Bullseye

My problem: SELFOSS keeps slowing down when loading items.

I have activated the MySQL Log, and the query executed on each items load is responsible.

Here is the main query :

SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tagsitems.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM items AS items, sources AS sources
WHERE items.source=sources.id AND TRUE AND unread=1 
ORDER BY items.datetime DESC, items.id DESC 
LIMIT 50 OFFSET 0;
- - -
50 rows in set (6.440 sec)

Pretty slow.

Running the EXPLAIN plan for that query tells us more:

EXPLAIN SELECT 
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tagsitems.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
FROM items AS items, sources AS sources
WHERE items.source=sources.id AND TRUE AND unread=1 
ORDER BY items.datetime DESC, items.id DESC 
LIMIT 50 OFFSET 0;
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+---------------------------------+
| id   | select_type | table   | type | possible_keys | key    | key_len | ref                      | rows | Extra                           |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+---------------------------------+
|    1 | SIMPLE      | sources | ALL  | PRIMARY       | NULL   | NULL    | NULL                     |  489 | Using temporary; Using filesort |
|    1 | SIMPLE      | items   | ref  | source        | source | 4       | alban_selfoss.sources.id |  349 | Using where                     |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+---------------------------------+

The query shows use of temporary table and filesort, which is slow.

Let's see how ORDER BY clauses are responsible for that by removing them:

SELECT 
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags 
FROM items AS items, sources AS sources 
WHERE items.source=sources.id AND TRUE AND unread=1 
LIMIT 50 OFFSET 0;
- - - 
50 rows in set (0.001 sec)

EXPLAIN SELECT 
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags 
FROM items AS items, sources AS sources 
WHERE items.source=sources.id AND TRUE AND unread=1 
LIMIT 50 OFFSET 0;

+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+-------------+
| id   | select_type | table   | type | possible_keys | key    | key_len | ref                      | rows | Extra       |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+-------------+
|    1 | SIMPLE      | sources | ALL  | PRIMARY       | NULL   | NULL    | NULL                     |  489 |             |
|    1 | SIMPLE      | items   | ref  | source        | source | 4       | alban_selfoss.sources.id |  349 | Using where |
+------+-------------+---------+------+---------------+--------+---------+--------------------------+------+-------------+

Removing ORDER BY results in in 0.01 second query and no more temp table or filesort.

Solutions

It seems this is due to MySQL's handling of queries using ORDER BY cf. documentation.

A possible fix would be to split the query in 2 parts and let PHP associate items with sources.


SELECT
items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author
FROM items AS items
WHERE unread=1 
ORDER BY items.datetime DESC, items.id DESC 
LIMIT 50 OFFSET 0;
50 rows in set (0.001 sec)

SELECT sources.title as sourcetitle, sources.tags as tags
FROM sources AS sources;
489 rows in set (0.002 sec)
jtojnar commented 2 years ago

Optimization of database access is definitely something we need, I just have not gotten around to looking into it so I am thankful for any such analysis.

I wonder if we could come up with an index that would avoid the need to splitting the queries. I would expect something like CREATE INDEX loading_items ON items (source, datetime DESC, id DESC); to help but it’s been a while since I worked with databases. And the EXPLAIN report seems to say it is actually sorting sources, which is confusing to me, as sources should be already sorted according to its primary key (it is a InnoDB table).

albancrommer commented 2 years ago

Ha, same, fixing the problem by not changing the code was my first try ! Simple, better, right?

But adding indexes doesn't work. I've tried. I am not a great DBA in any way so someone might get it to work, but so far, nope.

As far as I understand the fact that there are two tables in the query seems to force the sortfile + tmptable behaviour when using ORDER BY.

That explains why the sources table index is not being used I believe. Two tables, ORDER BY, no index.

See the official documentation https://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html regarding ORDER BY optimizations for more details about that behaviour.

So

  1. maybe PostgreSQL works better? That would be interesting.
  2. I've been looking at the code to test a fix. To not break anything, I believe the change should be done in DAOS/mysql/Items.php
niol commented 2 years ago

Back when I had worked on this, I felt the need to add this in pgsql. If I remember correctly, this greatly improved fetching and cleanup.

CREATE UNIQUE INDEX uid_idx ON items USING btree (source, uid);
CREATE INDEX datetime_idx ON items USING btree (datetime);
CREATE INDEX updatetime_idx ON items USING btree (updatetime);
albancrommer commented 2 years ago

Back from testing the code patch.

It appears that altering the DAOS/mysql/Items.php made the page load in 1.5s versus 7.5, as expected.

Here is a code patch if you want to test.

--- a/tmp/Items.php
+++ b/daos/mysql/Items.php
@@ -320,7 +320,7 @@ class Items extends Database {

         // get items from database

-        $select = 'SELECT
+/*        $select = 'SELECT
             items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author, sources.title as sourcetitle, sources.tags as tags
             FROM ' . \F3::get('db_prefix') . 'items AS items, ' . \F3::get('db_prefix') . 'sources AS sources
             WHERE items.source=sources.id AND';
@@ -346,6 +346,55 @@ class Items extends Database {
         }

         return \F3::get('db')->exec($query, $params);
+*/ // Original code
+
+        $select = 'SELECT
+            items.id, datetime, items.title AS title, content, unread, starred, source, thumbnail, icon, uid, link, updatetime, author
+            FROM ' . \F3::get('db_prefix') . 'items AS items
+            WHERE TRUE AND ';
+        $order_sql = 'ORDER BY items.datetime ' . $order . ', items.id ' . $order;
+
+        if ($where_ids != '') {
+            // This UNION is required for the extra explicitely requested items
+            // to be included whether or not they would have been excluded by
+            // seek, filter, offset rules.
+            //
+            // SQLite note: the 'entries' SELECT is encapsulated into a
+            // SELECT * FROM (...) to fool the SQLite engine into not
+            // complaining about 'order by clause should come after union not
+            // before'.
+            $query = "SELECT * FROM (
+                        SELECT * FROM ($select $where_sql $order_sql LIMIT " . $options['items'] . ' OFFSET ' . $options['offset'] . ") AS entries
+                      UNION
+                        $select $where_ids
+                      ) AS items
+                      $order_sql";
+        } else {
+            $query = "$select $where_sql $order_sql LIMIT " . $options['items'] . ' OFFSET ' . $options['offset'];
+        }
+
+
+        $items = \F3::get('db')->exec($query, $params);
+
+
+
+        $select_sources = 'SELECT 
+            id, title as sourcetitle, tags as tags
+            FROM sources AS sources;';
+
+        $rawSources = \F3::get('db')->exec($select_sources, $params);
+        $sources = array();
+        // Yuck
+        foreach( $rawSources as $source ){
+          $sources[$source["id"]] = $source;
+        }
+
+        foreach( $items as $i => $item ){
+          $item[$i]['tags'] = $source[$item["source"]]["tags"];
+          $item[$i]['sourcetitle'] = $source[$item["source"]]["sourcetitle"];
+        }
+
+        return $items;
     }

Note 1: My selfoss instance's code is apparently seriously out of date... It is Version 2.18 but there was no release since?

So, careful, that patch is based on the 2.18 version.

I should probably upgrade, as git seems to be right way to follow a not-so-dead project after all ;)

Note 2: The queries are breaking as soon as I try to facet (ex: load a tag) but that was expected.