OllieJones / sqlite-object-cache

A WordPress persistent object cache for the rest of us.

Implement wp_cache_get_multiple to use one request #29

Closed. spacedmonkey closed this issue 1 year ago.

spacedmonkey commented 1 year ago

Although wp_cache_get_multiple is implemented in the plugin, it still just loops over each item in the array, doing a single request per item.

See this article

https://xwp.co/object-caching-improvement-wordpress-6-2/

There are serious performance benefits from getting multiple rows at once.

OllieJones commented 1 year ago

Thanks for the reminder. I did read your fine article a while ago.

I've been working on this. The multi operations already run within single SQLite transactions. In the SQL world, the biggest multi-get performance increase comes from

SELECT name, val FROM table WHERE name BETWEEN 'group|100' AND 'group|102'

because it causes a nice efficient range scan on the clustered index. The alternative,

SELECT name, val FROM table WHERE name IN ('group|100', 'group|101', 'group|102') 

and the like are not significantly faster than just looping the queries, as long as a transaction is active. So I'll need a "detect consecutive keys" kind of feature in the code, and that's taking some work. It doesn't help that some keys are numeric and others are text, meaning the database column has to be text. Which means we get text collations. Which means '2' sorts between '199' and '200', a well-known pain in the RDBMS xxs.
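
Here's that collation quirk in miniature, runnable in any SQLite shell:

-- With TEXT comparison, '2' > '199' (because '2' > '1' at the first character)
-- and '2' < '200' (because '2' is a prefix of '200'), so '2' sorts between them.
SELECT '199' < '2', '2' < '200';  -- both return 1 (true)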

spacedmonkey commented 1 year ago

@OllieJones A much simpler solution would be to change the database schema.

The name column currently has both the id and the group in it. If we had a separate column for the group, that would make lookups by group, cache clears by group, and multiple gets much easier.

OllieJones commented 1 year ago

I wish that were true, because it would indeed be simpler.

But because some keys are alphabetic and others are numeric, the consecutive-keys and range-scan issue still exists. Being able to satisfy get_multiple( 'group', [199, 200, 202] ) without also fetching 1990, 1991, 1992, ..., 2, 20, 21, and so on isn't solved by adding another column to the clustered primary key.
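
To make that concrete, a text-collated range covering those three wanted keys sweeps in every lexically intermediate key (the table name here is illustrative):

-- As TEXT, BETWEEN 'group|199' AND 'group|202' also matches keys we never asked for:
SELECT name FROM object_cache WHERE name BETWEEN 'group|199' AND 'group|202';
-- matches 'group|1990', 'group|2', 'group|20', 'group|2000', ...
-- as well as the wanted 'group|199', 'group|200', and 'group|202'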

It could be solved by adding another table, structured as you suggest, just for the cache items with integer keys. That wouldn't reduce complexity.
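
For the record, a sketch of what that extra table might look like (hypothetical, not the plugin's actual schema): with a genuine INTEGER key column, BETWEEN ranges numerically instead of lexically.

CREATE TABLE cache_int (
  grp     TEXT    NOT NULL,
  id      INTEGER NOT NULL,
  value   BLOB,
  expires INTEGER,
  PRIMARY KEY (grp, id)
) WITHOUT ROWID;
-- WHERE grp = 'group' AND id BETWEEN 199 AND 202 now range-scans numerically.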

In the meantime, deduplicating the keys will help a little, as will shortcutting the empty-key-array and single-item key-array cases. Both crop up often enough in my testing to be worth their own code paths.
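
Roughly, those shortcuts amount to something like this (a sketch, not the plugin's actual code):

public function get_multiple( $keys, $group = 'default' ) {
  $keys = array_values( array_unique( $keys ) );  // deduplicate
  if ( 0 === count( $keys ) ) {
    return array();  // empty-key-array shortcut
  }
  if ( 1 === count( $keys ) ) {
    // single-item shortcut: one ordinary get
    return array( $keys[0] => $this->get( $keys[0], $group ) );
  }
  // ... general multi-key path ...
}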

OllieJones commented 1 year ago

I assume that get_multiple should not fetch all cached items in the named group; I believe that would be unreasonably wasteful of WordPress's RAM working set.

Correct?

spacedmonkey commented 1 year ago

@OllieJones Why not have a schema like this?

id INT NOT NULL PRIMARY KEY COLLATE BINARY,
key TEXT,
group TEXT,
site_id INT,
value BLOB,
expires INT

The id isn't really used for anything, similar to how the primary key on the meta tables in WordPress is not really used. So all the lookups, updates, and deletes would have to be changed to be like this:

$this->sqlite->prepare( "SELECT value FROM $tbl WHERE key = :key AND group =:group and site_id = :site_id AND expires >= $now;" );

This is a big change, but it opens up a lot more options going forward.

OllieJones commented 1 year ago

There's a big performance reason to avoid a surrogate PK (your id) column: the technology called the "clustered primary key". In SQLite WITHOUT ROWID tables, in InnoDB tables, and in SQL Server, the primary-key data structure stores the entire table. So pk_column BETWEEN this AND that does a very cheap range scan on the clustered PK data structure and grabs the results.
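
You can see this in any SQLite shell. With a WITHOUT ROWID table, the plan for a BETWEEN is a single primary-key range search (exact output varies by SQLite version):

CREATE TABLE demo (name TEXT PRIMARY KEY, val BLOB) WITHOUT ROWID;
EXPLAIN QUERY PLAN
SELECT name, val FROM demo WHERE name BETWEEN 'group|100' AND 'group|102';
-- SEARCH demo USING PRIMARY KEY (name>? AND name<?)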

spacedmonkey commented 1 year ago

@OllieJones Shouldn't this be as simple as this:

$names = array_map( function ( $key ) use ( $group ) {
  return $this->name_from_key_group( $key, $group );
}, $keys );

$list = "'" . implode( "','", $names ) . "'";
$this->sqlite->prepare( "SELECT name, value FROM $tbl WHERE name IN ($list) AND expires >= $now;" );

Am I missing something?

OllieJones commented 1 year ago

A couple of factors to consider here.

First: reusing prepared statements in SQLite3 helps performance quite a bit, as it does in SQL Server or Oracle. (This statement-precompilation advantage doesn't exist with MySQL / MariaDB.) But WHERE key IN (101, 102, 3, 40005) means an argument-count-dependent prepared statement like WHERE key IN (:one, :two, :three, :four); obviously we wouldn't spell out the placeholders like that in software, but we'd still need prepared statements for quite a few different argument counts. It gets complex.
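
For illustration, one way to handle that (a sketch; the stmt_cache member is hypothetical, not the plugin's code):

$placeholders = implode( ',', array_fill( 0, count( $names ), '?' ) );
$sql = "SELECT name, value FROM $tbl WHERE name IN ($placeholders) AND expires >= ?";
if ( ! isset( $this->stmt_cache[ $sql ] ) ) {
  $this->stmt_cache[ $sql ] = $this->sqlite->prepare( $sql );  // one statement per argument count
}
$stmt = $this->stmt_cache[ $sql ];
$stmt->reset();  // required before rebinding a reused statement
foreach ( $names as $i => $name ) {
  $stmt->bindValue( $i + 1, $name, SQLITE3_TEXT );  // positional parameters are 1-based
}
$stmt->bindValue( count( $names ) + 1, $now, SQLITE3_INTEGER );
$result = $stmt->execute();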

Second: the query execution plan for WHERE key BETWEEN 'start' AND 'end' is exceptionally simple, a lot simpler than the plan for the alternative you propose. In a local environment, a series of those prepared BETWEEN queries will often outrun a single WHERE key IN (...) query: no network round trips, and the same or fewer index range scans. On a client-server system the extra round-trip time swamps the series-of-queries approach, so WHERE ... IN () wins there, but not on a local-file-system SQLite setup.
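
For comparison, the plan for IN probes the primary key once per listed value (same demo table as in my earlier comment; output varies by version):

EXPLAIN QUERY PLAN
SELECT name, val FROM demo WHERE name IN ('group|100', 'group|101', 'group|102');
-- SEARCH demo USING PRIMARY KEY (name=?)  -- one probe per value in the list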

I think Till implements get_multiple and the other *_multiple operations with some kind of Lua stored-procedure code in Redis, in his commercial plugin. I don't know about memcached.

spacedmonkey commented 1 year ago

Memcached has getMulti, which does all of this for you.

Looking into SQLite, using prepared statements + IN seems to be an issue. Seems like there are a couple of options:

  1. Have a number of prepared statements, with placeholders for, say, 1, 2, 3, 4, 5, 6, 7, or 8 keys.
  2. Create a temporary table and do a nested query (sketched after this list).
  3. Do multiple statements in a single query and combine them with UNION.
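
A rough sketch of option 2, the temporary table (table and column names are illustrative):

-- Load the wanted names into a temp table, then join against the cache table.
CREATE TEMP TABLE IF NOT EXISTS wanted (name TEXT PRIMARY KEY);
DELETE FROM wanted;
INSERT INTO wanted (name) VALUES ('group|199'), ('group|200'), ('group|202');
SELECT t.name, t.value
  FROM object_cache t
  JOIN wanted w ON w.name = t.name
 WHERE t.expires >= strftime('%s', 'now');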

My knowledge of SQLite is very limited. Maybe this is just one we can easily fix...

OllieJones commented 1 year ago

Hypothesis (proven for client-server databases, unproven for SQLite): the per-SQL-query overhead is large enough that it makes sense to write complex queries to reduce the query count.

spacedmonkey commented 1 year ago

I was thinking about multiple gets today. I had two ideas.

  1. Have a prepared statement with 10 placeholders. Loop through the ids and break them into groups of 10. For counts that aren't a multiple of 10, like 23, just bind empty strings to the unused placeholders; the query will ignore them (sketched after this list).

  2. If the above is impossible, have a prepared statement with 5 placeholders and do it in chunks of 5, then do single gets for the rest. So for 17 keys, that would be 3 multi gets and 2 single gets.
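
A quick sketch of idea 1 (illustrative only; the 10-placeholder prepared statement is assumed to exist):

$chunk_size = 10;
foreach ( array_chunk( $keys, $chunk_size ) as $chunk ) {
  // Pad a short final chunk with '' so every execution binds all 10 placeholders.
  // No real cache name is the empty string, so the padding matches nothing.
  $padded = array_pad( $chunk, $chunk_size, '' );
  // ... bind $padded to the 10-placeholder statement and execute ...
}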

As the database is local and there is no network overhead, maybe simple queries are faster. I don't know; it's hard to tell without benchmarks...