Performance: Periods hook extension

JDGrimes commented 6 years ago

I have received a report of performance issues that appear to be stemming from the Periods hook extension, as triggered by a reaction to the Visit event. The specific issue was that the database was being flooded with requests like this, originating from the get_period_by_reaction() method:

SELECT *, `period`.`id` AS `id`
FROM `{$wpdb->wordpoints_hook_periods}` AS `period`
INNER JOIN `{$wpdb->wordpoints_hook_hits}` AS `hit`
   ON `hit`.`id` = period.`hit_id`
WHERE `period`.`signature` = %s
   AND `hit`.`reaction_mode` = %s
   AND `hit`.`reaction_store` = %s
   AND `hit`.`reaction_context_id` = %s
   AND `hit`.`reaction_id` = %d
   AND `hit`.`action_type` = %s
ORDER BY `hit`.`date` DESC

This request only runs when we are unable to obtain the period ID from the cache.

Based on information from the database server, these requests were all getting stuck in the "Creating sort index" state. This would seem to indicate that the sorting process is what is slowing down the queries.

I have always known that this code was sub-optimal under usual conditions in regard to the Visit event, but because of the design of the new Hooks API it is non-trivial to specialize the code for this. The extension is supposed to be oblivious to exact use-cases, and can handle a much wider range of functionality than is currently being used in relation to the Visit event. However, I think that with a little bit of research we should be able to modify the logic so that it behaves more optimally under this condition.

There are three different approaches to optimization that we need to consider:

Optimizing the query itself.
Optimizing the logic that determines when the query is called.
Adding a shim outside of the hooks API router that optimizes the handling of the Visit event.

JDGrimes commented 6 years ago

Here is the EXPLAIN of the query:

1   SIMPLE  period  ALL period_signature                105 Using where; Using temporary; Using filesort
1   SIMPLE  hit eq_ref  PRIMARY PRIMARY 8   wordpress_develop.period.hit_id 1   Using where

We could probably split the query into two parts, and that might improve things.

JDGrimes commented 6 years ago

Another thing that I noticed about this query is that we are pulling out all of the fields from both of the tables, even though we only actually use hit date and period ID. The rest of the information we already have anyway, except for the hit ID, so there is no reason to pull it out. Actually, it would be better not to store this info in the cache at all since it isn't needed, but I'm not going to change that until a breaking release.

JDGrimes commented 6 years ago

The same actually goes for the query that gets the period by ID, but that is for #763.

What I have done now is also improve the caching so that when the query does not find any matching period, it still saves false to the cache. Then when we check the cache, we can see if a value was found, and even if it is false we don't need to re-run the query, because we already know that no period exists.

This situation can occur when another extension is blocking the reaction from hitting in this context, for example. After the reaction has hit once in that context, this logic would no longer be relevant. However, that may not ever happen under certain situations, like when a user is logged out (though I need to research that further).

JDGrimes commented 6 years ago

As far as guests, perhaps the reactor should check during validation whether the passed value is actually usable. It checks that it is actually an entity, but it does not check that it is of the correct type or that it's ID is actually set. This is something that should be checked per-reactor, of course, since different reactors can have different requirements. But is still stands that it ought to be checked, as otherwise we are running the extension checks unnecessarily, and even recording a hit when none has actually occurred.

This ties in with the idea of reactors aborting hits, see #640. There, we seem to have failed to consider the fact that reactors are called into play during validation. The problem is, how can a reactor know when it is supposed to validate the arg value, and when it is just supposed to validate the settings.

JDGrimes commented 6 years ago

The points reactor now ensures that the target actually provides an ID so that it can award points. This validation method is then called during firing just after the reaction settings are validated, before the extensions run. This will prevent this query from ever running in situations where it doesn't need to, which is probably the majority of the performance bottleneck.

JDGrimes commented 6 years ago

It is not really possible to actually split the query effectively without the likelihood of introducing new performance pitfalls, because we can't sanely limit the results of the first query.

However, I do wonder about whether we need to be so tightly linked to the hit date, or whether we could just go by the IDs. This would be making an assumption that the highest ID is also the most recent date, but it would improve query performance by allowing us to guess the latest hit just by querying the periods table, since it includes the hit ID. This would be a big assumption though, and it wouldn't necessarily be safe. We'd have to really do our homework before we could make it going forward.

The main thing that I can think of that might invalidate this assumption would be imports. Imported hits might have older dates but newer IDs.

I suppose that another alternative would be to save the hit date to the periods table, but that would still be making the assumption that hit dates are immutable. That seems like a safer assumption though.

JDGrimes commented 6 years ago

Because the hit log is a log, the only time that a hit date should be modified is when it was inserted incorrectly. However, at that point you have bigger problems on your hands. I don't think that hit dates should generally be considered mutable.

JDGrimes commented 6 years ago

Even having the hit dates will not really help us, because we still need to only consider hits with the correct signatures. But maybe we should store the whole hit signature as a calculated hash in the periods table. That would prevent us having to store all of the data twice (see https://github.com/WordPoints/hooks-api/issues/84), but still allow us to do a good single-table query with an index on the appropriate columns. We can do that, because we don't need this information individually (and if we did we could always do a join or grab the hid ID, as the case required).

We would combine these five columns:

reaction_mode
reaction_store
reaction_context_id
reaction_id
action_type

And we'd also have to store a copy of the date in the periods table as well.

I think that would be too big of a DB change to try and put into a point release, but we could split it out into another ticket to look at later.

JDGrimes commented 6 years ago

I think we've made a significant improvement here, and I don't think that there is much more that we can do in a point release. Let's see what this does, and then pursue it further if needed.

WordPoints / wordpoints

Performance: Periods hook extension #762