Closed coyotemarin closed 13 years ago
After EXPLAIN
ing some queries, it looks like the cleanest way to do this is:
SELECT COUNT(*)
FROM `...`
WHERE (`lock_until` IS NULL
AND `last_updated` <= UNIX_TIMESTAMP() - ...) OR
(`lock_until` IS NOT NULL
AND `last_updated` <= UNIX_TIMESTAMP() - ...)
which looks pretty silly. But works, and is probably more portable than index hints.
After thinking about various things that can go wrong with a loop table, and trying to design monitoring tools that can catch them, I think we should probably worry less about which queries are more efficient, and more about warning users when IDs have gone a long time since being updated. Under the current system, we could get()
and lock a bad ID (one that causes a crash before we can call did()
) over and over again, and we wouldn't notice, because we wouldn't included locked/bumped rows in update stats.
In practice, only a small fraction of rows should be locked, so it should be fine to use an index scan on most of the table, and then scan over all locked rows, using a "silly" query like the one above.
Been messing with various permutations of these queries on a table with about 30 million rows, on a wimpy database server (my 4-year-old MacBook Pro).
It looks like there's no cheap way to do a COUNT(*)
; it kind of looks like I can get some speedup by using the index in some cases, but actually MySQL uses (lock_until, last_updated)
as a covering index no matter what I do.
So I really might as well roll these all into one monstrous covering-index-scanning query.
Checked in a new stats()
function after trying a few variations. It turns out that SUM(IF(...))
is kind of expensive, so it's better to keep the min/max ID queries and locked/bumped queries separate.
We currently don't include locked rows in the
delayed
stats returned bydoloop.stats()
because we're trying to avoid doing anything more than an index scan. However, in most cases, locked rows are going to be a small minority of all the rows in the doloop table, so it seems like we could do an index scan for the unlocked rows, and then walk through all the locked rows.