colinmollenhour / Cm_Cache_Backend_Redis

A Zend_Cache backend for Redis with full support for tags (works great with Magento)

Cache keys go missing from tags over time #33

Closed: nemphys closed this issue 11 years ago

nemphys commented 11 years ago

Hi, I have been using this backend on my production server quite successfully for the past month. The only problem I have spotted so far is that after a few days without restarting Redis (I do not use persistence, so restart = flush), some cache keys disappear from their tags' lists, which leads to them not being deleted when the cache is cleaned by tags.

So far I have not managed to reproduce this behavior on purpose; it seems to be quite random. I am thinking of two scenarios:

1) The gc script is not working properly (I am running it every 3 hours).
2) While a tag clean is being executed, another request is inserting new keys into the same tag and something goes wrong (no locking?).

I have not yet thoroughly reviewed the code, so I do not have a complete view of how (or if) locking is implemented for writes/flushes; I will do so as soon as possible.

In the meantime, has this behavior been spotted before?

colinmollenhour commented 11 years ago

Thanks for reporting your investigations.

The _collectGarbage method is definitely not 100% safe in terms of atomicity, so there is a possibility that an expired id is removed by the garbage collector at the same time that it is being re-added by another process. One solution would be to (redundantly) delete all keys that are removed from the tag, for safety. The garbage collection could also be converted to a Lua script to make it more transaction-safe, but that would lock the entire cache while cleaning.

I don't think the tag cleaning can cause this issue since it is done using a transaction (MULTI/EXEC), so it must be the garbage collector. I personally run it only once a day, during off-peak times.

I committed a branch that removes keys while expiring them to ensure they don't get orphaned. It will probably be considerably slower, though, so please test it and let me know if it fixes the issue before I merge it into master. Also let me know if you see exorbitant memory usage (building the strings for Redis commands can consume a lot of memory), locking, or other performance issues.

If that fixes your problem I can look into improving it with a Lua script that pushes all expired ids to a set and then removes those ids. That would probably come in handy in other places as well, actually.

nemphys commented 11 years ago

Hi, thanks for spending the time to look into this. I think the problem does not exist only in the garbage collector, though. Even though I switched to a single gc execution per day, the issue just happened again at a time unrelated to gc.

I just took a quick look at the code and I think the real problem is in the save method: even though the Redis commands are executed with a pipeline, the cache tags for the key are fetched quite early in the method, and some PHP code/checks run before the batch of commands actually goes through. Imagine this scenario:

(1) cache save is triggered -> (2) the key's cache tags are fetched from Redis -> (3) the key is saved -> (4) cache tag differences are calculated in PHP -> (5) the commands are sent as a batch to Redis.

If a cache clean for a tag relevant to this key is executed anywhere between (2) and (5), the cache key will be deleted from that tag's members list and will not be added again, since the (already fetched) tags array will indicate that it is already there. I hope you see my point and I am not missing something, but this is quite critical if there is no kind of lock while doing all this (a rough sketch of this window follows below).

Another approach would be to save the tag-relevant entries regardless of their previous state; this would cause more "writes" to Redis (many of them unneeded), but would also save us a "read" (the initial tag read).
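To make the scenario concrete, here is a minimal sketch using plain phpredis calls. The key layout ('zc:k:'/'zc:ti:'/'zc:kt:'), the variable names, and the merged write step are illustrations only, not the backend's actual code:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$id   = 'some_cache_id';
$data = 'serialized cache data';
$tags = array('catalog_product_1');

// (2) the key's previously saved tags are read first (hypothetical per-key tag set)
$oldTags = $redis->sMembers('zc:kt:' . $id);

// ... a concurrent clean() can SREM $id from 'zc:ti:catalog_product_1' right here ...

// (4) the tag difference is computed in PHP against the now-stale $oldTags
$addTags = array_diff($tags, $oldTags);

// (3)+(5) the batch is sent; the SADD for the "already present" tag is skipped,
// so the membership removed by the concurrent clean is never restored
$pipe = $redis->multi(Redis::PIPELINE);
$pipe->set('zc:k:' . $id, $data);
foreach ($addTags as $tag) {
    $pipe->sAdd('zc:ti:' . $tag, $id);
}
$pipe->exec();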

Since I only discovered Redis a month ago, I still haven't had time to look at its internals, and I am therefore not in a position to propose something more solid.

Please share your thoughts.

nemphys commented 11 years ago

Hmm... I just found this: "All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation." That means the above scenario cannot happen. Any ideas about what else could go wrong?

nemphys commented 11 years ago

OK, I think I got it: the rule above means that nothing else will be executed while the queued commands are being executed, but a lot can happen while the commands are being queued! The correct way to go is to use WATCH. I am almost sure that WATCHes for all tag keys (both the new tags passed as a parameter and the key's saved tags, if they exist) must be executed right before they are fetched, and the outcome of the transaction must be checked at the end of the save method, so that if something changed during our calculations/queueing we start from scratch. This is how Redis wants it to be done; see http://redis.io/topics/transactions (the WATCH part) and http://stackoverflow.com/questions/10750626/transactions-and-watch-statement-in-redis.

I just hope that WATCH is implemented by phpredis now... :-) EDIT: it is.
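For reference, a minimal WATCH/retry sketch with phpredis (this is not the experimental implementation mentioned above; the key names, the retry cap, and the command sequence are placeholders):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$tagKey   = 'zc:ti:some_tag';   // hypothetical tag set key
$dataKey  = 'zc:k:some_id';     // hypothetical data key
$maxTries = 3;

for ($try = 1; $try <= $maxTries; $try++) {
    // WATCH before reading, so any concurrent change to the tag set aborts the EXEC
    $redis->watch($tagKey);
    $oldMembers = $redis->sMembers($tagKey);

    // ... compute what needs to change based on $oldMembers ...

    $redis->multi();
    $redis->set($dataKey, 'data');
    $redis->sAdd($tagKey, 'some_id');
    if ($redis->exec() !== false) {
        break;  // transaction went through with no interference
    }
    // exec() returned false: a watched key changed, so start from scratch
}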

colinmollenhour commented 11 years ago

The whole purpose of fetching the $oldTags is simply to reduce redundant SADD commands. The only way I can imagine the tag data becoming inconsistent is if the save occurred at the same time as a clean, which means these skipped SADD commands could cause the record to not be tagged. This seems highly unlikely, but I think you could be right. Before implementing WATCH, which has the potential to make things really messy, please just try replacing lines 206-207 with $oldTags = array() and see if that fixes it. If it solves the problem then perhaps WATCH could be implemented to resolve the issue.

Thanks for your continued investigation and contributions!

nemphys commented 11 years ago

I think that is exactly the case I am talking about, and I don't see why it is highly unlikely. On the contrary, it seems quite likely to me. What you propose (ignoring the saved key tags) should remedy the problem, as the key would be added to all of its tags from scratch (I think this is what I proposed at first), but it is less than optimal.

The problem is that the issue is almost impossible to reproduce on demand, and I just stumble on it accidentally. I have already made an experimental WATCH implementation; I will check it on my development server, give it a shot on my production server if all seems well, and finally send a pull request if there are no apparent issues.

Furthermore, since I haven't yet had time to check the cleaning code, do you think something equivalent should be implemented there, or is there no need?

colinmollenhour commented 11 years ago

The clean operation gets the matching ids and then does the rest in a transaction, so I don't see any harm in not using WATCH there. Using WATCH in both save() and clean() would probably create near-deadlock-like race conditions. The concern I have with WATCH is somehow getting into a situation where the commands have to be repeated over and over. So if WATCH is used, I think it should only be tried once; if the commands must be repeated, just skip the use of $oldTags and save over everything without WATCH. There is also the possibility of using a Lua script to do the save and reduce the round-trips, but currently the list of tags can be compressed, so that would cause an issue.

nemphys commented 11 years ago

I don't see why using it in both could cause a near-deadlock, as you say. I suppose the maximum number of times a save or clean call can be retried with WATCH is equal to the number of simultaneous requests for such operations (clean/save). Maybe we could put a (hardcoded or user-settable) cap on the max retries, after which the save can fail. What do you think?

nemphys commented 11 years ago

Hmm... there is also the case where multiple save attempts are issued for different cache keys with the same cache tags, though. This could stress things quite a lot, I guess. The other approach I am thinking of would be to implement WATCHes only for cleans and not for saves. That could be more reliable and I think it would also solve the problem...

colinmollenhour commented 11 years ago

Still not keen on using WATCH. I just pushed a solution (not to master) that always does the SADDs rather than trying to skip redundant ones. Let me know what you think.
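Something along these lines, a minimal sketch of the "always SADD" idea rather than the actual commit (the constants are the backend's key prefixes, but the plain SET and the surrounding variables are simplifications):

// No $oldTags read and no array_diff(): (re)add the id to every tag
// unconditionally, so a concurrent clean can no longer leave the membership missing.
$pipe = $this->_redis->pipeline()->multi();
$pipe->set(self::PREFIX_KEY . $id, $data);   // stands in for the real data write
foreach ($tags as $tag) {
    $pipe->sAdd(self::PREFIX_TAG_IDS . $tag, $id);
}
$pipe->exec();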

nemphys commented 11 years ago

OK, I will try both ways and report back. I don't see why not to use WATCHes, though, since it's the recommended way to do it. We shall see... Thank you for your time and effort.

nemphys commented 11 years ago

One more thing (unrelated to this issue, sorry): the tags parameter should be run through array_unique before any calculations, in order to avoid duplicate Redis queries...
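In code, that would be something like the following one-liner near the top of save() (the exact placement is not specified here):

$tags = array_unique((array) $tags);   // drop duplicate tags before any Redis calls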

nemphys commented 11 years ago

OK, tried the proposed way for 2 weeks now on production and I think it's working ok; maybe you should consider merging it.

colinmollenhour commented 11 years ago

Thanks @nemphys! I just pushed and tagged as 1.3.

tecusan commented 11 years ago

@nemphys, was the problem with Magento and the config tags? I had the same behavior and I also blamed the gc, but now I have found a problem in Mage_Core_Model_Config->saveCache.

colinmollenhour commented 11 years ago

@tecusan, care to elaborate on the problem you found?

tecusan commented 11 years ago

Magento uses a lock mechanism to generate the config, but the lock is not written right after the lock read (it is set only after some time-consuming operations, like XML processing). In our case (a config with about ~100 shops), on production the config was always created by one process and deleted right afterwards by another one (sometimes this went on for hours), because the lock wasn't set. A side effect was also that the tag list didn't contain all config tags after that.

Mage_Core_Model_Config

/**
 * Save configuration cache
 *
 * @param   array $tags cache tags
 * @return  Mage_Core_Model_Config
 */
public function saveCache($tags=array())
{
    if (!Mage::app()->useCache('config')) {
        return $this;
    }
    if (!in_array(self::CACHE_TAG, $tags)) {
        $tags[] = self::CACHE_TAG;
    }
    $cacheLockId = $this->_getCacheLockId();
    if ($this->_loadCache($cacheLockId)) {
        return $this;
    }

    // FIX: the lock should be set right after the read, because generating the config is time consuming
    $this->_saveCache(time(), $cacheLockId, array(), 60);

    if (!empty($this->_cacheSections)) {
        $xml = clone $this->_xml;
        foreach ($this->_cacheSections as $sectionName => $level) {
            $this->_saveSectionCache($this->getCacheId(), $sectionName, $xml, $level, $tags);
            unset($xml->$sectionName);
        }
        $this->_cachePartsForSave[$this->getCacheId()] = $xml->asNiceXml('', false);
    } else {
        return parent::saveCache($tags);
    }

    // $this->_saveCache(time(), $cacheLockId, array(), 60); - moved up 
    $this->removeCache();
    foreach ($this->_cachePartsForSave as $cacheId => $cacheData) {
        $this->_saveCache($cacheData, $cacheId, $tags, $this->getCacheLifetime());
    }
    unset($this->_cachePartsForSave);
    $this->_removeCache($cacheLockId);
    return $this;
}

colinmollenhour commented 11 years ago

I strongly recommend never clearing the config cache in production, especially with 100 stores like you have. Instead, use this cache updating script I wrote: https://gist.github.com/colinmollenhour/2715268

It will generate a fresh cache and save it over the old one so that there is no stampeding on the cache save.

tecusan commented 11 years ago

Yes, this is true! But the cache is not always deleted by Magento; it may be evicted. You may also want to consider changing this code:

// Always expire so the volatile-* eviction policies may be safely used, otherwise
// there is a risk that tag data could be evicted.
$this->_redis->expire(self::PREFIX_KEY.$id, $lifetime ? $lifetime : $this->_lifetimelimit);

Forcing an expiration date on every cache entry is not the best solution.

Mage_Core_Model_Config

protected function _loadSectionCache($sectionName)
{
    $cacheId = $this->getCacheId() . '_' . $sectionName;
    $xmlString = $this->_loadCache($cacheId);

    /**
     * If we can't load section cache (problems with cache storage)
     */
    if (!$xmlString) {
        $this->_useCache = false;
        $this->reinit($this->_options);
        return false;
    } else {
        $xml = simplexml_load_string($xmlString, $this->_elementClass);
        return $xml;
    }
}

If a config key is evicted by Redis, the entire config is recreated and then the "party" starts :).

colinmollenhour commented 11 years ago

You have a point. I think forcing expiration is the safest in terms of not corrupting the cache due to evictions, but if a user understands the risks then it should be allowed, for reasons like this. Still, the max lifetime is 30 days, so this should only be an issue if you go 30 days without updating your cache, which is pretty rare.

Would you like to submit a pull request for adding an <allow_nonvolatile> option? I think it should be disabled by default since the repercussions of having lots of evictions with this enabled are severe. Currently the cache works pretty well even when there are frequent evictions and I don't want to change that by default.
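A rough sketch of what such an option could look like in the save path. The option name comes from this discussion; the property name and the surrounding logic are assumptions, not the actual implementation:

if ($lifetime || !$this->_allowNonvolatile) {
    // Default behaviour: always set a TTL so tag data stays safe under the
    // volatile-* maxmemory policies.
    $this->_redis->expire(self::PREFIX_KEY.$id, $lifetime ? $lifetime : $this->_lifetimelimit);
} else {
    // allow_nonvolatile enabled and no lifetime given: leave the key persistent.
    // Only safe if the user understands the eviction risks described above.
    $this->_redis->persist(self::PREFIX_KEY.$id);
}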

For example, I can imagine that for Magento EE users it would be best to enable allow_nonvolatile for the core cache and disable it for the page cache (although maybe the FPC never uses infinite lifetimes to begin with).

tecusan commented 11 years ago

Exactly, if you understand the risks, you should be allowed to add nonvolatile keys!

"Still, the max lifetime is 30 days so this should only be an issue if you go for 30 days without updating your cache which is pretty rare." I'm not sure if you have right with that.

volatile-lru remove a key among the ones with an expire set, trying to remove keys not recently used.
volatile-ttl remove a key among the ones with an expire set, trying to remove keys with short remaining time to live.
volatile-random remove a random key among the ones with an expire set.
allkeys-lru like volatile-lru, but will remove every kind of key, both normal keys or keys with an expire set.
allkeys-random like volatile-random, but will remove every kind of keys, both normal keys and keys with an expire set.

So the key may be evicted even before 30 days. But if I don't set an expiration date, I can use the volatile-* options and be sure that some keys (configs, etc.) are not evicted.

tecusan commented 11 years ago

"Would you like to submit a pull request for adding an option?" Yes, I'll do it.

I also have another observation: currently the lifetime can be greater than the lifetimeLimit, since the lifetimeLimit is only applied when you don't specify a lifetime. I'm not sure if this is a bug or a feature.

My fix for the config caches will be something like this: an option listing non-volatile keys (config, key_a, ...), and then I'll add the config tag in local.xml. You could also consider adding this feature (but this is more of a workaround).

colinmollenhour commented 11 years ago

@tecusan I have changed the backend so that it respects the parameters passed to it for non-volatile cache lifetimes. If anyone has an app that uses lots of keys with no lifetimes and gets evictions, they could be in trouble, but I think you are right that in some cases the need for a truly non-volatile cache record is worth the trade-off.

nemphys commented 10 years ago

Hi, sorry to bring this up again, but I think the patch that came out of this issue report does not totally solve the problem; it just makes it rarer.

Today I was deleting the cache for a specific product (a tag like catalog_product_xxx), but one particular cache key was not being deleted (I am doing heavy custom caching in my Magento installation, so there is no point trying to explain which cache key, etc.). Since this caused an issue on the frontend and I wanted it fixed quickly, I didn't have time to investigate properly and just cleared another (more generic) cache tag that should include that specific key, and then it was actually deleted.

The point is that this proves there are still cases where saving to and deleting from the cache at the same time can cause key/tag inconsistencies. Since both saving and deleting involve many steps in PHP code and there is no locking during those steps, it should be quite possible for such things to happen (and I mean at a non-negligible rate).

Do you think there is any chance of reconsidering the WATCH route proposed in the past, or maybe some other locking implementation (though I think there is no real locking mechanism in Redis except for WATCH/retry)? I love Redis as a Magento cache backend, but we must try to make it work 100% correctly (especially now that, I believe, it has been included in CE by the Magento team). There are also a few other issues I have stumbled upon during the past few months, but I will open separate issues for them.

colinmollenhour commented 10 years ago

Thanks for the update.

I don't have a very positive outlook on the WATCH/retry method with respect to its reliability and, mainly, its performance. With a high number of keys per tag or tags per key this could go downhill really fast...

The way I see it, the cache in most cases is not a 100% critical system, in that if a record doesn't invalidate it's not causing real data loss or inconsistency, but rather an inconvenience. I'm sure there are edge cases in a vanilla Magento install where one could argue it is 100% critical, but for the most part everyone is pretty content with (or, more likely, not even noticing) occasional errors. If in your situation the cache is more critical and you can't accept any inconsistencies, then I'd strongly advise against using Redis and would instead look at MongoDB as a cache backend. MongoDB doesn't support transactions either, but its data structures lend themselves perfectly to caching with tags: only one collection is used, so a save is a single insert/update and a cache tag clear is a single delete command. To me the only negative with MongoDB is the inability to limit its memory usage. Its advantages otherwise are numerous, and the larger the scale, the more it will outperform Redis.

nemphys commented 10 years ago

OK, you're right, WATCHing would potentially create a retry mess. I insist that the problem should be taken care of, though, as I do like things working 100% :-) I suppose I am not the only one out there doing serious custom caching in Magento (really slow by its nature), so the "come on, it's just cache" approach can cause serious issues for a lot of people. For example, I am caching product pages (as many people do, I believe), and if the correct cache entries are not cleared when e.g. product stock changes, that can definitely be a big problem.

Anyway, to return to the point, I have just implemented _removeByMatchingAnyTags using a Lua script, which performs much, much faster than the normal fetch/del approach and is also atomic by definition. It will also (hopefully) solve another very serious issue I was facing lately: when trying to clear too many keys (e.g. Block_Cache) I would always get an error, either out of memory (even after raising the PHP memory limit to 2GB!) or a timeout from Redis, or something similar. With the new implementation, all work is done server-side and should be very fast no matter what, avoiding potentially thousands of round-trips.
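The general shape of such a script would be something like the following. This is a sketch only, not the actual PR; the 'zc:k:'/'zc:ti:' prefixes and the phpredis-style eval() call are assumptions:

$script = <<<'LUA'
-- KEYS: the tag set keys to clean; ARGV[1]: the data-key prefix
local deleted = 0
for _, tagKey in ipairs(KEYS) do
    for _, id in ipairs(redis.call('SMEMBERS', tagKey)) do
        deleted = deleted + redis.call('DEL', ARGV[1] .. id)
    end
    redis.call('DEL', tagKey)
end
return deleted
LUA;

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$tagKeys = array('zc:ti:catalog_product_123', 'zc:ti:block_html');
// phpredis eval(): KEYS first, then ARGV, with the number of keys as the last argument
$deleted = $redis->eval($script, array_merge($tagKeys, array('zc:k:')), count($tagKeys));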

I will test it in production for a few days and, if all is well, I will make a PR (I left your version as the default; the script is executed only if a new use_lua local.xml setting is set to 1).

PS: the change also required a few minor changes in Credis, since the eval/evalSha implementation there was probably too old compared to the current phpredis code.

colinmollenhour commented 10 years ago

Ah, I forgot about Lua. I remember I actually started implementing some of the commands as Lua scripts myself once, but I must not have gotten very far as I only vaguely remember it. I agree that it should be able to solve the race conditions, but since some work is being loaded onto Redis, which is already fairly easily CPU-bound, I am skeptical about the performance. Only one way to find out! I'm very interested to see your results and implementation, and I think making it an optional feature is definitely the way to go. Cheers!

nemphys commented 10 years ago

Right, Lua is the way to go; so far it rocks. I even managed to clear the Block_Html cache successfully after a very long time! Performance is great: I measured a cache clean from start to end for the same cache entries and (even though the specific test affected very few cache keys) the speed increase was something like 10x(!); I imagine it will be much more effective for larger key sets. I optimized it further so that evalSha (the server-side script cache) is utilized.
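The evalSha optimization follows the usual pattern: try the script by its SHA1 first and fall back to EVAL (which also loads it into the script cache) on a NOSCRIPT error. A hedged phpredis sketch, with a placeholder script and key:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$script  = "return redis.call('SCARD', KEYS[1])";  // placeholder script
$args    = array('zc:ti:some_tag');                // KEYS first, then ARGV
$numKeys = 1;

$result = $redis->evalSha(sha1($script), $args, $numKeys);
if ($result === false && stripos((string) $redis->getLastError(), 'NOSCRIPT') !== false) {
    $redis->clearLastError();
    $result = $redis->eval($script, $args, $numKeys);  // loads and runs the script
}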

I also found an issue in my code that caused wrong cache tags in one particular cache type, so that might also have been the cause of the issue I mentioned earlier :-)

Anyhow, I will soon make the PR along with another few PRs for some other minor features I have implemented.

PS: I will also try to change the GC code to (optionally) use Lua when I find the time.

colinmollenhour commented 10 years ago

Cool. When I said I was skeptical about performance, it was just because during a cache clean all other requests will be blocked completely, so a benchmark using concurrent processes must be considered, not just a single process. E.g. if clearing the block cache takes 1 second and you have 20 req/sec, then 20 processes will be affected by the cache cleaning. Looking forward to the PR. :)

nemphys commented 10 years ago

Yes, I know, but I suppose blocking all other processes for even 5-10 seconds (that's how long it took to clean Block_Html in my case, but it was around 5GB of data :-) ) is better than failing. Seriously, with the default method it would fail every time and the only option was to flush the whole db. Will hopefully send the PR tonight.

nemphys commented 10 years ago

Sorry for the delay, PR #50 is up, please check. Also note that PR #12 in Credis (https://github.com/colinmollenhour/credis/pull/12) must be merged in order for this one to work properly.

colinmollenhour commented 10 years ago

An idea for further improvement (only an idea, not thoroughly thought through): in order to not block the entire cache for long periods of time, the Lua script could clear up to X keys before yielding. The script could return the number of keys that have yet to be cleaned, and the client would keep calling it until this number reached zero. E.g. if there were 12,500 keys and X was 5,000 then the Lua script would be called three times. There is potential for more errors with this method, but it would keep the cache from being blocked for one really long period of time.
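A minimal sketch of that idea, reusing the hypothetical 'zc:k:'/'zc:ti:' prefixes from earlier; the batch size and the use of SRANDMEMBER are illustrative choices, not a worked-out design:

$script = <<<'LUA'
-- KEYS[1]: tag set; ARGV[1]: data-key prefix; ARGV[2]: max ids to clean per call
local ids = redis.call('SRANDMEMBER', KEYS[1], tonumber(ARGV[2]))
for _, id in ipairs(ids) do
    redis.call('DEL', ARGV[1] .. id)
    redis.call('SREM', KEYS[1], id)
end
return redis.call('SCARD', KEYS[1])  -- how many ids are still left to clean
LUA;

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
do {
    // Each call blocks Redis only for one chunk; other clients get served in between.
    $remaining = $redis->eval($script, array('zc:ti:block_html', 'zc:k:', 5000), 1);
} while ($remaining > 0);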

nemphys commented 10 years ago

It could easily be done, but it would eventually end up with the same "vulnerabilities" as the current implementation. I don't see why you're so afraid of locking; Redis should be pretty fast handling the script, even for large key sets. For very large key sets (like in my case), I think the longer lock (a few more seconds) is also acceptable.

colinmollenhour commented 10 years ago

I will give your current implementation a test. Thanks for the PR!

nemphys commented 10 years ago

No problem. By the way, I have had it running on my live site for 3-4 days now and all looks smooth (even with Redis taking up 22GB of RAM, so imagine how much data is being cached). Thanks!

nemphys commented 10 years ago

Hmm, the issue reappeared today. It definitely has nothing to do with the Lua script (which is REALLY fast and effective, by the way); I tried clearing the cache for a specific tag and could see that not all keys were being cleared. This means that somehow the keys went missing from that tag's key list at some point. Since the save method uses a transaction (and should therefore be atomic), I cannot imagine a reason for something to go wrong there, even if a cache clean happens in the meantime (after the patch this issue's conversation produced). All I can think of is the garbage collector; there is a lot of room there for key/tag inconsistency, as a (relatively) long time passes between key list reads/checks/removals. I will temporarily stop running the gc script for a few days and try to convert it to a Lua script in the meantime.

One other thing I noticed that is incomplete in the cache cleaning logic is the (usual) case where a cache key has many cache tags: if one of those tags gets cleaned, all cache keys that have the tag are deleted, but the keys are not removed from the other tag lists they belong to. Of course this should be taken care of by the gc script, but I am starting to reconsider the logic there. Since my Lua script only does what the initial implementation did, it does not take this into consideration either. This whole thing should probably not cause any data inconsistencies (except for some data overhead), but I am racking my brains trying to figure out what is causing the inconsistencies in my case.

Will get back as soon as I have any results.

colinmollenhour commented 10 years ago

GC is definitely a probable culprit. I just pushed a commit I started working on months ago to fix the memory usage issue of the garbage collect method: 06d78dbf

Note, in case you haven't seen it, I recently pushed a new backend: Cm_Cache_Backend_Mongo. In my basic tests it has significantly better read and write performance and almost as fast clean performance. Also, it has absolutely no vulnerability to race conditions or data corruption.

nemphys commented 10 years ago

Hmm... I just finished implementing GC via Lua and now I see your reply :-) Anyway, I did it in a totally different way, much more like my other script (all server-side), in order to preserve atomicity. My only concern is speed; I will test it on my live site late tonight and see how much time it takes.

I will check into MongoDB, but I have spent too much time on Redis lately and I want it to pay off before I dig into something new.

nemphys commented 10 years ago

Concerning MongoDB: since all the benchmarks I have seen comparing Redis with MongoDB say that Redis should be much faster, I would appreciate it if you provided some more specific test results ("significantly better" sounds really good :-) ).

Furthermore, I noticed that although Credis imitates native pipelining by spooling the commands in memory and sending them to the server in one go, it does so only in standalone (native PHP) mode and not when using phpredis. After digging a little, I found out that phpredis has not yet implemented using MULTI and pipelining simultaneously; it just defaults to MULTI and can alternatively be set to use a pipeline.

Therefore, all $this->_redis->pipeline()->multi() calls are just multi() calls in reality.

Is there any specific reason you have not followed the native PHP logic when using phpredis? It should speed things up a little, especially on save calls where the multi batch contains 4-5 separate commands. If there isn't, say the word and I will implement it ASAP.
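For context, phpredis exposes the two modes like this (Redis::MULTI and Redis::PIPELINE are real phpredis constants; the keys are placeholders), and multi() accepts exactly one of them:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Transaction: commands are queued by the server and executed atomically on EXEC.
$redis->multi(Redis::MULTI)
      ->set('zc:k:foo', 'bar')
      ->sAdd('zc:ti:some_tag', 'foo')
      ->exec();

// Pipeline: commands are buffered client-side and sent in one round trip,
// but with no MULTI/EXEC wrapper around them.
$redis->multi(Redis::PIPELINE)
      ->set('zc:k:foo', 'bar')
      ->sAdd('zc:ti:some_tag', 'foo')
      ->exec();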

colinmollenhour commented 10 years ago

Re: MongoDb: Here is the basic test I did. Results are repeatable. http://screencast.com/t/fAYiQIz0aj

On pipeline+multi I am not sure; it was a long time ago that I wrote that part. I would definitely accept a PR if it improves performance. Please include unit tests if applicable.

Thanks again for your valuable contributions!

nemphys commented 10 years ago

Checked the tests, impressive read/write performance, though the clean performance is a bit disappointing.

As for the pipeline/multi thing, I must have been smoking something, since it can't be implemented just like that in PHP if it's not implemented by phpredis (there is a feature request for such functionality, but it's 2 years old so my hopes are not very high) :-).

Investigating this matter, though, led me to discover what seems to be a major flaw when using phpredis mode: since Credis just passes pipeline/multi commands through to phpredis, and both treat them as mutually exclusive (either one or the other is applied, never both), all pipeline()->multi() calls in Cm_Cache_Backend_Redis are actually just pipeline calls and not transactional (MULTI) calls. Credis, when in phpredis mode, calls the phpredis pipeline command on the pipeline() call (which makes phpredis enter pipeline mode) and then just returns itself on the multi() call, since it is already in isMulti() == true mode.

I hope I am missing something here, otherwise all save/clean/etc. calls in Cm_Cache_Backend_Redis are not atomic when using phpredis. That can definitely lead to data inconsistencies and might be the culprit for many of the issues discussed in this thread.

Please check the code yourself and, if you agree, propose a way to solve this (maybe distinguish between standalone and phpredis mode and make just a multi() call in the latter case).

One drawback of this discovery is that we are already using pipelining, and therefore performance is already as good as it gets. Changing to multi() calls will definitely lead to worse performance, unless phpredis implements combining pipeline and multi...

colinmollenhour commented 10 years ago

Since Redis is single-threaded, pipeline should be as atomic as multi, only without support for WATCH. So it doesn't make sense to use pipeline and multi together; it should be just pipeline since WATCH is not used. Good catch, though!

nemphys commented 10 years ago

This is not correct; single-threaded does not mean that it will execute pipelined commands atomically. You can confirm that in every article that talks about Redis pipelining, including the official Redis site.

Also, it does make sense to use pipeline and multi together, since plain MULTI performance is close to that of normal (non-transactional) calls, whereas the combination brings performance close to that of pipelining. If you are still not convinced, check this article: http://www.terminalstate.net/2011/05/redis-pipelines-and-transactions.html

I have already taken a look at the phpredis code, but my C knowledge is close to zero and I don't think I stand a chance of implementing this myself. I have seen nicolasff (the author of phpredis) admitting that it would be nice to implement the option to run pipelined transactions (https://github.com/nicolasff/phpredis/issues/166 and in some other forums), but it has supposedly been "work in progress" for 2 years now...

colinmollenhour commented 10 years ago

Ah, I stand corrected re: pipelining not being atomic. It sounds like the order of TCP packets is the order of execution, rather than the order of full requests.

So, I agree, it would be great for phpredis to support pipeline and multi together. :)

nemphys commented 10 years ago

I suppose we have to use a Lua script then... :-) Just uploaded PR #55, please check!