colinmollenhour / Cm_Cache_Backend_Redis

A Zend_Cache backend for Redis with full support for tags (works great with Magento)

High concurrency race condition? #98

Closed crasx closed 8 years ago

crasx commented 8 years ago

Hi, I've been having an insane issue with Redis that I can't seem to figure out. When load testing my site with Redis, Magento sometimes seems to get stuck in a loop when caching the global config. Running the Redis MONITOR command, I can see the following happen:

del CONFIG_GLOBAL_STORES_DEFAULT hget CONFIG_GLOBAL_STORES_DEFAULT hmset CONFIG_GLOBAL_STORES_DEFAULT sadd CONFIG ONFIG_GLOBAL_STORES_DEFAULT hget CONFIG_GLOBAL_STORES_DEFAULT del CONFIG_GLOBAL_STORES_DEFAULT hget CONFIG_GLOBAL_STORES_DEFAULT hmset CONFIG_GLOBAL_STORES_DEFAULT

It seems to get stuck in this loop until something "wins". Then about 20 seconds later it happens again. This only happens when the server is under load, and of course it causes a huge performance hit. Any idea what could be happening?

I have tried

This is happening on PHP 5.5 with OPcache on, one app server behind Varnish, Magento EE 1.14.1.0, and no FPC.

Redis stats:
evicted keys - 0
expired keys - 229
keyspace_hits:3521947
keyspace_misses:280737
used_memory:72253024
used_memory_human:68.91M
used_memory_rss:160272384
used_memory_peak:77898896
used_memory_peak_human:74.29M
used_memory_lua:36864

ip
<port>6379</port>
<persistent>1</persistent>
<password></password>
<force_standalone>0</force_standalone>
<connect_retries>4</connect_retries>
<read_timeout>10</read_timeout>
<automatic_cleaning_factor>0</automatic_cleaning_factor>
<compress_data>1</compress_data>
<compress_tags>1</compress_tags>
<compress_threshold>20480</compress_threshold>
<compression_lib>gzip</compression_lib>
<use_lua>0</use_lua>

Any help would be greatly appreciated. I've been toying with the idea of adding a slow backend, but I haven't done that before and I'm grasping at air here. This was working a year ago, but since then we have added Varnish and upgraded Magento. I have been using a local file cache for now, but I can have up to 50 app servers, which isn't fun to manage with no centralized cache. If you need any more info, please let me know.

thanks!

colinmollenhour commented 8 years ago

The issue is not with the backend but with Magento's very poor locking mechanism. I'd have submitted a PR for this long ago if there were a place to submit it to, but here is the modified Mage_Core_Model_Config that I'm using, which fixes the stampeding issues very nicely. I can almost guarantee this will fix your issue:

https://gist.github.com/colinmollenhour/f7afefdf5b227f8cc677

Note, IMO on production you should never flush the config cache, only refresh it. I think there is another script in my gists that does this.

Also make sure that you don't have any third party extensions that are clearing the config cache unnecessarily. I have seen this all too many times.
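For readers unfamiliar with the term, the stampede mentioned above is what happens when many requests miss the same cache entry at once and all rebuild it in parallel. The following Python sketch illustrates the general guard against that; it is not the code from the gist and not Magento's implementation. The class `StampedeSafeCache` and the loader `slow_config_loader` are hypothetical names invented for this illustration, and the lock here only works within one process.

```python
import threading
import time


class StampedeSafeCache:
    """Toy in-process cache where only one caller rebuilds a missing entry."""

    def __init__(self, loader):
        self._loader = loader          # function that rebuilds a value (expensive)
        self._data = {}                # stands in for the real cache backend
        self._lock = threading.Lock()  # guards regeneration
        self.load_count = 0            # how many times the loader actually ran

    def get(self, key):
        # Fast path: the value is cached, no locking needed.
        value = self._data.get(key)
        if value is not None:
            return value
        # Slow path: serialize regeneration behind the lock.
        with self._lock:
            # Re-check after acquiring the lock: another caller may have
            # rebuilt the entry while this one was waiting.
            value = self._data.get(key)
            if value is None:
                self.load_count += 1
                value = self._loader(key)
                self._data[key] = value
        return value

    def invalidate(self, key):
        # Drop the entry so the next get() rebuilds it.
        self._data.pop(key, None)


def slow_config_loader(key):
    """Hypothetical stand-in for an expensive config rebuild."""
    time.sleep(0.05)
    return {"key": key, "loaded": True}


def run_workers(cache, key, worker_count):
    """Hit the same key from many threads at once and collect the results."""
    results = []

    def worker():
        results.append(cache.get(key))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With this guard, twenty concurrent callers of `run_workers(cache, "CONFIG_GLOBAL_STORES_DEFAULT", 20)` trigger a single rebuild instead of twenty. Across several PHP processes or app servers an in-process lock is not enough; the lock would have to live somewhere shared, for example in Redis itself via `SET` with the `NX` and `EX` options, which is an assumption about one possible design rather than a description of the gist.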

crasx commented 8 years ago

Awesome, thanks! That seemed to do it. Have you sent this to Magento at all? I have an EE ticket open with them about this and will add a link to that.

How would I find extensions that clear the cache? I am not triggering it manually so something else must be doing it

colinmollenhour commented 8 years ago

Yes, I think I mentioned it on an M2 issue. I can't remember how it was left, but nothing came of it. You'll probably have more luck being an EE customer.

Use your IDE, grep, or similar to search for things like 'cleanCache', 'cleanType', etc.
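If an IDE or grep is not at hand, the same search can be sketched in a few lines of Python. This is only an illustration: the function name `find_cache_clearing_calls`, the default file extensions, and the idea of pointing it at an extensions directory are assumptions, and a plain substring match will also report harmless mentions such as comments.

```python
import os


def find_cache_clearing_calls(root,
                              needles=("cleanCache", "cleanType"),
                              extensions=(".php", ".phtml")):
    """Return (path, line number, line text) for each line containing a needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only look at source files; skip images, logs, and so on.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" avoids crashing on files with odd encodings.
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if any(needle in line for needle in needles):
                        matches.append((path, lineno, line.strip()))
    return matches
```

Calling `find_cache_clearing_calls("app/code")` from the Magento root and printing each tuple gives a list of call sites to review by hand; each hit still has to be checked to see whether the cache clear is actually necessary.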

crasx commented 8 years ago

Hi Colin,

The fix has worked great for the core cache, but it seems like it doesn't work for all the other caches. I have a client that uses the admin UI to clear the cache during peak times, and after they do, site performance tanks. From New Relic it looks like some pages use the core cache but then fail to use it for random layouts or entities. I was able to resolve this during a low-traffic period by refreshing the cache multiple times. I assume I just got lucky a couple of times, as I am unable to do it today (Black Friday). Is there any way to fix this behavior? I have tried using your cache clear script and setting use_lua to 1.

Thank you

colinmollenhour commented 8 years ago

Maybe mod the button on the admin UI to use the clearCache.php method? :) I don't have any special tricks for the block and layout cache, but I recommend finding out why they are using the backend to do cache flushes and fix the blocks' tags so that invalidation happens automatically and only when necessary.

mpchadwick commented 6 years ago

@crasx this is an old thread but I'm wondering if EE support supplied a patch for this and if so if you know the SUPEE number for it?

Mramir-bounteous commented 6 years ago

@mpchadwick Sadly I no longer work for the client this was for. I'll pass this on to an old colleague to see if he has any more details about it

josephdpurcell commented 6 years ago

Unfortunately, I don't know the status of this issue in Magento. If anyone can find a referenced issue, please report back!

hmphu commented 6 years ago

Me too, I don't know the status of this issue in Magento version 2.x :(

colinmollenhour commented 6 years ago

The solution is given in my first comment.