apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0

Support Redis as caching backend #777

Closed CAFxX closed 8 years ago

CAFxX commented 10 years ago

ngx_pagespeed does not currently allow using Redis as a cache backend. Are there any plans to support it?

jeffkaufman commented 10 years ago

While I haven't tried Redis with PageSpeed, I thought it spoke a superset of the memcached protocol, and you could use memcached libraries with it? If so you can just configure PageSpeed's memcached support against your Redis server and it should work.

If that doesn't work, you might be able to make twemproxy convert between the two formats for you; it claims to be able to talk to both Redis and memcached.

jeffkaufman commented 10 years ago

If you do test this out, could you let us know how it worked?

fcorriga commented 10 years ago

The Redis protocol is not compatible with memcached; it needs a proper implementation. Using a proxy will not change the protocol incompatibility, and it will also hurt latency...

jeffkaufman commented 10 years ago

We use apr_memcached internally to speak memcached for us. Is there something similar for redis? What is the standard C library that people use when writing redis clients?

fcorriga commented 10 years ago

Clients/libs are available here: http://redis.io/clients. I think hiredis is the official one.

jeffkaufman commented 10 years ago

hiredis looks pretty good, and like something that would integrate well with pagespeed.

The next step would be for someone to look at system/apr_mem_cache.cc and make something similar for redis. I think we would want to use their async api; it's a bit trickier but it fits in much better with our structure.
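
If someone does pick this up, here is a rough standalone sketch, purely for orientation and not adapter code, of what hiredis's async API looks like when attached to a libevent loop. The host, port, and key name are placeholders, and a real integration would plug into PageSpeed's own cache interface and thread system rather than running its own event loop:

// Illustrative sketch only: async hiredis with the libevent adapter.
// Assumes hiredis and libevent are installed; key/host/port are made up.
#include <cstdio>
#include <event2/event.h>
#include <hiredis/hiredis.h>
#include <hiredis/async.h>
#include <hiredis/adapters/libevent.h>

// Invoked by the event loop when Redis answers the GET.
static void GetCallback(redisAsyncContext* ctx, void* r, void* /*privdata*/) {
  redisReply* reply = static_cast<redisReply*>(r);
  if (reply != nullptr && reply->type == REDIS_REPLY_STRING) {
    printf("cache hit: %.*s\n", static_cast<int>(reply->len), reply->str);
  } else {
    printf("cache miss\n");
  }
  redisAsyncDisconnect(ctx);  // flush pending commands, then close
}

int main() {
  struct event_base* base = event_base_new();
  redisAsyncContext* ctx = redisAsyncConnect("127.0.0.1", 6379);
  if (ctx == nullptr || ctx->err) {
    fprintf(stderr, "connect failed\n");
    return 1;
  }
  redisLibeventAttach(ctx, base);  // drive hiredis from the libevent loop
  // Commands are queued here and written/read asynchronously by the loop.
  redisAsyncCommand(ctx, nullptr, nullptr, "SET %s %s", "demo_key", "demo_value");
  redisAsyncCommand(ctx, GetCallback, nullptr, "GET %s", "demo_key");
  event_base_dispatch(base);  // runs until the callback disconnects
  event_base_free(base);
  return 0;
}

Something like g++ redis_async_demo.cc -lhiredis -levent should build it, assuming both libraries are installed.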

fcorriga commented 10 years ago

Sorry, actually I don't know anyone who can do that.

jeffkaufman commented 10 years ago

@fcorriga Sorry, I wasn't saying it would have to be you. I was just trying to put enough information out there that if someone does come along who is interested in doing this work they'll have a place to start.

As for someone doing the work here, there are lots of things we need to work on and I'm not sure when we'll get to it.

iamskok commented 9 years ago

Any updates?

jeffkaufman commented 9 years ago

No one has been working on this, and it's not currently on anyone's schedule.

If someone is looking to get into PageSpeed development this would be a great bug to work on!

jeffkaufman commented 9 years ago

Added to Accepted Feature Requests.

mariarti commented 9 years ago

+1

ph4r5h4d commented 9 years ago

+1

josuegio commented 9 years ago

+1

A-David commented 9 years ago

Any updates?

jeffkaufman commented 8 years ago

It looks like this will probably be an intern project for this summer.

xjunior commented 8 years ago

Thanks for the response, @jeffkaufman

isaumya commented 8 years ago

+1 Looking forward to this fix

This is truly a very important thing...

akincansenol commented 8 years ago

I am waiting impatiently for this!

ghost commented 8 years ago

Looking forward to this! Imagine the impact of these two working together.

isaumya commented 8 years ago

Any update yet?

jeffkaufman commented 8 years ago

We have an intern who's going to work on this starting about three months from today.

ph4r5h4d commented 8 years ago

It would be great if the focus were on Redis Cluster. A single-pipe caching mechanism can already be achieved with Memcached, and that's fine until you use two or more servers: if one of them goes out of service, PageSpeed doesn't handle the situation well, and in some cases a page takes a long time to load because of the inability to connect to a certain Memcached server.

I believe that if Redis is going to be supported, Redis Cluster should be kept in mind alongside a normal Redis instance.

nathanjosiah commented 8 years ago

It has been nearly a month since this feature was reported to be worked on. Is there any update or ETA, @jeffkaufman?

yeputons commented 8 years ago

@nathanjosiah I'm the intern working on that. The first commit has already landed in mod_pagespeed, but there is still a lot of work to do - e.g. the Redis adapter is not integrated with the rest of the code yet.

I will update here whenever the master branch of mod_pagespeed has something minimal that you can actually play with.

nathanjosiah commented 8 years ago

Thank you for this update @yeputons! Are you thinking this will land within weeks or months?

yeputons commented 8 years ago

@nathanjosiah I cannot promise anything, but I expect it to land within a month or so.

isaumya commented 8 years ago

Thanks a lot @yeputons for working on this issue. Really huge thanks. Last time I checked, every time I turned on Redis Cache, pagespeed stopped working. I am using @centminmod on my server. I hope after you add the redis feature into pagespeed I can use both of them simultaneously. :)

jmarantz commented 8 years ago

FWIW I thought I'd paste a few searches showing memcached's implementation in mod/ngx_pagespeed so folks understand the amount of work required for this:

https://github.com/pagespeed/mod_pagespeed/search?utf8=%E2%9C%93&q=memcache&type=Code
https://github.com/pagespeed/mod_pagespeed/search?utf8=%E2%9C%93&q=memcached&type=Code
https://github.com/pagespeed/mod_pagespeed/search?utf8=%E2%9C%93&q=AprMemCache&type=Code

I don't know how to express that with one search, and I'm not sure those three searches capture the depth of the integration effort :)

I'm curious, though, if someone can articulate why they prefer redis over memcached for mod_pagespeed's use. I just skimmed through http://www.infoworld.com/article/3063161/application-development/why-redis-beats-memcached-for-caching.html which is obviously pro-redis. However, I think that to exploit redis' advantages, mod_pagespeed would need much more extensive changes to how it interacts with caches, which would extend way beyond an intern project.

Of course, a perfectly good reason might just be "our site caches everything else with redis, so we don't want to add memcached just for mod_pagespeed". But I am not catching what metrics would improve by replacing mod_pagespeed/memcached with mod_pagespeed/redis.

[edit: I think maybe one compelling advantage is checkpointing the redis contents to disk on shutdown, so that machine reboot != cache flush]

jmarantz commented 8 years ago

@isaumya can you give more detail on what failed when you ran mod_pagespeed and redis at the same time? Since as of today mod_pagespeed is not integrated to redis, I'm not sure whether you are describing a functional problem with your net stack and caching at various layers (e.g. Redis caches unoptimized HTML), or whether it's a resource usage issue where combining MPS & Redis on one physical machine loads it down too much.

If it's a caching semantics issue, you might want to look at https://developers.google.com/speed/pagespeed/module/downstream-caching

isaumya commented 8 years ago

@jmarantz Hi. Thanks for your reply. I use ngx_pagespeed, not mod_pagespeed. As I stated above, I use @centminmod (http://centminmod.com/) on my server. Centminmod has a built-in option to enable the Redis cache, but every time I enable it, CSS/JS combination/minification, image optimization, and WebP conversion stop working. I just see the plain HTML without any pagespeed magic.

P.S.: I'm running a wordpress site.

jmarantz commented 8 years ago

I see...I think the Redis integration we are implementing right now might be somewhat orthogonal to your particular issue, which appears related to https://developers.google.com/speed/pagespeed/module/downstream-caching -- please take a look at that :)

I'm curious what other people on this thread are expecting to get out of a Redis integration.

JoyceBabu commented 8 years ago

Of course, a perfectly good reason might just be "our site cache everything else with redis, so we don't want to add memcached just for mod_pagespeed".

I am in that group. Currently I am using both memcached and Redis. Memcached is used only by MPS. If I can replace Memcached with Redis, I would prefer that, as long as there is no performance degradation.

I think maybe one compelling advantage is checkpointing the redis contents to disk on shutdown, so that machine reboot != cache flush

I faced this problem today. My memcached server had been up for the past 30 days, and I was seeing a large number of cache evictions per second (around 10/s). So I decided to raise the cache size to 8GB and searched for a way to do that without clearing the cache. The only option I could find was to use memdump to dump the keys and memcat to dump the values. Since I was worried that the cache might get corrupted during the export/import and end up serving corrupted files, I decided not to go for it. So I went ahead and restarted the server without exporting, and watched the 6 million cached items vanish 😒. If it were Redis, I could have used the SAVE command to save the data to disk.
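
For reference, both of those operations are single commands against Redis. Here is a minimal sketch (plain hiredis from C++, with a made-up local host and port, nothing PageSpeed-specific) of raising the memory limit without flushing and then snapshotting to disk:

// Illustrative only: grow the cache limit in place, then persist it.
#include <cstdio>
#include <hiredis/hiredis.h>

int main() {
  redisContext* c = redisConnect("127.0.0.1", 6379);
  if (c == nullptr || c->err) {
    fprintf(stderr, "connect failed\n");
    return 1;
  }
  // Raise the memory limit at runtime; existing keys are left untouched.
  redisReply* r = static_cast<redisReply*>(
      redisCommand(c, "CONFIG SET maxmemory %s", "8gb"));
  if (r != nullptr) freeReplyObject(r);
  // Ask Redis to write a point-in-time snapshot to disk in the background.
  r = static_cast<redisReply*>(redisCommand(c, "BGSAVE"));
  if (r != nullptr) freeReplyObject(r);
  redisFree(c);
  return 0;
}

The same two commands can just as well be issued from redis-cli; the point is that neither requires emptying the cache.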

justbeez commented 8 years ago

I'm curious, though, if someone can articulate why they prefer redis over memcached for mod_pagespeed's use.

@jmarantz TLDR; when used like Memcached, Redis is as fast (faster if you ask some)—but it also has a slew of other features that are important to many use cases, and some of which have big out-of-the-box benefits to *_pagespeed.

For us, obviously it's nice just because we're already using Redis (who wants to deal with both, especially when you're paying to make them HA), but I think this comes down much more to why we're already using Redis over Memcached. That breaks down into a few areas:

  1. The ability to persist to disk (as already mentioned). Don't need this in your pagespeed use case? Great; turn it off and get a performance benefit.
  2. Better support for horizontal scaling (including ease of setting up HA/cross-region failover, which is an out-of-the-box feature through clustering).
  3. More control over eviction policies (Memcached's LRU may very well discard an expensive object just because it's a similar size or it just feels like it). This added control could also allow more robust purging to be integrated somewhat transparently.

There's a lot more that you can benefit from, like optimistic locking in transactions, data larger than 1MB per key, simpler administration and reporting . . . there's a ton of features that just aren't present in the Memcached universe, or require you to bolt on a third-party solution. When it comes down to it, Redis gives me a ton of options and puts me in control of what's best for my workflows.

(Reminder to anyone reading this, and before anyone jumps on the "specialization" bandwagon: every feature has a cost—and if you turn on all the bells and whistles in Redis, it's going to be slower than Memcached. If you're the sysadmin, be responsible and make sure the benefits outweigh the costs in your environment.)

yeputons commented 8 years ago

Hi everyone, work is in progress. We're currently discussing what features of Redis sharding/replication we want to support and we need your opinion on what you're going to use and why.

I believe it's possible to configure Redis in the following four ways:

  1. Single server, no sharding, no replication.
  2. One master, multiple slaves - no sharding, but replication is available. However, it's not yet clear how exactly we would want to exploit that, because all writes have to go to the master if we want them to be visible, and we do not do any hard-to-process read requests. Another potential problem is the delay between writing a change and being able to read it from a slave.
  3. One master, multiple slaves (same as above), plus Redis Sentinel for failover.
  4. Redis Cluster - solution for both sharding and replication.

We definitely want to support no. 1 as the "basic" configuration, and we have seen requests for no. 4. Is there anyone who is going to use no. 2 or 3? If so, why would you not use Redis Cluster instead?

It also looks to me like Redis Cluster is superior to Redis Sentinel; can anyone confirm or disprove that?
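
For what it's worth, here is a minimal sketch (not adapter code; the host and port are placeholders) of how a client can tell case 1 from case 4 at runtime by asking the server itself. Full cluster support would additionally have to follow MOVED/ASK redirects, which plain hiredis does not do on its own:

// Illustrative only: detect whether the server runs in cluster mode.
#include <cstdio>
#include <cstring>
#include <hiredis/hiredis.h>

int main() {
  redisContext* c = redisConnect("127.0.0.1", 6379);
  if (c == nullptr || c->err) {
    fprintf(stderr, "connect failed\n");
    return 1;
  }
  // CLUSTER INFO returns a text blob containing "cluster_enabled:0" or ":1".
  redisReply* r = static_cast<redisReply*>(redisCommand(c, "CLUSTER INFO"));
  if (r != nullptr && r->type == REDIS_REPLY_STRING &&
      strstr(r->str, "cluster_enabled:1") != nullptr) {
    printf("server is running in cluster mode (case 4)\n");
  } else {
    printf("single instance, or an old server without cluster support\n");
  }
  if (r != nullptr) freeReplyObject(r);
  redisFree(c);
  return 0;
}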

fcorriga commented 8 years ago

Redis Cluster is more reliable than master/slave solutions, plus reads/writes are in sync. IMHO, 2 and 3 should be avoided at all costs, since there are write delays... except for high availability. However, you could always start by supporting 1 and 4 and add the other solutions if anyone needs master/slave setups.

ph4r5h4d commented 8 years ago

For a high-traffic website, a Redis cluster for sessions and cache is normally the best solution, so it would be great if PageSpeed supported clustering too. For a low-traffic site or a single web server, where Redis normally resides on the same server, no. 1 is the choice. I agree with @fcorriga on 1 & 4.

mariarti commented 8 years ago

"+" for 1 & 4

justbeez commented 8 years ago

Thanks again for the work being done here, and to everyone chipping in with feedback!

I agree with focusing support on cases 1 and 4 (since they're by far the most common I've seen), and then addressing any other cases in the future if/when the community is able to articulate the need and help define reasonable implementation details.

Setting up HA Redis is complex, and most people I know are using cloud providers or managed deployments for it—so it may be wise to test the implementation using popular options like Redis Labs. They have a free tier, and the original creator of Redis is in charge of their engineering—so that would be easy for testing case 1 and we might be able to get some engagement from them if needed. (I've also looked into Redis options from ObjectRocket, RedisToGo, ElastiCache, Azure, Heroku, and Compose.io)

Redis Labs' paid plans add some special sauce onto Redis Cluster, but it's all API-compatible. We're currently using Redis Cloud for everything else in our setup—and I'm happy to test using our paid account as well. (We're actually on their Memcached Cloud for mod_pagespeed currently, so we'll be able to easily compare performance once the Redis feature is ready.)

yeputons commented 8 years ago

Hello everyone!

Progress so far: the ExperimentalRedisServer option became available in mod_pagespeed@e53bc94. It passes all our unit and system tests, so it probably does not break everything immediately. Note that it's Experimental for a reason and is not ready for production yet. For example, there are no timeouts on individual cache operations yet, so if the network drops some packets, the webserver will hang, which is no good. Also, it does not pass our load tests yet (maybe it's a Redis misconfiguration, maybe it's a bug).

Stay tuned, I keep working and I'm going to make another update next Friday!

centminmod commented 8 years ago

@yeputons thanks for the update and hard work on this, very much appreciated!

yeputons commented 8 years ago

Update: timeouts on individual cache operations are almost in (they work and are tested; almost all code review comments are addressed), and load tests pass (it looks like default Redis is slower than default Memcached: something like 4000qps vs 3500qps on a benchmark that was found nearby :). I've got a bunch of small follow-ups to do (better multi-threading, error detection, and not spamming logs when something goes wrong).

I hope to finish with all that functionality and basic speed testing by the end of next week. My next priority after that will be Redis Cluster.
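
To give an idea of what "timeouts on individual cache operations" means at the hiredis level, here is a minimal sketch, not the actual implementation; the host, port, and the 200ms/50ms budgets are made-up values:

// Illustrative only: one timeout for connecting, one applied per command.
#include <cstdio>
#include <sys/time.h>
#include <hiredis/hiredis.h>

int main() {
  struct timeval connect_timeout = {0, 200000};  // 200 ms to connect
  redisContext* c =
      redisConnectWithTimeout("127.0.0.1", 6379, connect_timeout);
  if (c == nullptr || c->err) {
    fprintf(stderr, "connect failed or timed out\n");
    return 1;
  }
  struct timeval op_timeout = {0, 50000};  // 50 ms per cache operation
  redisSetTimeout(c, op_timeout);          // applies to every later command
  redisReply* r =
      static_cast<redisReply*>(redisCommand(c, "GET %s", "some_key"));
  if (r == nullptr) {
    // NULL means the operation failed, e.g. it hit the timeout; a cache
    // adapter should treat this as a miss instead of hanging the webserver.
    fprintf(stderr, "GET failed: %s\n", c->errstr);
  } else {
    freeReplyObject(r);
  }
  redisFree(c);
  return 0;
}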

ph4r5h4d commented 8 years ago

@yeputons Any guess as to when we can test pagespeed and redis? You mentioned earlier that after the single-instance version you'll go for Redis Cluster; any update about that would be appreciated as well.

yeputons commented 8 years ago

@ph4r5h4d my internship ends this Friday. Status so far:

I think @jefftk can elaborate more on our plans to release this feature (at least, single-instance version).

ph4r5h4d commented 8 years ago

@yeputons Thanks for all the hard work during this summer. I'll try and test it ASAP.

jeffkaufman commented 8 years ago

The goal is for this to go out in the next major release, which we're planning to cut in ~2 weeks. It should have single instance redis support, and will ideally have cluster support as well.

Lofesa commented 8 years ago

Hi to all. First of all, thanks for your great work. @jeffkaufman and @yeputons, can I suggest the use of unix sockets to connect to redis when in standalone mode? Something like this (PHP):

if ( self::$redis_port == 0 ) {
    $connect = $redis->connect( self::$redis_host );
} else {
    $connect = $redis->connect( self::$redis_host, self::$redis_port );
}

The port is set to 0 and the host is set to the socket path. Thanks in advance.

jeffkaufman commented 8 years ago

can i suggest the use of unix sockets to connect to redis

People interested in unix sockets should follow https://github.com/pagespeed/mod_pagespeed/issues/760

We're not likely to have that ready for the first version.

jeffkaufman commented 8 years ago

(plan for today: stress-testing and benchmarking the single-instance redis support.)

jeffkaufman commented 8 years ago

Initial benchmarking results. Apache configured with:

<VirtualHost localhost:8083>
  ServerName redis-stress.example.com

  DocumentRoot "path/to/mod_pagespeed_example"
  ModPagespeedFileCachePath "path/to/pagespeed-cache-redis"

  ModPagespeedExperimentalRedisServer localhost:6379

  ModPagespeedLoadFromFile "http://redis-stress.example.com/" \
    "path/to/mod_pagespeed_example/"
</VirtualHost>

<VirtualHost localhost:8083>
  ServerName memcached-stress.example.com

  DocumentRoot "path/to/mod_pagespeed_example"
  ModPagespeedFileCachePath "path/to/pagespeed-cache-memcached"

  ModPagespeedMemcachedServers localhost:6378

  ModPagespeedLoadFromFile "http://memcached-stress.example.com/" \
    "path/to/mod_pagespeed_example/"
</VirtualHost>

<VirtualHost localhost:8083>
  ServerName file-stress.example.com

  DocumentRoot "path/to/mod_pagespeed_example"
  ModPagespeedFileCachePath "path/to/pagespeed-cache-file"

  ModPagespeedLoadFromFile "http://file-stress.example.com/" \
    "path/to/mod_pagespeed_example/"
</VirtualHost>

siege.conf with added:

proxy-host = localhost
proxy-port = 8083

All runs interleaved: rotating round robin between redis, memcached, and file cache.

caches started with:

redis-server --port 6379 --maxmemory 64mb --maxmemory-policy allkeys-lru
memcached -m 64 -p 6378 -u memcache -l 127.0.0.1

Benchmarking cache reading with:

initial: manual fetch of resource until optimized
siege http://[redis/memcached/file]-stress.example.com/[resource] --benchmark -t60s -c 50

Testing small reads with rewrite_javascript.js, which is 43 bytes after optimization (plus headers and things, so probably more like 400 bytes). All numbers are trans/sec:

redis memcached file
3336.13 3739.67 3457.31
3481.73 3531.41 3310.50
3447.95 3661.25 3248.83
3517.29 3475.72 3300.32
3397.41 3626.58 3343.16
3492.55 3574.33 3402.98
3369.77 3698.42 3316.22
3516.39 3527.76 3281.90
3378.46 3627.90 3267.39
3484.45 3618.63 3414.02

Averages, and as a percentage of file-cache (default) performance:

redis: 3442.2 (103.2%)
memcached: 3608.2 (108.2%)
file: 3334.3 (100%)

And here's a chart:

[chart: trans/sec for rewrite_javascript.js with redis, memcached, and file cache]

Next I'll repeat this for a large file.