buddycloud / buddycloud-media-server

Share files in a buddycloud channel
http://buddycloud.com

Caching media! #24

Closed rodrigods closed 10 years ago

rodrigods commented 11 years ago

Media should be cached to avoid unnecessary requests and to speed up webclient loading. Any opinions on which cache headers the media server should return?

good reference: http://www.mnot.net/cache_docs/

imaginator commented 11 years ago

Looking at what Flickr sends with photos:

GET /7062/6952224647_cdc8eb52aac.jpg HTTP/1.1
Host: farm8.staticflickr.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: https://secure.flickr.com/photos/fahrertuer/6952224647/in/pool-buddycloud/

replies

HTTP/1.1 200 OK
Date: Tue, 11 Dec 2012 14:05:05 GMT
Content-Type: image/jpeg
Content-Length: 208626
Connection: keep-alive
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Cache-Control: max-age=315360000,public
Expires: Mon, 12 Dec 2022 00:12:45 UTC
Last-Modified: Sun, 04 Mar 2012 14:39:40 GMT
Accept-Ranges: bytes
X-Cache: MISS from photocache805.flickr.bf1.yahoo.com, MISS from cache420.flickr.ch1.yahoo.com
X-Cache-Lookup: MISS from photocache805.flickr.bf1.yahoo.com:84, MISS from cache420.flickr.ch1.yahoo.com:81
Via: 1.1 photocache805.flickr.bf1.yahoo.com:84 (squid/2.7.STABLE9), 1.1 cache420.flickr.ch1.yahoo.com:81 (squid/2.7.STABLE9)

Can I recommend a setting in the media server config file for a cache timeout in seconds, defaulting to 86400 (1 day)?
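As a rough illustration of that suggestion, here is a minimal Python sketch that derives `Cache-Control` and `Expires` from a configurable timeout. The config key `media.cache.timeout` and the function name are made up for this example; they are not settings or code from the media server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

DEFAULT_CACHE_TIMEOUT = 86400  # seconds (1 day), the default suggested above


def cache_headers(config, now=None):
    """Build Cache-Control and Expires from a timeout given in seconds."""
    # "media.cache.timeout" is a hypothetical key, used only for illustration.
    max_age = int(config.get("media.cache.timeout", DEFAULT_CACHE_TIMEOUT))
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"max-age={max_age}",
        # usegmt=True emits the "GMT" suffix that HTTP dates use.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Calling `cache_headers({})` falls back to the one-day default; passing `{"media.cache.timeout": "3600"}` would shorten it to an hour.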

S.

Simon Tennant | buddycloud.com | +49 17 8545 0880 | office hours: goo.gl/tQgxP

imaginator commented 11 years ago

One of the tips on that page was:

Generate Content-Length response headers. It’s easy to do, and it will allow the response of your script to be used in a persistent connection. This allows clients to request multiple representations on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.

So if that is possible, let's also pass Content-Length to clients.


orrc commented 11 years ago

I noticed this too when trying the demo. Even just sending a Last-Modified header and/or ETag header for /media/avatar would be really helpful.

I'd say "citation needed" on the Content-Length thing, as the media server is using HTTP keep-alive, and I would have thought that actually getting the data onto the wire is faster than calculating its size and then pushing it onto the wire.

The media server is already sending Transfer-Encoding: chunked, so I don't see how adding the length would help...

orrc commented 11 years ago

Also, am I right in saying there are two classes of media: static and dynamic(ish)? e.g. Photos uploaded vs. avatars?

As you can see from Flickr, whose photos never (ever?) change, they set an Expires header of 10 years. That wouldn't be preferable for avatars which, presumably, people can change.

So it's maybe worthwhile having two different classes of expiration.
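A small Python sketch of what two expiration classes could look like. The lifetimes and the path check are assumptions chosen for illustration (one day for avatars, the ten-year Flickr value for everything else); nothing here is taken from the media server's code.

```python
# Illustrative lifetimes only; neither value comes from the media server.
AVATAR_MAX_AGE = 86400       # 1 day: an avatar can change behind a stable URL
STATIC_MAX_AGE = 315360000   # 10 years, the max-age in the Flickr response


def max_age_for(path):
    """Pick a cache lifetime based on whether the URL names an avatar."""
    # Assumes avatar URLs end in /media/avatar, like the path mentioned earlier.
    if path.rstrip("/").endswith("/media/avatar"):
        return AVATAR_MAX_AGE
    return STATIC_MAX_AGE
```

The returned value would feed the `max-age` directive of `Cache-Control`, and the matching `Expires` date is simply the current time plus that many seconds.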

imaginator commented 11 years ago

I think we even store the content length in the database, since we need it for a storage quota later on, so reading it off with the other metadata is no biggie. Nevertheless, this is all optimising: as long as we can set some of the familiar headers that influence caching, we're golden.

Perhaps the "long-term vs short-term" media thing could be solved with ETags?


orrc commented 11 years ago

Probably you can't store content length since avatars are usually requested with dynamic sizes?

Anyway, I believe ETags should always be set. Probably you just need to set a shortish Expires header for avatars (e.g. a day to a week), as they can change, but not often. I don't think you can do much more about this, as the URLs need to remain stable. Other static media can expire in years.

But should all this be a feature of the media server, or is it intended that a proxying server sits in front which could easily handle all this?

rodrigods commented 11 years ago

I agree with the short-term vs long-term approach. I was trying to run some tests here, but it seems that the HTTP API is overriding the headers sent by the Media Server.

Trying to fix this right now.

imaginator commented 11 years ago

I'd forgotten about dynamic sizes... you are right.

Checking to see what other services send... Facebook doesn't use ETags, G+ and Twitter do.

Facebook:

Request URL:https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn1/27396_1011483959_3799_q.jpg
Request Method:GET
Status Code:304 Not Modified

Request Headers
Accept:*/*
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Host:fbcdn-profile-a.akamaihd.net
If-Modified-Since:Fri, 01 Jan 2010 00:00:00 GMT
Referer:https://www.facebook.com/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11

Response Headers
Access-Control-Allow-Origin:*
Cache-Control:max-age=1209600
Connection:keep-alive
Content-Type:image/jpeg
Date:Thu, 13 Dec 2012 19:18:37 GMT
Expires:Thu, 27 Dec 2012 19:18:37 GMT
Last-Modified:Fri, 01 Jan 2010 00:00:00 GMT

G+:

Request URL:https://lh3.googleusercontent.com/-uyn3bznWdAM/AAAAAAAAAAI/AAAAAAAAAAA/g5GamsPx7Qw/s32-c-k/photo.jpg
Request Method:GET
Status Code:200 OK

Request Headers
:host:lh3.googleusercontent.com
:method:GET
:path:/-uyn3bznWdAM/AAAAAAAAAAI/AAAAAAAAAAA/g5GamsPx7Qw/s32-c-k/photo.jpg
:scheme:https
:version:HTTP/1.1
accept:*/*
accept-charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
accept-encoding:gzip,deflate,sdch
accept-language:en-US,en;q=0.8
referer:https://plus.google.com/
user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11

Response Headers
access-control-allow-origin:*
cache-control:public, max-age=86400, no-transform
content-disposition:inline;filename=""
content-length:850
content-type:image/jpeg
date:Thu, 13 Dec 2012 19:21:17 GMT
etag:"v7e"
expires:Fri, 14 Dec 2012 19:21:17 GMT
server:fife
status:200 OK
version:HTTP/1.1
x-content-type-options:nosniff
x-xss-protection:1; mode=block

Twitter:

Request URL:https://si0.twimg.com/profile_images/69997926/n631641110_3342_normal.jpg
Request Method:GET
Status Code:304 Not Modified

Request Headers
Accept:*/*
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Host:si0.twimg.com
If-Modified-Since:Wed, 07 Jan 2009 11:15:47 GMT
If-None-Match:"6d346144e3f8cd7ca9bab728baddfee8"
Referer:https://twitter.com/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11

Response Headers
Accept-Ranges:bytes
Cache-Control:max-age=31536000
Date:Thu, 13 Dec 2012 19:24:07 GMT
ETag:"6d346144e3f8cd7ca9bab728baddfee8"
Expires:Fri, 13 Dec 2013 19:24:07 GMT
Last-Modified:Wed, 07 Jan 2009 11:15:47 GMT
Server:ECS (fra/D5B0)
x-amz-id-2:1NSwRZy29+ojjlII+HMprRzOE3OUfIYNJtK0KMSDpZjHIRj/FmKJvCTQUnmtlN3y
x-amz-request-id:7D149D89ED6A5575
X-Cache:HIT

orrc commented 11 years ago

The profile URLs of the other services aren't fixed -- they contain a bunch of unique identifiers, allowing them to have much longer caching times, e.g. one year for Twitter, two weeks for Facebook (though only one day for G+).

Once you change your profile image on those services, it gets assigned a new URL with some new random unique identifier. So, next time you hit Facebook it writes that new URL into the HTML, and people see the new profile image ASAP. The old URL and its image never get used again; the file probably sits on disk for a while before eventually being deleted.

However, with Buddycloud and its loosely coupled components, the media URLs have to remain stable — the web client (or other consumers) doesn't know about any special caching or naming schemes of the media server. In this case, I believe the best you can do is tell browsers to cache /avatar for a day or so, and other un-changing media for much much longer.

In any case, using ETags would be a good start. While that doesn't stop browsers from querying the media server on every page load, it allows the server to return "304 Not Modified", so the browser doesn't have to waste time and bandwidth downloading anything.

Adding the Expires header is the next step, which prevents the browser from even bothering to ask the media server whether anything has changed, so long as the expires date hasn't passed.
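To make the ETag step concrete, here is a minimal Python sketch of the server-side check. It assumes the ETag is a hash of the response body and that request headers arrive as a plain dict with the exact key `If-None-Match`; both are simplifications for illustration, and weak validators (`W/"..."`) are not handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (one possible scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(request_headers, body):
    """Return (status, headers, payload); 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    # If-None-Match may carry several comma-separated tags, or "*".
    sent = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in sent.split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""   # no body: the browser reuses its cached copy
    return 200, headers, body
```

A first request with no `If-None-Match` gets the full body plus an `ETag`; a repeat request that echoes that tag back gets an empty 304.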

rodrigods commented 10 years ago

Media server is now using the last-modified / if-modified-since approach, example:

$ curl -I "https://demo.buddycloud.org/api/c456@buddycloud.org/media/avatar" --insecure
HTTP/1.1 200 OK
Date: Wed, 22 Jan 2014 12:20:51 GMT
Server: buddycloud media server
X-Powered-By: Express
Access-Control-Allow-Origin: undefined
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PUT, DELETE
Access-Control-Allow-Headers: Authorization, Content-Type, X-Requested-With, X-Session-Id
Access-Control-Expose-Headers: Location, X-Session-Id
content-length: 28173
content-type: image/jpeg
last-modified: Thu, 09 Jan 2014 19:57:04 GMT
accept-ranges: bytes
vary: Accept-Charset,Accept-Encoding,Accept-Language,Accept
Cache-Control: max-age=3600
Expires: Wed, 22 Jan 2014 13:20:51 GMT

$ curl -I --header 'If-Modified-Since: Thu, 09 Jan 2014 19:57:04 GMT'  "https://demo.buddycloud.org/api/c456@buddycloud.org/media/avatar" --insecure
HTTP/1.1 304 Not Modified
Date: Wed, 22 Jan 2014 12:22:12 GMT
Server: buddycloud media server
Expires: Wed, 22 Jan 2014 13:22:12 GMT
Cache-Control: max-age=3600

This enables clients to invalidate old images as soon as they are updated. Another issue raised here concerns the Content-Length header, which has already been fixed: https://github.com/buddycloud/buddycloud-media-server/commit/c8ec428f36dd42ca685df18e3c13d9ce1f21e6bb
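For readers who want to see the comparison behind the 304 in the second curl call, here is a hedged Python sketch of an `If-Modified-Since` check. It is not the media server's implementation; the function name is invented, and `last_modified` is assumed to be a timezone-aware UTC `datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified(if_modified_since, last_modified):
    """True when the resource is unchanged since the date the client sent."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # an unparseable date is ignored and the full response is sent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    return last_modified.replace(microsecond=0) <= since
```

With the avatar's `last-modified` value from the first response, sending the same date back yields `True` (answer 304), while any earlier date yields `False` (answer 200 with the image).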