Vary cache per user role

ddeboer commented 10 years ago

The CacheAuthorizationListener listener returns immediately after Symfony's security check for HEAD requests. Currently, it returns an empty response. We can make the listener more useful, though, by returning a hash based on the user's roles. That hash can then be set as a custom header. If the caching proxy is then configured to vary on that header, different responses for different user roles could be cached. Additionally, you'd still be sure each user only sees content that he has access to (because of Symfony's security check).

The user hash should be definable by the user. Perhaps from configuration (include_role or something?) but preferably by throwing an event (build_user_hash or something) that the user can add a listener for. This bundle then has its own listener with some sane defaults.

This idea is inspired by the very nice context aware caching in ez Publish. But whereas their solution requires a Varnish module to be installed (CURL), I think our solution will work with plain VCL restart.

dbu commented 10 years ago

that could be a useful thing indeed.

if the security is using custom voters that are not based on the user but something else (e.g. the symfony cmf loading some content and then deciding individually if you have the right to see this), then i think its not possible. just saying - means we need to keep that optional and still allow the plain thing.

dbu commented 10 years ago

in terms of priorization, i would push that post 1.0 unless you have a concrete need for it.

ddeboer commented 10 years ago

Well, it wouldn't hurt to always return a user hash in the HEAD response, even if Varnish isn't configured to vary on it.

Authorization with custom voters wouldn't be supported, no. Is there a Symfony2 security event that fires after all voters have been processed that we can listen to? Though it would make the listener much less efficient, of course.

This feature can wait until after 1.0.

dbu commented 10 years ago

with ez unbundling their user hash tool (see #26) i think the best option would be to simply drop this thing here. it was a quick shot in the LiipCacheControlBundle but the solution by ez sounds a lot better.

we can throw the listener into a gist somewhere or tell people to copy it from the LiipCacheControlBundle if they need it and ez is not ready.

/cc @lolautruche

ddeboer commented 10 years ago

Well, I think it makes sense to combine insights with the ezPublish guys and and include the functionality in this bundle (if they agree). After all, we're offering a generic caching solution for Symfony2 users here, and properly caching authenticated content is an important issue.

lolautruche commented 10 years ago

I personally have no objection that we merge our solution in this bundle. However, it means license change since we're in GPL-2.0 and this bundle is MIT, so we need to evaluate the impact. At worst, we could propose a separate bundle that would be complementary to this one (or a dependency).

lolautruche commented 10 years ago

ping @andrerom @bdunogier

bdunogier commented 10 years ago

Maybe the separate,depending bundle would make it easier. As long as we move forward together, it shouldn't be a problem. But no strong preference here, to be honest.

dbu commented 10 years ago

how much code is it in the end? a listener for the curl requests from varnish, and another one to append information to the responses? or is it more involved? if its a small thing, it would make more sense to keep in here, but if its complex and big, its making things harder to understand. can you point us to the current ez code for this?

lolautruche commented 10 years ago

@dbu You can figure this out here : https://github.com/ezsystems/ezpublish-kernel/pull/368 (initial PR for the feature). Complementary PR: https://github.com/ezsystems/ezpublish-kernel/pull/720

ddeboer commented 10 years ago

Let’s remember this bundle is not only about invalidation, so user-specific cache fits fine.

I agree with @dbu that if the user hashing requires much code, it may be better to do it somewhere else. As stated above, it would be good to experiment a bit with a solution that doesn’t require the the Varnish cURL module. If we can leave that out, the solution becomes simpler.

lolautruche commented 10 years ago

I'm open to any suggestion :-) Le 25 févr. 2014 12:54, "David de Boer" notifications@github.com a écrit :

As stated abovehttps://github.com/ddeboer/FOSHttpCacheBundle/issues/30#issue-26320943, it would be good to experiment a bit with a solution that doesn't require the the Varnish cURL module.

Reply to this email directly or view it on GitHubhttps://github.com/ddeboer/FOSHttpCacheBundle/issues/30#issuecomment-36000195 .

ddeboer commented 10 years ago

I’ll come back to you about this. Need to do some experimenting with Varnish first. My hope is that plain restart will be enough. ;)

dbu commented 10 years ago

thanks for the links. i think the listener part is not too complicated and might still fit (maybe use a response listener instead of setting the vary header in the controller...) but the whole role + identity stuff becomes quite involved. i think if we do any part in this bundle, it should be a more generic and simple thing that ez can then plug into with its user management. user identity goes deep into a project, its not something you happily change just for the sake of one component. (not saying ez is doing this wrong, just that we should have clear boundaries).

maybe we can even do something more generic than a user hash - it could depend on other factors than the user, in some system. like a custom vary header and a custom check to get the right variant hash for a request in varnish.

lolautruche commented 10 years ago

@dbu As for the user hash generation, the solution is generic. It's just that we have our own implementation relying on our roles. What can be possible is to provide a very generic UserIdentity, based on native named roles for instance. eZ Publish can then add its own rules.

dbu commented 10 years ago

i guess that is what i meant, we should only have a generic interface and leave the implementation to ez and other projects.

ddeboer commented 10 years ago

I’ll come back to you about this. Need to do some experimenting with Varnish first. My hope is that plain restart will be enough. ;)

All right, I've been experimenting with user hashing with plain Varnish restart, so without requiring the user to install the Varnish cURL mod. It seems to work. As I’ve used the FOSHttpCache functionality for testing caching headers, you can find my work in a branch on that repo.

ddeboer commented 10 years ago

The user hash should be definable by the user. Perhaps from configuration (~~include_role or something?~~) but preferably by throwing an event (build_user_hash or something) that the user can add a listener for. This bundle then has its own listener with some sane defaults.

Would that make sense? We could rename the event to generate_hash to be more general. Or have a configuration option hash_generator_service that the bundle user can override.

dbu commented 10 years ago

interesting. but the HEAD request will be needed for every request with cookie, right? could we also cache the hash based on the cookie by the backend? (vary on cookie, of course) a service we configure would be more in line with what we do elsewhere in this bundle. on the other hand, an event sounds interesting too, as several listeners could try to generate a hash then. what about combined hashes? that is, i have two things that differ? when would the service/event be triggered? on kernel response i guess?

lolautruche commented 10 years ago

We could rename the event to generate_hash to be more general. Or have a configuration option hash_generator_service that the bundle user can override.

We chose the hash generator approach, with a service tag, which seems more relevant than an event approach here. The hash generator is called by a request listener, on a very specific request (AUTHENTICATE, with X-User-Hash as header).

@ddeboer I looked at your code on Varnish restart, but there is something I don't understand. You seem to send pure HEAD requests, but these need to be triggered by an incoming GET request, goal being to enrich it with a X-User-Hash header so that one can make the response vary on it. Furthermore, why do you remove the cookie ? We definitely need it to figure out the user identity... Or am I missing something ?

ddeboer commented 10 years ago

could we also cache the hash based on the cookie by the backend? (vary on cookie, of course)

Yes, we can make Varnish cache those.

a service we configure would be more in line with what we do elsewhere in this bundle. We chose the hash generator approach, with a service tag

Maybe the service is indeed the easiest solution.

Furthermore, why do you remove the cookie ?

To have Varnish cache the response. By default, Varnish doesn’t cache responses with a cookie. We can also change the VCL to have Varnish cache whether there’s a cookie or not.

You seem to send pure HEAD requests, but these need to be triggered by an incoming GET request,

They are trigged. It's like this:

Client GET (with cookie) Varnish
 \/ 
Varnish HEAD to backend 
 \/ 
backend responds with hash calculated based on the cookie 
 \/ 
Varnish puts hash on the original GET request, and restarts that request 
 \/ 
if in cache, return to client; 
if not, fetch from backend first.

lolautruche commented 10 years ago

@ddeboer OK, perfect. Even better than our solution if the user hash can be cached by Varnish. However, be aware that we had a bunch of issues with caching with the whole cookie string as key, since several services (like analytics) can alter the cookie a lot, making the cache useless. Therefore we improved by only using the session ID as key. Would it be possible with Varnish as well ?

dbu commented 10 years ago

when using varnish, you want to remove cookies like that of google analytics anyways. what we usually do is something like this:

# Cleaning up cookies https://www.varnish-cache.org/trac/wiki/VCLExampleRemovingSomeCookies#RemovingallBUTsomecookies
if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
        remove req.http.Cookie;
    }
}

This is straight from varnish doc, looks rather involved though.

dbu commented 10 years ago

Somebody having a custom logic that differentiates on other cookies will have to adjust this.

And of course the backend must send the x-hash with the correct livetime for the cookie(s) in question, to not keep a user authenticated after his login cookie is no longer valid. (We will then have a new HEAD request to the backend for the hash, and if the cookie/session is no longer valid, will get a login failure i would assume)

joelwurtz commented 10 years ago

Hi,

We have to handle this use case in our current project, and the solution proposed seems perfect i just want to make a summary to be sure that the current workflow it's ok with everyone before proposing a PR.

1. Varnish : Request in GET (with cookie)
1. Varnish : GET -> HEAD (with cookie)

To allow caching of this part we need to add the value of session_id in the hash

1. Symfony 2 : Receives the HEAD request

A HashGenerator service is used to generate a hash base on whatever Each service tag with hash_identifier must implement an IdentityCacheInterface which allow to add a key/value to the cache, (for example we can tag the acl service to add a role: ROLE_USER to the identifiy)

This hash is returned in the content of the response, if no service is tag, response will be empty

(Like ezPublish is doing actually)

1. If varnish receives something else as a response between 200 to 300 error codes it will send the response directly to the client
1. Otherwise, varnish retry the original request (in GET), and with the hash added in vcl_hash function if not empty
1. On hit directly return the response
1. On miss send the request to the Symfony2 backend and hash in X-FOSHttpCache-Hash header if not empty

Furthermore the AuthorizationListener will be able to be configured to send the response to HEAD Request with a TTL to allow caching for the 2. part.

Is that ok ?

ddeboer commented 10 years ago

@joelwurtz More or less, yes. We can still discuss details, whether it’s better to use vcl_hash or the Vary header.

One thing I’m not yet happy about is what you described as steps 2–4. If I’m not mistaken, with our current plan, every GET by the client requires a roundtrip to the (Symfony2) backend to calculate the user hash based on the cookie. Can we solve this by having the backend set a cookie containing the user hash (e.g., user+admin) that Varnish can check for subsequent requests instead of doing a HEAD to the backend? Faking that cookie, however, would be very easy, of course.

dbu commented 10 years ago

with the Vary header, the backend needs to know the name of the bit that varnish is using. ideally only one part (either varnish or symfony) knows of names of such things, so that as little as possible can go wrong because of mismatch.

for the request, can't the header listener simply set cache headers and then varnish will cache that and see that its the same thing when using the same session?

ddeboer commented 10 years ago

can't the header listener simply set cache headers

Of course, have Varnish cache the user hash based on cookie! I guess that is what @joelwurtz meant with setting a TTL.

dbu commented 10 years ago

at work, we pondered using a memcache client plugin to store a session to authorization group mapping, but that feels somehow weird. i mention this more as a reminder that what we do should not prevent other paths if possible. (if not possible, one can just deactivate the listener and do ones own stuff when needed)

joelwurtz commented 10 years ago

@ddeboer Yes that's what i mean with TTL :)

But this will only work for the same url, pages which are not been seen cannot have a HEAD cached (as it's impossible to know for varnish if the user as the rights, unless it's having a security flaw or logic inside VCL)

I'm not a fan of a Vary on Cookie (As we need to remove the others ones etc ...) Would it be better to extract the session_id in a X-FOSHttpCache-SessionId header and make the vary on it ?

ddeboer commented 10 years ago

I agree that adding yet another component to the mix (Memcache/Redis) wouldn't make much sense by default. We could make the setting of caching headers in our listener optional, but I would opt for a very simple listener that does good things by default and can be enabled/disabled depending on the user's needs.

Would it be better to extract the session_id in a X-FOSHttpCache-SessionId header and make the vary on it ?

That's exactly how I intended that we use the Vary header.

But this will only work for the same url, pages which are not been seen cannot have a HEAD cached

Good point. We can also have a controller with route instead of a listener in this bundle. What if Varnish did a HEAD against that controller (so always the same URL)? I guess this works as the user hash is the same for all GET URLs (/articles, /articles/123). Or it may be easier to just keep the listener and do the HEAD request always to the same URL (/).

joelwurtz commented 10 years ago

Good point. We can also have a controller with route instead of a listener in this bundle. What if Varnish did a HEAD against that controller (so always the same URL)? I guess this works as the user hash is the same for all GET URLs (/articles, /articles/123). Or it may be easier to just keep the listener and do the HEAD request always to the same URL (/).

Yes but we only check Authentication in that case and not Authorization, as maybe the user have to right to see /articles/123 but not for /articles/456

IMO The only way to do this is to be able to set different url depending on the hash generated and make a mapping on Varnish but i think this is an overkill

joelwurtz commented 10 years ago

Ooops forget what i say, it's only the hash which is cached and will be different so the next request should handle authorization

(I mess up with naming thing and cache invalidation, a little bit too complex :s)

bdunogier commented 10 years ago

(I mess up with naming thing and cache invalidation, a little bit too complex :s)

Our conversations about it were full of misunderstandings at eZ too. Complex domain :) Le 29 avr. 2014 18:37, "Joel Wurtz" notifications@github.com a écrit :

Ooops forget what i say, it's only the hash which is cached and will be different so the next request should handle authorization

(I mess up with naming thing and cache invalidation, a little bit too complex :s)

— Reply to this email directly or view it on GitHubhttps://github.com/ddeboer/FOSHttpCacheBundle/issues/30#issuecomment-41699890 .

joelwurtz commented 10 years ago

Hey, i just added a first work of it (based on @ddeboer first attempt) with tests and vcl on this PR https://github.com/FriendsOfSymfony/FOSHttpCache/pull/55

If it seems ok to you, i will write the documentation and the implementation in this Bundle

joelwurtz commented 10 years ago

Implementation available in this PR #57 (in case of you don't have notifications on this project)

ddeboer commented 10 years ago

Varying cache by user role is now available through #57 and #71. Thanks to everyone involved for working on this one!

FriendsOfSymfony / FOSHttpCacheBundle

Vary cache per user role #30