fmigneault opened 3 years ago
For me the cache does not need to be shared or synchronized. It is pretty common on the web to have to wait a couple of seconds or minutes before a new authorization takes effect. The only problematic scenario is when running tests that don't take that into account. So for me, solution #1 is the good one, with one caveat: if a service (A) itself calls another service (B), even if the no-cache header is set when calling (A), there is no guarantee that (A) will set this header while calling (B). So the request to (B) could still hit the stale cached value. Tests therefore need to be carefully crafted when updating permissions, so that no waterfall calls occur when setting the no-cache header, or they must account for the time required before the cache expires.
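To make the intent concrete, here is a minimal sketch of the kind of test call being discussed; the proxied URL and the auth cookie name are purely illustrative assumptions, not values from the actual stack:

```python
import requests

# Hypothetical values: a Twitcher-proxied resource and the auth cookie issued by Magpie.
PROXIED_FILE = "https://example.com/twitcher/ows/proxy/thredds/some/file.nc"
COOKIES = {"auth_tkt": "<session-token>"}

# ... permission for this resource was just updated on Magpie ...

# First call after the update: explicitly bypass the cached ACL decision.
resp = requests.get(PROXIED_FILE, cookies=COOKIES, headers={"Cache-Control": "no-cache"})
assert resp.status_code == 200

# Later calls can omit the header to keep the caching performance benefit.
resp = requests.get(PROXIED_FILE, cookies=COOKIES)
```

As noted above, if the proxied service itself calls another protected service, nothing guarantees the header is forwarded on that second hop, which is exactly the waterfall caveat.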
If service (A) calls a protected service (B), it should technically already forward at least the `Set-Cookie` header (or other cookie / auth containers depending on the implementation) in order to gain the same user/identity access; otherwise it assumes (B) is always public. It should also forward the `Cache-Control` header to be sure.
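As a rough illustration of that forwarding (not the actual implementation of any of the services mentioned), a service (A) calling (B) on behalf of its caller could relay the auth cookie it received (the request-side counterpart of the `Set-Cookie` exchange above) along with `Cache-Control`; the function and URL below are hypothetical:

```python
import requests

SERVICE_B_URL = "https://example.com/service-b/endpoint"  # hypothetical endpoint on (B)

def call_service_b(incoming_headers):
    """Call (B) with the identity and cache directives of the request received by (A)."""
    forwarded = {}
    # Forward the auth container so (B) resolves the same user/identity
    # instead of treating the call as anonymous/public.
    if "Cookie" in incoming_headers:
        forwarded["Cookie"] = incoming_headers["Cookie"]
    # Forward Cache-Control so a no-cache request also bypasses (B)'s cached ACL.
    if "Cache-Control" in incoming_headers:
        forwarded["Cache-Control"] = incoming_headers["Cache-Control"]
    return requests.get(SERVICE_B_URL, headers=forwarded)
```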
I think this is neither a very common nor a problematic case though. Most probably, either the endpoint called on (B) by (A) is a static path, so that endpoint shouldn't change permission, or it is a "same resource" path (e.g. the matching WMS for a Thredds file), so permissions should be changed similarly at about the same time to reflect the desired access. The caches of both resources should also reset around the same time if the last cached request successively called (A)->(B).
I also find that changing a permission on (A) from/to secure/public only has an impact in that kind of "same resource" case, considering the "initial" permissions below.
| A (initial) | B (initial) | Effect of changing the permission on (A) |
| --- | --- | --- |
| A (secure) | B (secure) | Both need a permission update to become public, with about the same delay until the cache reset takes effect if permissions are applied on both. Otherwise, bad permission setup. |
| A (public) | B (secure) | (B) was always blocked to begin with; it doesn't matter that (A) becomes secured, because a token was always required anyway. |
| A (secure) | B (public) | (B) is assumed public and remains public. It will be callable publicly via (A) after cache reset. Making (A) public doesn't change access to (B), which could be called directly. |
| A (public) | B (public) | (B) is assumed public and remains public; only (A) becomes blocked after cache reset. Technically, nothing wrong: (A) can still access (B), but via public access. To do things correctly, the permission on (B) should also be changed if it should match the blocked access. |
It is indeed an edge case. I was just pointing out that one should keep it in mind when writing tests. The exact use case I had in mind that you didn't mention was more of a WPS (as service A) trying to access data (Thredds as service B) or a shapefile (GeoServer as service B), or Weaver (as service A) calling another WPS (as service B). I would be surprised if our WPS forwarded the `Cache-Control` header.
But again, I assume that WPS tests will not include permission changes and will rather use public data/shapefiles. It is really just a reminder of how deeply the cache can trick us.
> It is pretty common on the web to have to wait a couple of seconds or minutes before a new authorization takes effect.
I am with DavidB on this one. I think this is a very nice-to-have, but we do not have to resolve it right now.
Description
Because permissions are applied in Magpie but resolved for request access by Twitcher, any already active cache entries from older requests will not be immediately synchronized on the Twitcher side right after a permission update on the Magpie side (the caches are not shared, and are therefore not invalidated on permission update).
Subsequent requests to the same resources (within the cache expiration delay) will hit the cached response produced by the access resolution of previous requests. The new permission resolution will not take effect until the cache expires. For example, access granted to a previously blocked resource will keep being denied until the cached response expires.
Note that the effect goes both ways, i.e., removing access to a resource will not actually block requests until the expiration delay has been reached.
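For illustration, both services rely on beaker-style caching for access resolution. Below is a minimal sketch of why a cached decision goes stale; the region name, expiration, and function are hypothetical and not the actual Twitcher code:

```python
from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

# Hypothetical region settings; actual names and expirations differ per deployment.
cache = CacheManager(**parse_cache_config_options({
    "cache.regions": "acl",
    "cache.acl.type": "memory",
    "cache.acl.expire": "20",  # seconds before a cached ACL decision is recomputed
}))

@cache.region("acl", "resolve_acl")
def resolve_acl(service_name, user_name):
    # Imagine this queries Magpie for the user's effective permissions.
    # Once the result is cached, a later permission change in Magpie is NOT
    # visible here until the 20 s expiration elapses (or the entry is
    # explicitly invalidated), which is the behavior described above.
    return {"read": True}  # placeholder result
```

The methods listed in the edit below boil down to either bypassing, invalidating, or sharing that cached entry.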
Edit:
For the invalidation to take effect on Twitcher side, there are ~~2 methods~~ 3 methods:

1. Send a `cache-control: no-cache` header during the next file access request to enforce a reset of the cache. This works, but should be done only on the first call after the permission update, otherwise all caching performance advantages are lost over many repeated accesses to the same resource.
2. Have Magpie and Twitcher share `file` references to allow them to invalidate each other: `cache.type = file` + corresponding paths for `cache.lock_dir` and `cache.data_dir` in their INI configs (see the configuration sketch after this list). This still needs validation with `file` (only `memory` is tested, and entries are not hashed the same way for selective invalidation, e.g.: invalidate only the ACL for resource X / user Y, etc.).
3. Use the `redis` or `mongodb` extension with `beaker` to synchronize caches.
   https://beaker.readthedocs.io/en/latest/modules/redis.html
   https://beaker.readthedocs.io/en/latest/modules/mongodb.html
   Not only would this allow syncing or invalidating caches across `Magpie`/`Twitcher`, but also between the individual workers of `Magpie` and `Twitcher`. At the moment, each worker holds its own in-memory cache depending on which requests it received, meaning cached/non-cached responses won't be the same (and won't expire at the same time) depending on which worker processes the request and when it last received one.
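As referenced in point 2 above, here is a minimal sketch of what shared beaker settings could look like; the paths, expiration, and backend names are illustrative assumptions, not the project's actual configuration:

```python
from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

# Hypothetical shared settings for method 2: the same "cache.*" keys would be
# placed in both the Magpie and Twitcher INI files, pointing at a common volume
# so that an invalidation performed by one service is visible to the other.
shared_opts = {
    "cache.type": "file",
    "cache.data_dir": "/shared/cache/data",  # illustrative shared-volume paths
    "cache.lock_dir": "/shared/cache/lock",
    "cache.expire": "60",
}
cache = CacheManager(**parse_cache_config_options(shared_opts))

# For method 3, the backend would instead be the beaker redis/mongodb extension
# (e.g. a "cache.type" of "ext:redis" plus a connection URL, per the beaker docs
# linked above), which would also synchronize the individual workers.
```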
References

- This branch: https://github.com/Ouranosinc/Magpie/tree/invalidate-perm-via-group provides an initial draft of cache invalidation following permission update. It is still missing updates for `cache.type = file`, but invalidation occurs as expected with `memory` on the Magpie side.
- https://github.com/Ouranosinc/Magpie/issues/462 is a temporary "resolution" description, in case we consider this caching situation a known issue that won't be adjusted according to the above shared-volume method.
- The following PR: https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/pull/87 bypasses the issue by enforcing the `cache-control: no-cache` header during tests. Failing tests in https://github.com/bird-house/birdhouse-deploy/pull/188#issuecomment-895562433 work without the cache.