Closed: shevron closed this issue 3 years ago
From my further analysis, (3) and (4) above are not trivial to implement, as they require substantial changes in `ckanext-authz-service` and the way scopes are coupled to entities. Because a sha256 is not something we can look up a resource by (currently it is an "extra" attribute and not indexed), and the authorization functions will need to somehow fetch the resource to check whether the user has access, we'll need to find a way to parse `obj:<sha256>` into something we can use to fetch a resource and check permissions against it.
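To make the problem concrete, here is a rough sketch of what such parsing and lookup could look like. The function names and the dict-based resource representation are assumptions for illustration; this is not the actual `ckanext-authz-service` API. The linear scan in particular illustrates why the lookup is awkward: the sha256 lives in an un-indexed "extra" field, so there is no efficient query for it.

```python
# Hypothetical sketch, not the real ckanext-authz-service code.

def parse_obj_scope(scope):
    """Split an "obj:<sha256>:<action>" scope into its parts.

    Returns (entity, sha256, action), or None if the string is not
    an object scope.
    """
    parts = scope.split(":")
    if len(parts) != 3 or parts[0] != "obj":
        return None
    entity, sha256, action = parts
    if len(sha256) != 64:  # a sha256 hex digest is 64 characters
        return None
    return entity, sha256, action


def find_resource_by_sha256(resources, sha256):
    """Find the resource whose sha256 "extra" matches.

    Because the field is not indexed, a real implementation would have
    to do something equivalent to this linear scan server-side.
    """
    for res in resources:
        if res.get("sha256") == sha256:
            return res
    return None
```

Once the resource is found this way, the existing per-resource permission checks could be applied to it as usual.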
There are `scope_normalizers` in Authzzie which can normalize granted scopes. More on this approach down below: a normalizer for `obj` scopes can mangle requests for `res:<org>/<dataset>/<sha256>:read` into something like `res:*/*/<sha256>`. We would still use `lfs_prefix` to send the batch request, but not to get the auth token when downloading.

The scope normalizer based quick fix seems to be solid, and I'm moving on with that. Some minor changes were required in Giftless (see https://github.com/datopian/giftless/pull/61), so Giftless versions before 0.3.0 will not work with this fix.
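The normalization described above can be sketched roughly as follows. This is a minimal standalone illustration of the scope rewrite; the function name and the exact Authzzie `scope_normalizers` hook signature are assumptions, not the real API.

```python
# Hypothetical sketch of the scope rewrite a normalizer would perform;
# not the actual Authzzie scope_normalizers interface.

def normalize_res_scope(scope):
    """res:<org>/<dataset>/<sha256>:read  ->  res:*/*/<sha256>:read

    Wildcard the org/dataset parts so only the sha256 pins the object,
    making the scope survive dataset/organization renames and moves.
    Scopes that don't match this shape are returned unchanged.
    """
    try:
        entity, subscope, action = scope.split(":")
    except ValueError:
        return scope
    parts = subscope.split("/")
    if entity != "res" or len(parts) != 3:
        return scope
    return "res:*/*/{}:{}".format(parts[2], action)
```

With something like this in place, a token granted against the old org/dataset names still authorizes the download, because only the sha256 is matched.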
Right now, when a dataset is moved to a different organization, or is renamed, or the organization is renamed, authorization will break down and the resource will no longer be available for download.
To reproduce:
Analysis
In `get_authz_token()` we obtain an authorization token to download resources of the organization / dataset whose name is saved in `lfs_prefix`. When the dataset or organization is renamed, or the dataset is moved, the current names no longer match the stored `lfs_prefix`, so authorization fails.

Potential fixes:
1. Switching to storing UUIDs instead of names: will only partially solve the problem, and won't help if a dataset is moved between organizations.
2. Writing a custom resource authorization handler in `ckanext-blob-storage` that authorizes based on the actual resource, then provides a token for `lfs_prefix`. This may work, but I'm not sure it is possible given the current `ckanext-authz-service` API, which may not let us set a custom scope for a resource auth request. It would require decoupling the context dataset / organization the scope is requested for from the scope string itself in some way — that is some work on `ckanext-authz-service`.
3. Add support for object-specific scopes in Giftless and generate / use these kinds of scopes in `ckanext-blob-storage`. For example, instead of the current `obj:my-org/my-dataset/*:read` tokens that we use to get download access (in which `my-org/my-dataset` come from `lfs_prefix`), we generate tokens that look like `obj:<sha256>:read`. This will need to be supported by Giftless first (requiring slight modifications to Giftless' authorization code); then we add generation of such tokens in `ckanext-blob-storage`. This may also require some modification in `ckanext-authz-service` to make scope formats more flexible, as with 2 above. I somewhat prefer this over 2, as we'll end up with slightly cleaner scopes. Downloading will still require us to keep `lfs_prefix` and use it for download batch requests (but not in the JWT token).
4. Do 3, but also do away with the hierarchical storage structure in Giftless entirely (or at least make it optional), so that all objects are accessible without `lfs_prefix`, requiring just sha256 + size. This would be the cleanest solution, but requires the most refactoring across all of `giftless`, `ckanext-blob-storage` and `ckanext-authz-service`. Benefits: no need to keep `lfs_prefix` around — as long as an object's sha256 doesn't change, you can read it (if it changes, it's not the same object...). Download scopes will need to be for a specific sha256. Upload tokens need more analysis, but probably you can always upload (assuming you have write access to anywhere in CKAN); overwriting objects is not possible with Giftless anyway ("should not happen" ™). This also adds the benefit of de-duplicating uploads across all objects, not just those that happen to share an organization / dataset. This requires deeper analysis, but is most likely the cleanest and most robust — though also the most expensive — solution.
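To illustrate what the object-specific scopes of options 3 and 4 might look like, here is a small sketch that derives an `obj:<sha256>:read` scope from an object's content and embeds it in JWT-style claims. The claim layout (`sub`, `scopes`) and helper names are assumptions for illustration only, not Giftless' actual token format.

```python
import hashlib

# Hypothetical sketch of option 3/4 scope generation; not Giftless code.

def object_read_scope(data):
    """Build an "obj:<sha256>:read" scope tied only to the object's
    content hash — no org/dataset hierarchy involved."""
    sha256 = hashlib.sha256(data).hexdigest()
    return "obj:{}:read".format(sha256)


def token_claims(data, user_id):
    """Claims for a token granting read access to exactly one object.

    A rename or move of the dataset/organization cannot invalidate
    this, since nothing here references their names.
    """
    return {
        "sub": user_id,
        "scopes": [object_read_scope(data)],
    }
```

A real implementation would sign these claims into a JWT and hand it to the LFS client; the point of the sketch is just that the scope survives any rename, because the sha256 is the only identifier it carries.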