datopian / ckanext-blob-storage

CKAN extension to offload blob storage to cloud storage providers (S3, GCS, Azure etc).
http://tech.datopian.com/blob-storage/
MIT License

Authorization breaks down once a dataset is moved or renamed, can't download #45

Closed shevron closed 3 years ago

shevron commented 3 years ago

Right now, when a dataset is moved to a different organization, or the dataset or its organization is renamed, authorization breaks down and the resource is no longer available for download.

To reproduce:

  1. Set Giftless up in a manner that requires JWT-based authorization to read a file
  2. Create a private dataset + resource stored in Blob Storage.
  3. Move the dataset to a new organization or rename the dataset or rename the organization
  4. Try to download the file

Analysis

In `get_authz_token()` we obtain an authorization token to download resources of the organization / dataset whose name is saved in `lfs_prefix`.
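The failure mode can be illustrated with a minimal sketch (helper names here are hypothetical, not the actual `ckanext-blob-storage` API): the token's scope embeds the names recorded in `lfs_prefix` at upload time, while the download request uses the dataset's current names.

```python
def make_read_scope(lfs_prefix):
    """Build a read scope of the form obj:{org}/{dataset}/*:read
    (illustrative; mirrors the scope shape described in this issue)."""
    return "obj:{}/*:read".format(lfs_prefix)


def scope_allows(scope, requested_prefix):
    """Check whether a wildcard read scope covers a storage prefix."""
    prefix = scope[len("obj:"):-len("/*:read")]
    return prefix == requested_prefix


# The scope was minted when the dataset lived at old-org/my-dataset ...
token_scope = make_read_scope("old-org/my-dataset")

# ... but after a rename the download request targets new-org/my-dataset:
assert scope_allows(token_scope, "old-org/my-dataset")      # old name still matches
assert not scope_allows(token_scope, "new-org/my-dataset")  # download now fails
```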

Potential fixes:

  1. Store objects under UUIDs rather than names. This only partially solves the problem: it won't help if a dataset is moved between organizations.

  2. Write a custom resource authorization handler in ckanext-blob-storage that authorizes based on the actual resource, then provides a token for `lfs_prefix`. This may work, but I'm not sure it is possible with the current ckanext-authz-service API, which may not allow setting a custom scope for a resource auth request. It would require decoupling the dataset / organization package the scope is requested for from the scope string itself, which means some work on ckanext-authz-service.

  3. Add support for object-specific scopes in Giftless, and generate / use such scopes in ckanext-blob-storage. For example, instead of the current `obj:my-org/my-dataset/*:read` tokens we use to get download access (where `my-org/my-dataset` comes from `lfs_prefix`), we would generate tokens that look like `obj:<sha256>:read`. This needs to be supported by Giftless first (a slight modification to its authorization code); then ckanext-blob-storage would generate such tokens. As with 2 above, it may also require some modification to ckanext-authz-service to make scope formats more flexible. I somewhat prefer this over 2, as we would end up with slightly cleaner scopes. Downloading would still require keeping `lfs_prefix` and using it for download batch requests (but not in the JWT token).

  4. Do 3, but also do away with the hierarchical storage structure in Giftless entirely (or at least make it optional), so that all objects are accessible without `lfs_prefix` and require just sha256 + size. This is the cleanest solution but requires the most refactoring, across all of Giftless, ckanext-blob-storage and ckanext-authz-service. Benefits: no need to keep `lfs_prefix` around; as long as an object's sha256 doesn't change you can read it (if it changes, it's not the same object...). Download scopes would need to be for a specific sha256. Upload tokens need more analysis, but most likely you could always upload (assuming you have write access anywhere in CKAN); overwriting objects is not possible with Giftless anyway ("should not happen":tm:). This also adds the benefit of de-duplicating uploads across all objects, not just those that happen to share an organization / dataset. It requires deeper analysis, but is most likely the cleanest and most robust, though also the most expensive, solution.
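Options 3 and 4 can be sketched together: an object-specific scope of the shape `obj:<sha256>:read` alongside the current prefix scope (option 3), and a flat, content-addressed storage key that needs no `lfs_prefix` at all (option 4). All names and formats below are hypothetical illustrations under the assumptions stated in this issue, not the actual Giftless implementation.

```python
import hashlib
import re

# Option 3: recognize both the current prefix-based scope shape
# (obj:{org}/{dataset}/*:read) and the proposed object-specific one
# (obj:{sha256}:read).
PREFIX_SCOPE = re.compile(r"^obj:(?P<org>[^/:]+)/(?P<dataset>[^/:]+)/\*:(?P<action>\w+)$")
OBJECT_SCOPE = re.compile(r"^obj:(?P<sha256>[0-9a-f]{64}):(?P<action>\w+)$")


def parse_scope(scope):
    """Classify a scope string as object-specific or prefix-based."""
    for pattern, kind in ((OBJECT_SCOPE, "object"), (PREFIX_SCOPE, "prefix")):
        m = pattern.match(scope)
        if m:
            return dict(m.groupdict(), kind=kind)
    raise ValueError("unrecognized scope: {}".format(scope))


# Option 4: a flat, content-addressed storage key. Renames and moves never
# invalidate it, and identical uploads de-duplicate regardless of which
# organization / dataset they belong to.
def content_key(data):
    sha = hashlib.sha256(data).hexdigest()
    return "objects/{}/{}/{}".format(sha[:2], sha[2:4], sha)


sha = "a" * 64
assert parse_scope("obj:my-org/my-dataset/*:read")["kind"] == "prefix"
assert parse_scope("obj:{}:read".format(sha))["kind"] == "object"

blob = b"same file uploaded to two different datasets"
assert content_key(blob) == content_key(blob)  # de-dup across datasets
```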

shevron commented 3 years ago

From my further analysis, (3) and (4) above are not trivial to implement, as they require substantial changes in ckanext-authz-service and the way scopes are coupled to entities.

Because a sha256 is not something we can look a resource up by (it is currently an "extra" attribute and not indexed), and the authorization functions need to fetch the resource to check whether the user has access, we'll need to find a way to resolve `obj:<sha256>` into something we can use to fetch the resource and check permissions against.
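To make the lookup problem concrete, here is a pure-Python sketch over an in-memory list of resource dicts, standing in for the database query we'd actually need (the function name and data shape are hypothetical): with sha256 stored as an unindexed extra, resolving `obj:<sha256>` amounts to a linear scan.

```python
def find_resource_by_sha256(resources, sha256):
    """Linear scan standing in for the lookup this fix would need:
    sha256 lives in an unindexed 'extra', so today there is no
    efficient way to query for it."""
    for res in resources:
        if res.get("sha256") == sha256:
            return res
    return None


resources = [
    {"id": "r1", "package_id": "dataset-1", "sha256": "a" * 64},
    {"id": "r2", "package_id": "dataset-2", "sha256": "b" * 64},
]

assert find_resource_by_sha256(resources, "b" * 64)["id"] == "r2"
assert find_resource_by_sha256(resources, "c" * 64) is None
```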

Scope-Normalizer-Based Quick Fix

shevron commented 3 years ago

The scope-normalizer-based quick fix seems to be solid, and I'm moving forward with it. Some minor changes were required in Giftless (see https://github.com/datopian/giftless/pull/61), so Giftless versions below 0.3.0 will not work with this fix.
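The idea behind the quick fix can be sketched as follows (a hypothetical normalizer, not the actual code merged in the PR): when authorization is requested for the dataset's current org/name, rewrite the scope's prefix to the `lfs_prefix` recorded at upload time, so the granted token keeps matching storage after a rename or move.

```python
def normalize_scope(scope, stored_lfs_prefix):
    """Hypothetical scope normalizer: swap the org/dataset prefix in a
    requested wildcard scope for the lfs_prefix recorded when the object
    was uploaded, preserving the action suffix."""
    head, _, _ = scope.partition(":")     # "obj"
    action = scope.rsplit(":", 1)[1]      # e.g. "read"
    return "{}:{}/*:{}".format(head, stored_lfs_prefix, action)


# The dataset was renamed, but its objects still live under the old prefix:
assert normalize_scope("obj:new-org/my-dataset/*:read",
                       "old-org/my-dataset") == "obj:old-org/my-dataset/*:read"
```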