Follow-up to #45: the current blob storage approach has a problem when a dataset is moved from one organization to another (or is renamed). This is because we store data in blob storage at `{org}/{dataset-name}/` and use that information when performing scope validation in Giftless.
To avoid this, it is proposed to use `<static-prefix>/<dataset-UUID>` as the LFS prefix when storing new resources.
The `<static-prefix>` part is set in config for the whole CKAN site. It is not technically essential, but having it allows us to avoid modifying Giftless, which expects a two-part (org name / repo name) prefix.
Using `dataset-UUID` instead of `dataset-name` means the resource's container prefix never needs to be rewritten or mangled if the dataset is renamed or moved to a different organization, or if the organization itself is renamed.
This also lets us drop the Scope Normalizer solution altogether, as scopes will always be `<static-prefix>/<dataset-UUID>/<sha256>`.
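A minimal sketch of how the new-style prefix and scope could be derived (the `STATIC_PREFIX` constant stands in for the site-wide config value; the name is an assumption, not an actual ckanext-blob-storage config key):

```python
# Assumed stand-in for the site-wide CKAN config value.
STATIC_PREFIX = "lfs"

def lfs_prefix_for(dataset):
    # The dataset id (a UUID) is stable across renames and org moves,
    # so the prefix never needs rewriting.
    return "{}/{}".format(STATIC_PREFIX, dataset["id"])

def scope_for(dataset, sha256):
    # Scopes always take the shape <static-prefix>/<dataset-UUID>/<sha256>.
    return "obj:{}/{}:read".format(lfs_prefix_for(dataset), sha256)
```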
As a compatibility measure, we can:
- Keep `lfs_prefix` for now
- If `lfs_prefix` is set to something that doesn't look like the new prefix format, still go through scope normalization
- Run a migration to move all in-storage objects to new-format containers
- Stop using `lfs_prefix` altogether, as it is no longer needed (although we don't have to drop it, and keeping it could be beneficial at some point, e.g. if an additional change is required).
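The second bullet needs a way to tell old-style prefixes from new-style ones. A sketch, assuming the static prefix value and the exact UUID check shown here (both are illustrative, not the real implementation):

```python
import re

# Assumed site-wide static prefix; in practice this comes from CKAN config.
STATIC_PREFIX = "lfs"

# Standard 8-4-4-4-12 lowercase hex UUID, as used for CKAN dataset ids.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def is_new_style_prefix(lfs_prefix):
    """True if lfs_prefix matches <static-prefix>/<dataset-UUID>;
    anything else should still go through scope normalization."""
    parts = lfs_prefix.split("/")
    return (
        len(parts) == 2
        and parts[0] == STATIC_PREFIX
        and bool(UUID_RE.match(parts[1]))
    )
```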
Tasks
- [x] Change ckanext-blob-storage to use `static-prefix/uuid` as the prefix when uploading ~2d
- [x] Decide whether to keep `lfs_prefix` around. If we do not, we need to flag resources that are in Git LFS in some other way (or rely on `sha256` being set as the indicator)
- [x] Static prefix should be config-based
- [x] Token handling - can be done by registering a new auth handler, by using a (different) scope normalizer logic (probably easiest), or by making some adjustments to ckanext-authz-service
- [x] Upload location - probably an easy change
- [x] Ensure backwards compatibility with already-migrated resources ~1d
  - e.g. by dealing with `lfs_prefix` in the scope normalizer - not needed if we don't need BC, e.g. we can do the migration during downtime
- [x] Write and run a migration script to move resources from the name-based LFS prefix to the UUID-based one ~2d
- [ ] Deployment and testing ~1-2d
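The migration task above could, roughly, look like the following. `storage` is a hypothetical client object with `copy`/`delete` methods; the real script would use the actual cloud SDK (e.g. azure-storage-blob or boto3) and iterate over all LFS-backed resources:

```python
def migrate_resource(storage, resource, dataset, static_prefix="lfs"):
    """Move one resource's blob from the name-based prefix to the
    UUID-based one. `storage` is a hypothetical client, not a real SDK."""
    old_key = "{}/{}".format(resource["lfs_prefix"], resource["sha256"])
    new_key = "{}/{}/{}".format(static_prefix, dataset["id"], resource["sha256"])
    if old_key == new_key:
        return new_key  # already migrated, nothing to do
    storage.copy(old_key, new_key)  # copy first so downloads never break
    storage.delete(old_key)         # then drop the old object
    return new_key
```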
Analysis
What's the problem
Imagine I want to download the blob related to a resource:

1. I get the resource metadata.
2. I go to the ckanext-authz-service endpoint and ask: give me a token to read a resource.
3. The token I ask for will contain the scope `obj:myorg/mydataset/*:read` - to read every resource of myorg/mydataset.
4. I take that token to Giftless and provide it along with the batch request.

How does Giftless know whether it should grant access? It looks at the storage object identified by the request: a `POST` to `/myorg/myrepo/objects/batch` with `{oid: <sha256>}` identifies the object, which can be checked against the scope in the token. In other words, I `POST` to `/{prefix}/objects/batch` with `{oid: <sha256>}`, and Giftless compares that prefix and oid with the provided scopes. A scope accepted by Giftless looks something like `obj:myorg/mydataset/*:read`.

If the check passes, Giftless gives me a token for the storage (a URL).

Where this goes wrong is if I have moved the dataset: the Giftless location still points at the old dataset prefix, while the scope names the new one, so the check fails.
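The mismatch can be illustrated with a toy version of the check (a deliberate simplification of what Giftless actually does, using glob matching):

```python
from fnmatch import fnmatch

def scope_allows(scope, org, repo, oid):
    # scope looks like obj:<org>/<dataset>/*:read
    _, pattern, action = scope.split(":")
    return action == "read" and fnmatch("{}/{}/{}".format(org, repo, oid), pattern)

# Token issued after the dataset moved to neworg, but the object is
# still stored (and requested) under the old org/name prefix:
token_scope = "obj:neworg/mydataset/*:read"
print(scope_allows(token_scope, "neworg", "mydataset", "abc123"))  # True
print(scope_allows(token_scope, "myorg", "mydataset", "abc123"))   # False: access denied
```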
Options
- Flat namespace: `/{sha256}` in a storage space that is purely content-addressed
- Scoped storage with entity UUID: `/{dataset-uuid}/{sha256}` - Preferred
- Relocate data ...
Temporary solution
Quick fix: Scope Normalizer based workaround - DONE in #47
Change from `obj:myorg/myrepo/sha256:read` to `obj:*/*/sha256:read` or even `obj:sha256:read`.
Assumption: a scope normalizer function registered in ckanext-blob-storage for `obj` scopes can mangle requests for `res:<org>/<dataset>/<sha256>:read` into something like `res:*/*/<sha256>`.
If this is true, we can:
- Fix up Giftless to accept such scopes and only check the sha256 (most likely quick)
- Fix up ckanext-blob-storage and any relevant JS code handling downloads to include the sha256 in the scope auth request, and ensure this input format is accepted and the scope is granted
This should work around the problem. It means that we continue to rely on `lfs_prefix` to send the batch request, but not to get the auth token when downloading.
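A sketch of the normalizer rewrite described above (the real hook signature in ckanext-authz-service may differ; this only shows the string transformation itself):

```python
def normalize_obj_scope(scope):
    """Rewrite obj:<org>/<dataset>/<sha256>:read to obj:*/*/<sha256>:read,
    so only the content hash is checked, not the (movable) prefix."""
    entity, subscope, action = scope.split(":")
    parts = subscope.split("/")
    if entity == "obj" and len(parts) == 3:
        subscope = "*/*/{}".format(parts[2])
    return "{}:{}:{}".format(entity, subscope, action)
```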