Which layer does 'enforce authorization' belong in?

dmitrizagidulin commented 4 years ago

Currently, item 4.2.4 in Layer 1 mentions enforcing the server's authorization system. But the authorization system itself is higher up in the levels. This is a tracking issue to discuss: do we move 'authorization enforcement' up to that layer? Or lower the item mentioning the pluggable authorization system down to layer 1?

OR13 commented 4 years ago

I see it like this:

Layer A

Data Model for Raw Storage Interface for Raw Storage

Layer B

Data Model for Logical Storage (Vault, Document, Config, Meta Data) Interface for Logical Storage

Layer C

Data Model for Authorization Interface for Authorization

Layer C could be OAuth access tokens, or HTTP Signature ZCaps... but regardless of how the authorization is represented... it still operates on a logical storage layer...

Can <role>.<action>.<resource>...

Can did:example:123 read vault/123? Can did:example:456 write vault/123/document/789?

Notice that authorization is about logical resources, not raw storage... we could go super fine grained and implement a permission model for every word in a block of memory... but thats imo, an internal implementation concern...

The authorization interface that users care about (that is reflected in the spec in abstract form), is logically equivalent to a bank vault.

You register with a bank for a safe deposit box.

The bank authorizes you, walks you into the vault and pulls down your safe deposit box... they then leave you alone with it.

You use your key to open the box, and inside are all your documents...

The bank vault is the edv server... just like bank of america and chase have separate vaults... Transmute and Digital Bazaar have separate EDV servers.

The safe deposit box is the "edv vault", its the thing that holds your documents... unlike the safe deposit example.. the bank knows how many documents you have. Now imagine that each safe deposit box that you have only has 1 one document.

When you grant your child access to the box, the bank will pull it down for them, but they still need a key to open it.

The bank employee is the authorization layer... they operate on the logical layer (safe deposit boxes)... they don't operate on data inside a document... they don't tell you where you can and cannot write on your will or birth certificate... and likewise, i argue that the spec should not apply authorization below the logical storage layer.

agropper commented 4 years ago

The safe deposit box is a good use case to illustrate a couple of points. It will be most useful if there are points in the safe deposit case that were not raised in the health report case.

What are the differences?

The bank does not operate on the contents of the box by design. This does not mean that the stuff in the box is encrypted. It just means they would need a court order to drill the lock. I guess this is what @OR13 is calling the "logical layer" and I would call the Policy Enforcement Point (PEP). Is this really any different than saying the data store operator has an access control agent of it's own with a proprietary interface (the safe door and a drill) to the store? I would claim the safe deposit box adds little to the health report use case in this regard.
Delegation is another point of potential difference. The bank (as PEP) checks the credentials of the requesting party. They check the access token (a key with little or no identifiers) and the biometric driver's license of the requesting party. The requesting party might be the (a) data subject of what's in the box, like a will; (b) the person that purchased the box from the bank to stash a bunch of cash, or (c) a delegate of (a or b) that was previously registered with the bank. These are all equally relevant. To the extent this is clearer than the implications in the health report use case the safe deposit box is important.
There are other cases to be considered.
- What happens (d) the requesting party has not been pre-registered with the bank?
- If the bank as PEP gets a court order to drill the box, do they need to notify the customer that they did so? Do they need to notify the customer that they intend to drill and wait for an injunction?

bumblefudge commented 4 years ago

There was a healthy chat in today's meeting about whether "Authorization" is a general-enough umbrella term to include all the various options and use cases (OCap/ACL, chained-VCs, PDP/PEP distinction, etc). Someone asked what that last one was, and Nikos posted this: PDP/PEP Section 4 here https://tools.ietf.org/html/rfc2753 (Leaving this in an issue so that some remembers to link to this definition)

agropper commented 4 years ago

Thanks for the rfc2753 link @bumblefudge. I think it applies directly to SDS in almost every respect, realizing, of course, that storage resource requests will be different and much more diverse domains than router resource requests.

Section 6 of rfc2753 describes the protocol essentials between two layers labeled PDP and PEP. There are 7 bullets as protocol requirements. Which of these seven would not apply to the SDS specs?

cwebber commented 4 years ago

I probably have a different view than most of the existing group. Here's my suggested perspective on layers:

Encrypted data abstraction layer: defines URI types for both immutable and mutable data that are not tied to a specific location. The algorithms for piecing these together and signing off on updates need not be tied to any specific storage provider (Datashards takes this approach). This is always encrypted... "public" data is merely encrypted data where the URI to retrieve it is publicly available.
A URI type which unions a specific storage location WITH the previous agnostic URI type. Like magnet URIs, these say "ok, but you can specifically get this data here".
An API for retrieving information from the storage location: either a (capability) URI if knowing the api is sufficient, or if designation and authority must be separated, a certificate capability approach.
An API for specifying how a user would like their data replicated: Designed with the foreknowledge that we cannot prohibit delegation, but we can request it. A storage provider can always share underlying encrypted chunks without our foreknowledge, and so too can someone who has a URI which allows retrieving the entire chunk.

OR13 commented 4 years ago

Pending consensus regarding layers, open a PR to remove ambiguity around authorization being layer 1 vs layer 2.

cwebber commented 4 years ago

From a Datashards perspective, we presently have:

Stores, which just store encrypted content-addressed chunks, but are naive about them
Immutable datashards clients, which can upload files to stores and generate a URI representing it, or knowing a URI, be able to retrieve that content.
Registries, which track mutable updates (but do not know necessarily what the updates mean)
Mutable clients, which can create new files, which gives you three kinds of capabilities: write-read-verify, read-verify, and verify... the latter is the only thing that registries ever get access to; they do not need to know what the updates refer to.

Stores and registries are not required to be smart... they can be implemented as merely a directory that is looked in for files. But they can be as advanced as something that specifies advanced access control. But it's important to know that this is completely unnecessary: you can have a filesystem on a USB key which sufficiently serves as a store or registry. But that won't give you the desired amount of replication and etc and what we don't have is the tuple that says "get this immutable/mutable datashards file from this location!"

That's where I anticipate the most room for collaboration.

I strongly encourage, even if not adopting Datashards, taking the approach where the required abstract interface for immutable storage and mutable registries is so minimal that you don't have to even run a web server... the world can burn around you in the apocalypse but if you have a datashards URI, and you find a usb key laying around that has the relevant chunks and update certificates on it, you can figure out still what it was referring to.

Have a mad-max-proof layer, then build the smart stuff on top of it.

agropper commented 4 years ago

@cwebber Could you recast your comment above in the discussion for #60? It is there that we're discussing whether an Access Request #36 is being processed in the same layer as the "enforcement" of security in a secure data store.

I agree that registries and stores are not required to be smart. The smarter they are, the more privacy and surveillance risks are posed to both Alice (the data subject) and Bob, the requesting party.

venu2062 commented 4 years ago

Prevent bypassing authrorization provided at higher layers by directly accessing lower layers #71

OR13 commented 4 years ago

The issue with binding authorization to the storage directly, is that it prevents pluggable authorization... you can't change the authorization scheme if its hard coupled to the storage representation.... when you replicate you will be replicating authorization, which means that you will have 1 authorization scheme... not a pluggable one.... that means no support for ZCaps / TXAuth, OAuth etc.... I think we need some kind of visualization of how to fit IPFS / Datashards / ZCaps / TxAuth onto the same picture, so we can see how we can or cannot support all of them in the spec today.

maybe we learn that we want to have 1 authorization scheme, hard coupled to the storage mechanism... but that scheme is not likely to be things like ZCaps or TxAuth... AFAIK Datashards would be the only solution for that...

@cwebber would it be possible for you to build a picture of EDVs as they exist today (storage and authorization) and then overlay a proposal for leveraging datashards, so we can see what that would look like?

venu2062 commented 4 years ago

@or13 True. Granular Authorization has to be at or above the logical layer as you presented today. The point i was trying to make is that the proposal should prevent using a parallel path to bypass authorization.

This can be done by encrypting data where authorization is done or by authenticating the layer enforcing authorization at the lower layers. The first option provides positive control because the second can be ignored by the lower layers.

At the same time all of this does not prevent someone getting hold of the cypher text - which is the argument made in favor of having authorization in addition to encryption.

I am sure that this will be discussed when we get into the details of layers.

OR13 commented 4 years ago

@dmitrizagidulin to revisit this after authorization discussion... there are a couple proposal which should be documented here.

decentralized-identity / confidential-storage