Example logic queries - Githubissues

Geal commented 5 years ago

with @clementd-fretlink, we've been looking at a datalog like language to express caveats. Here are some ideas that emerged:

we want to avoid rights reactivation that can appeared if we implemented negation. On the other hand, a limited model like datalog with constraints is interesting, because it can express limits on ambient data while keeping a simple logic
the authority field in the first block defines some facts
the ambient data defines some facts (IP address, time, http headers, etc)
the verifier provides some facts
the verifier provides some rules that can be used in caveats
caveats are queries that use rules and constraints on facts
we start from the initial set of facts (authority, ambient and verifier provided) at each caveat evaluation, to avoid new facts being kept between applications and possibly messing up validation

Current questions:

could one block define its own rules to make writing caveats a bit easier?
could one block define some facts and rules that could be reused in further tokens? (not sure I want to introduce ordering again here)
should facts have a scope, to prevent some rules generating them? (example: the ambient facts set should be locked, no new facts should appear)

To make it easier to reason about this language, I propose that we write some example facts, rules and queries in that issue.

First example:

authority=[right(file1, read), right(file2, read), right(file1, write)]
----------
caveat1 = resource(X) & operation(read) & right(X, read)  // restrict to read operations
----------
caveat2 = resource(file1)  // restrict to file1 resource

With resource(file1), operation(read) as ambient facts, caveat1 succeeds because resource(file1) & operation(read) & right(file1, read) is true, caveat2 succeeds because the fact resource(file1) succeeds.

With resource(file1), operation(write), caveat1 fails but caveat2 succeeds. With resource(file2), operation(read), caveat1 suceeds but caveat2 fails.

Geal commented 5 years ago

Example of a authentication server having a very broad token that is attenuated before being given to a user. Also showing the idea of having the authority defined by rules:

authority_rules = [
  right(X, read) <- resource(X) & owner(Y, X), // if there is an ambient resource and we own it, we can read it
  right(X, write) <- resource(X)  & owner(Y, X) // if there is an ambient resource and we own it, we can write to it
]
----------
caveat1 = resource(X) & owner(geoffroy, X) // defines a token only usable by geoffroy

With the ambient facts resource(file1), operation(read)and the verifier provided fact owner(geoffroy, file1), caveat1 succeeds. The verifier would also make its own query: resource(X) & operation(Y) & right(X, Y).

Could such a token give more rights accidentally?

Geal commented 5 years ago

Using constraints:

authority=[right(/folder/file1, read), right(/folder/file2, read),
  right(/folder2/file3, read)]
----------
caveat1 = time(T) | T < 2019-02-05T23:00:00Z // expiration date
----------
caveat2 = source_IP(X) | X in [1.2.3.4, 5.6.7.8] // set membership
----------
caveat3 = resource(X) | match(X, /folder/*) // prefix or suffix match

The verifier can present ambient facts like the current time or source IP address. Depending on the type of a value, we could have these kind of operations:

equality (already present in datalog without constraints)
ordering and inequality (<, >, <=, >= !=)
set membership (X in Set, X not in Set)
pattern matching (for simpler constraints, maybe let's limit it to prefix or suffix matching)

Revocation tokens:

authority=[]
----------
caveat1 = Token_1 not in RevocationList
----------
...

The verifier would have revocation lists in which we can check the inclusion of a token

Geal commented 5 years ago

a token that could be used by various microservices (using hosting as an example), trying to deploy an application:

authority = [organisation(myorg), owner(myorg, myapp), owner(myorg, myapp2)]
authority_rules = [
  right(app, X, Y) <- organisation(myorg) & application(X)
    & owner(myorg, X) | Y in [read, write, deploy] // we get read, write and deploy rights on any application owned by "myorg" organisation
]
----------
caveat1 = application(myapp) & operation(Y) & right (app, myapp, Y) // restrict request to application "myapp"

Note: from this example I see that some facts could be marked as unique. If the application fact is not unique, caveat1 will not restrict access to applications other than myapp. Another thing: if we add another application to that organisation, we might want to get access to it directly without minting a new token. So maybe the information owner(myorg, myapp) owner(myorg, myapp2) should come from the ambient data?

In the billing service:

ambient = [ application(myapp), operation(deploy), ramGB(4)]
verifier_facts = [credit(myorg, 100), credit(org2, 0), credit(org3, 20)]
verifier_query = application(X) & operation(R)
  & right(app, X, R) & owner(Y, X) & credit(Y, Z) | Z > 0

The billing service checks that the organisation has the right to deploy this application and that it has enough credits.

On the hypervisor:

ambient = [ application(myapp), operation(deploy), ramGB(4) ]
verifier_facts = [deploy_limit(myorg, 2), deploy_limit(org2, 16), deploy_limit(org3, 1)]
verifier_query = application(X) & operation(R) & right(app, X, R)
  & owner(Y, X) & deploy_limit(Y, Z) & ramGB(M) | Z > M

the verifier query would fail because the RAM limit for an app of this organisation is 2GB, lower than the requested 4GB.

Constraints can express useful properties like those limits on available credits or memory usage, but maybe those should not be verified inside the token. It requires that the verifier loads facts from unrelated organisations, making the whole query a lot slower.

tarcieri commented 5 years ago

Glad to see you exploring authorization languages!

One small note: your examples use application(X), ostensibly with the idea that you'll have a single credential shared by many services.

In my experience that's a classical source of audience confusion attacks. However, audience confusion generally arises in these scenarios due to logic bugs. So perhaps a sufficiently well-designed authorization language can avoid them.

That said, an alternative I prefer is to cryptographically bind the credentials to audiences. With traditional Macaroons this usually involves obtaining a third party caveat from an online service. With a public-key construction, I think you can invert that relationship, and services can present their "SSO credential" which the target service can verify, mint a symmetric credential, and the "SSO credential" can optionally include a third party caveat which requires a discharge from the client for any client-imposed restrictions (this is the "Madeleines" idea I've referred to elsewhere).

Probably a bit of a red herring for authorization languages. Otherwise this looks interesting.

divarvel commented 5 years ago

I like how it looks, because it allows both the token (through the authority part) and the service (through ambient facts) to provide scope. This is a net benefit over the implicit way it was done with macaroons, both in terms of clarity but also in terms of flexibility.

With explicit scope provided during token validation (either in the token authority or by the service), I think it would sidestep all the confusion common with macaroons (since macaroons rely on implicit scoping thanks to the identifier).

I'll work on typing judgments based on these examples, to see what the types ystem could look like.

tarcieri commented 5 years ago

With explicit scope provided during token validation (either in the token authority or by the service), I think it would sidestep all the confusion common with macaroons (since macaroons rely on implicit scoping thanks to the identifier).

We seem to be using "confusion" two different ways. I'm using it in the context of a "confused deputy", or more generally an "audience confusion attack". See the following example:

https://blog.intothesymmetry.com/2017/10/slack-saml-authentication-bypass.html

The Macaroons approach provides a cryptographic audience binding through the use of symmetric cryptography. This avoids confused deputy attacks, because if something is wrong, the credential fails to verify. Debugging this may be... confusing... but it does fail closed, as opposed to systems where if audiences are deployed incorrectly, it provides lateral movement between services and escalation of privilege.

Geal commented 5 years ago

I was wondering about providing scopes for facts to avoid these kinds of issues. So there could be these scopes:

authority scope, defines a set of fact types that the verifier knows about by looking up the root key
ambient scope
verifier scope (for facts provided by the verifier)
evaluation scope: for facts that are generated by evaluating a caveat and its query. This one is emptied between each caveat evaluation This way we could avoid systems that would mistakenly create facts that collide with the authority facts or others, and the verifier would need to know which fact types it must use depending on the root key

divarvel commented 5 years ago

I should have looked it up indeed. Thanks for the link, I'll check it out.

On Thu, Feb 7, 2019 at 3:49 PM Tony Arcieri notifications@github.com wrote:

With explicit scope provided during token validation (either in the token authority or by the service), I think it would sidestep all the confusion common with macaroons (since macaroons rely on implicit scoping thanks to the identifier).

We seem to be using "confusion" two different ways. I'm using it in the context of a "confused deputy https://en.wikipedia.org/wiki/Confused_deputy_problem", or more generally an "audience confusion attack". See the following example:

https://blog.intothesymmetry.com/2017/10/slack-saml-authentication-bypass.html

The Macaroons approach provides a cryptographic "audience" binding through the use of symmetric cryptography. This avoids confused deputy attacks, because if something is wrong, the credential fails to verify. Debugging this may be... confusing... but it does fail closed, as opposed to systems where if audiences are deployed incorrectly, it provides lateral movement between services and escalation of privilege.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CleverCloud/biscuit/issues/11#issuecomment-461450893, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKk8_dTRrD18qhymebNDXXfASZBsdkZks5vLDzrgaJpZM4ajw22 .

tarcieri commented 5 years ago

A more typical example of audience confusion is X.509 hostname verification. Certificates are issued within the scope of a particular SAN/set of SANs which describe the audience of that particular certificate. Unfortunately, these constraints (or other constraints) are often ignored, and certificates are treated as providing authority over all sites on the Internet.

All that said: I think the big problem is e.g. X.509 libraries end up implementing that same sort of logic over and over again in slightly different ways. There is no One True Spec of how to do X.509 hostname verification (I have implemented RFC 6125 and it is littered with "MAYs").

One of the things I like the most about authorization languages is they provide a single place to focus efforts on getting the verification logic correct.

@Geal your "scopes" are definitely in line with what I'm thinking, although I used different words. Nobody can agree on the words!

I was generally using the word "context" for what you describe as "verifier scope" and "ambient scope". They are indeed two separate things, and I tried to recognize the separateness of ambient authority. "Context" has its own overloaded meaning in OCap though (e.g. "user context", which is more like "authority scope" in your definition).

SegWit uses the term "witness data" for what you're describing as "authority scope", i.e. witnesses to an NP assertion (within the context of a proof of knowledge).

tl;dr: naming things is hard, and I have no answers 😉

Geal commented 5 years ago

the one who writes the spec defines the words, however bad they MAY or MUST be 😆

Geal commented 5 years ago

I translated most of those examples to my small datalog implementation: https://github.com/Geal/dataexponential/blob/master/src/biscuit.rs#L96-L370

The types are a bit annoying to manipulate directly, but I'll test making a nice API on top (if I'm evil I might even make a text representation that compile to those types). I added these types of constraints:

integer: smaller, larger, equal, set inclusion, set exclusion
string: equal, prefix, suffix, set inclusion, set exclusion
date: before, after
symbol: set inclusion, set exclusion

The symbol type is an implementation detail I tested: in my first implementation, everything was basically strings (fact name, values, etc), so I added an external symbol table that maps strings to numbers. So the symbol type is just a number that's easy to test for equality and set membership. It makes rules and queries quite fast, and it brings us closer to my original idea of using HPACK to have a compact representation of the caveats (and it allows pretty printing). The token can carry that symbol table with itself, and caveats will refer to it. And we'll be able to prepopulate that table. (I'm still wondering about possible confused deputy issues with symbols being reused here and there, but facts creation and checking works quite well right now).

Geal commented 5 years ago

the revocation id pattern requires a bit more:

the ability to add a fact in a block. This will require proper care, because this fact should not be part of the authority or ambient scope. And they should not be carried over to the next block validation
the ability to add verifier specific queries

See https://github.com/Geal/dataexponential/commit/1ffc2929bba07ce514b6684b8a92ccd7b1d74c81 for the implementation

Geal commented 4 years ago

the logic language has sufficiently evolved now, closing this

biscuit-auth / biscuit

Example logic queries #11