tailhook commented 4 years ago

This adds to RFC 1001 at #4

Overview

Features supported by HTTP:

Authorization header
Cookie
Basic and digest auth by browser (unusable)

Features supported by browser-based WebSockets:

Cookie
Authentication protocol packets
Basic and digest auth by browser (unusable)

Authentication schemes:

OAuth2 -- the most popular one
OAuth1 -- deprecated by OAuth2
SAML -- we probably want this in the commercial version
LDAP -- doesn't map to the web by itself. Often used to validate a username/password (not something we want to do) or to assign permissions by group (currently we're going to implement ACLs in edgedb itself)
Kerberos -- usually relies on system libraries to provide authentication, and is not widely used outside of large enterprises and academia

Related protocols:

SCIM -- identity management. Basically a way to create and manage accounts with a unified (REST) API. Could potentially replace our CREATE ROLE/ALTER ROLE statements, but that is out of scope for this research.
WS-Federation -- looks like an ecosystem of its own, with lots of standards, including ones for authorization

Commercial providers:

Auth0 -- basically provides a JWT + OpenID Connect (OIDC) identity after authentication
Authentiq -- a similar OIDC provider
Atlassian Crowd -- appears to use a cookie for the actual authorization
Okta SSO -- supports OIDC, SAML, and whatever they call "Secure Web Authentication"

Related tools: JAAS, Pac4J, Apache Shiro -- Java-specific, not researched closely (but they look like just Java interfaces for all the other protocols)

Requirements

Same or similar authentication for both HTTP and WebSockets
The scheme should work both in the browser and with custom clients
Don't accept a login/password pair or anything directly derived from it, so that the client doesn't need to keep the password in memory for reconnects and edgedb doesn't have to handle 2FA. Use an external application to verify passwords and multi-factor authentication, and only authorize the connection in edgedb.

Proposal

Generally, authentication should work by providing a Bearer token, which is either:

1. An opaque token. In this case the token should be inserted into the edgedb database by the application beforehand.
2. A self-encoded access token that implements the OpenID Connect (OIDC) specification.

The downside of (2) is that it's harder to revoke an already created token, while the downside of (1) is that edgedb needs to keep track of all currently active tokens. The upside of (1) is that it can integrate with more systems (in particular, ones that don't support OIDC, or that support OIDC in a way that is incompatible with edgedb).

The token can be transmitted in one of three ways (all can be used interchangeably):

Authorization: Bearer <token> -- works for HTTP as well as non-browser WebSockets
Cookie: <cookie_name>=<token> -- works everywhere, but it can be problematic to set a cookie for a domain that is devoted solely to edgedb (we may add a mechanism for that later)
A parameter in ClientHandshake -- works on WebSockets only, and is needed for browser-based WebSockets where using a cookie is not appropriate
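
A minimal sketch of how a server could pull the token out of any of the three transports above. The order of precedence, the `extract_token` helper, and the `token` handshake parameter name are assumptions made for illustration; the proposal does not specify them.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional


def extract_token(
    headers: Mapping[str, str],
    cookie_name: str,
    handshake_params: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the bearer token from the first transport that carries one."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # 1. Authorization: Bearer <token>
    scheme, _, credentials = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()

    # 2. Cookie: <cookie_name>=<token>
    cookies = SimpleCookie()
    cookies.load(normalized.get("cookie", ""))
    if cookie_name in cookies and cookies[cookie_name].value:
        return cookies[cookie_name].value

    # 3. Parameter in ClientHandshake (WebSockets only).
    #    "token" is a hypothetical parameter name.
    if handshake_params:
        return handshake_params.get("token") or None
    return None
```

For example, `extract_token({"Authorization": "Bearer abc"}, "edgedb_token")` yields `"abc"`, and the same call with only a `Cookie: edgedb_token=xyz` header yields `"xyz"`.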

We could use AuthenticationSASL with an appropriate mechanism to provide the token, but we don't need extra security here (i.e. passing the token in ClientHandshake is at least as good as passing it in the Authorization header, which is an accepted security practice). Keeping the number of authentication round-trips low is also useful.

RFC 6750 allows passing access_token as a form-encoded body parameter and as a URI query parameter. We don't allow that now, but we may consider adding them in the future if compelling use cases arise.

Configuration:

Configure cookie_name in the "port" configuration
Anything that needs to be configured to make ACLs work (to be determined when ACLs are implemented)

It's unclear whether we want to allow configuring JWT parameters, in particular the encryption scheme. I also expect secret keys to be generated and replicated within edgedb itself, but we can have a mechanism for users to provide their own keys.

Structure of the Self-Encoded Token

TO DO: research OpenID Connect

Future Extensions

In the future, we should consider at least the following ways of authentication:

SAML
TLS client certificates
Kerberos

All of them might only be supported in the commercial version.

Update: added a note on RFC 6750 usage of access_token

tailhook commented 4 years ago

I'm going to postpone self-encoded token support. The reasons are below. But first let's take a look at how opaque tokens work.

Opaque Tokens

To authorize a token, you insert it into the database with the appropriate properties. Something along the lines of:

WITH
  MODULE auth,
  MyToken := (
    INSERT Token {
      token_id := make_token_id(),
      expires := datetime_current() + to_duration(hours := 24),
      database := 'my_database',
      role := 'my_role',
      # any other needed settings
    }
  )
SELECT MyToken { token_id };

Then you can use the token_id as a Bearer token, or pass it by any of the equivalent methods described above.
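
To make the server-side check concrete, here is a rough Python sketch of the lookup that would happen when a client presents a token_id. The in-memory dictionary stands in for the Token objects stored in the database, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional
import secrets


@dataclass(frozen=True)
class TokenRecord:
    database: str
    role: str
    expires: datetime


class OpaqueTokenStore:
    """In-memory stand-in for Token objects kept in the database."""

    def __init__(self) -> None:
        self._tokens: Dict[str, TokenRecord] = {}

    def issue(self, database: str, role: str, ttl: timedelta, now: datetime) -> str:
        # The id is random and carries no meaning by itself.
        token_id = secrets.token_urlsafe(32)
        self._tokens[token_id] = TokenRecord(database, role, now + ttl)
        return token_id

    def authorize(self, token_id: str, database: str, now: datetime) -> Optional[str]:
        """Return the role for a valid token, or None if it must be rejected."""
        record = self._tokens.get(token_id)
        if record is None or record.expires <= now or record.database != database:
            return None
        return record.role

    def revoke(self, token_id: str) -> None:
        # Revocation is simply deleting the stored record.
        self._tokens.pop(token_id, None)
```

The point of the sketch is that revocation is trivial with opaque tokens (delete the record), at the cost of one lookup per authentication.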

Self-Encoded Tokens

The main issue with self-encoded tokens is that they are currently structured like this:

1. JWT provides a layer to encrypt, sign and verify arbitrary key-value pairs (named Claims)
2. OpenID Connect provides a set of claims that make it possible to discover the user identity and some other authentication parameters
3. Additionally, OpenID Connect has a way to discover various links to other metadata of the user profile. The relying party (edgedb in our case) is then expected to fetch various chunks of additional metadata from external resources.

So even at layer (3), not much of the fetched data is relevant to edgedb, and (2) only provides a user name (which, when using OAuth, is generally an external user name rather than edgedb's own).

Postponing Self-Encoded Tokens

So the reasons to only support opaque tokens for now are:

1. There are too many options for how to do self-encoded tokens, and most current standards are largely irrelevant to us
2. Because of (1), we can't guarantee interoperability with existing systems at the level of using their JWT tokens intact
3. Auth based on self-encoded tokens has to provide much stronger compatibility guarantees than auth based on opaque tokens
4. Token checks are not in the hot path for WebSockets. They are in the hot path for the HTTP implementation, but various caching approaches can be implemented there to alleviate the performance issue.
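
One possible caching approach for the HTTP path, sketched in Python under the assumption that a short staleness window is acceptable. `CachedTokenCheck`, its `lookup` callback, and the injected `clock` are hypothetical names; only successful lookups are cached so that bogus tokens cannot fill the cache.

```python
from typing import Callable, Dict, Optional, Tuple


class CachedTokenCheck:
    """Wraps a token lookup with a small time-to-live cache."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[str]],
        ttl: float,
        clock: Callable[[], float],
    ) -> None:
        self._lookup = lookup  # e.g. a database query returning a role
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def role_for(self, token: str) -> Optional[str]:
        now = self._clock()
        cached = self._cache.get(token)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh, skip the lookup
        self._cache.pop(token, None)
        role = self._lookup(token)
        if role is not None:
            self._cache[token] = (now + self._ttl, role)
        return role
```

The trade-off is that a revoked token can keep working for up to `ttl` seconds, so the TTL bounds how quickly revocation takes effect.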

So the current proposal is to implement opaque tokens only, and to postpone self-encoded tokens until both of the following are true:

ACLs are implemented
We have more experience with token-based authentication in general

elprans commented 4 years ago

Great summary, thanks @tailhook!

The issue with a non-self-encoded token, as you pointed out, is that we will have to store token metadata somewhere. Storing it in a database raises the question: which database? We currently avoid having a "special" database to store global metadata and instead rely on metadata in Postgres shared catalogs (pg_database and pg_roles). This arrangement makes maintaining large quantities of user-associated metadata, such as a list of valid tokens, quite cumbersome, especially where expiring tokens are concerned.

I think we should take a closer look at using JWT as the token protocol from the get-go using EdgeDB-specific claims (probably just the name of the database role for now). This would make it easier to add support for OIDC later as well.
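
As a rough illustration of that idea, here is a standard-library Python sketch of signing and verifying a JWT that carries a role claim. The `edgedb_role` claim name, the choice of HS256, and the helper names are assumptions for the example, not a proposed format; `exp` is the registered expiration claim from RFC 7519.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(key: bytes, signing_input: str) -> bytes:
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_jwt(claims: Dict[str, Any], key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_mac(key, signing_input))}"


def verify_jwt(token: str, key: bytes, now: int) -> Optional[Dict[str, Any]]:
    """Return the claims if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None
    # Pin the algorithm instead of trusting whatever the header says.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not hmac.compare_digest(signature, _mac(key, f"{header_b64}.{payload_b64}")):
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None
    return claims
```

A token created with `sign_jwt({"edgedb_role": "my_role", "exp": 2000}, key)` verifies while `now` is below 2000 and is rejected afterwards, or when checked with a different key.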

tailhook commented 4 years ago

Okay, if we don't care about compatibility with anything, we can use JWT for encoding our own things, but...

To revoke a token we have to store some token metadata. This is generally a lot less actual storage, but structurally it's the same. I don't believe we can get to production without any way of revoking tokens.

> We currently avoid having a "special" database to store global metadata and instead rely on metadata in Postgres

Do you think this will continue to be true when we have ACLs?

elprans commented 4 years ago

> To revoke a token we have to store some token metadata.

You only need to keep a set of revoked token ids. Revocation is also a relatively rare event, so the set will not be large, which makes shared catalog storage feasible.
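
A small Python sketch of such a revocation set, assuming each token carries a unique id (for JWT, the registered `jti` claim) and an expiry time. The `RevocationList` class is hypothetical; in practice the set would live in shared storage rather than in process memory.

```python
from typing import Dict


class RevocationList:
    """Tracks revoked token ids only until the tokens would expire anyway."""

    def __init__(self) -> None:
        # token id -> expiry time (Unix seconds)
        self._revoked: Dict[str, int] = {}

    def revoke(self, token_id: str, expires_at: int) -> None:
        self._revoked[token_id] = expires_at

    def is_revoked(self, token_id: str) -> bool:
        return token_id in self._revoked

    def prune(self, now: int) -> int:
        """Drop entries for expired tokens; return how many were dropped."""
        expired = [tid for tid, exp in self._revoked.items() if exp <= now]
        for tid in expired:
            del self._revoked[tid]
        return len(expired)
```

Because an expired token is rejected regardless, its id can be pruned from the set, which keeps the set bounded by the number of revocations within one token lifetime.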

> Do you think this will continue to be true when we have ACLs?

Yes. The authorization scopes will be encoded as claims in JWT.

tailhook commented 4 years ago

> Yes. The authorization scopes will be encoded as claims in JWT.

I'm not asking about scopes. I'm asking about the actual access control lists, rules, whatever. I expect quite a bit of metadata about relations between users and data, and I expect it to be stored somewhere.

elprans commented 4 years ago

Oh. The actual access rules will be defined in the schema with DDL/SDL: https://edgedb.com/roadmap/#access_control. The scopes in the token will effectively populate globals in a session, which, in turn, will trigger relevant access rules.

tailhook commented 4 years ago

Well, so ACLs will depend on the database. We can do the same with tokens: since the current spec declares a database in the URL wss://host.name/ws/database_name, we can look for the tokens in the database itself, rather than using a "special" database.
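
A minimal Python sketch of picking the database name out of a URL of that shape, so the server knows where to look for the token. The strict `/ws/<database_name>` path check and the `database_from_url` helper are assumptions based only on the URL form quoted above.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def database_from_url(url: str) -> Optional[str]:
    """Return the database name from wss://host/ws/<database_name>, or None."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        return None
    segments = parts.path.split("/")
    # Expect exactly ["", "ws", "<database_name>"].
    if len(segments) != 3 or segments[1] != "ws" or not segments[2]:
        return None
    return unquote(segments[2])
```

With this, the token lookup can be routed to the database named in the connection URL before any authentication state exists.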

tailhook commented 3 years ago

To Do: take a look at PASETO: https://developer.okta.com/blog/2019/10/17/a-thorough-introduction-to-paseto / https://devops.com/okta-offers-paseto-as-alternative-to-json-tokens/

1st1 commented 3 years ago

> To Do: take a look at PASETO:

That's a good one, thanks for sharing

edgedb / rfcs

Authentication schemes for HTTP/Websockets #5
