bloom-housing / bloom

Bloom is Exygy’s affordable housing platform. Bloom's goal is to be a single entry point for affordable housing seekers and a hub for application and listing management for developers.
https://bloomhousing.com
Apache License 2.0

Build out User/Authentication Service v0 #133

Closed jacchau closed 4 years ago

jacchau commented 4 years ago

Infrastructure:

User Endpoints:

Admin Endpoints:

bencpeters commented 4 years ago

I've started work on this, even though we don't have the database/ORM side implemented yet; I can pretty easily just stub that out/use an in-memory store to get a POC here.

I spent some time doing a survey of current best practices in this area. I laid out several different options with justifications below. My opinion is that Option 2 is probably the best fit for where we are right now, given that it involves the minimum amount of additional service orchestration while still giving us the benefits of a fully certified OIDC implementation. But, I think this probably belongs as a group discussion, so read through the options, and let me know what your thoughts are!

Option 1

The simplest thing we could do basically looks like this (tl;dr I think this is reinventing the wheel)

[diagram: Option 1 architecture]

In this scenario, each client (e.g. one of our apps) talks directly to the auth server, authenticating with some trusted credentials (e.g. email + password). The auth server validates the credentials, then issues a short-lived (10 min?) access token & refresh token, both signed with a secure private key. The refresh token gets saved into the DB to make it rotatable/revocable. The client can save the refresh token manually (e.g. local storage) or "automatically" (HttpOnly cookie to partially mitigate XSS). The short-lived access token can then be used to authenticate requests to API services. Each API service verifies the signature of the access token against the pre-shared public key and checks the short expiration date. If this verification fails, the client can automatically handle the 401 and use its saved refresh token to get a new access token from the auth server without user interaction; however, API servers don't need to talk directly to the auth server or database (unless we want the ability to immediately revoke an access token instead of relying on the short expiry time).

The engineering lift in this is basically:

This wouldn't be too bad to build, but as you can see, there are a number of details to get right. More importantly, it involves a lot of potential security pitfalls that are already quite easily solved by the well-known Open ID Connect (OIDC) standard. A general axiom in web security these days is to avoid doing too much of your own custom engineering unless you're an expert. ("Don't roll your own crypto")

Option 2

We could create an OIDC provider app in Node. oidc-provider is an OpenID certified implementation of the standard available as a Node library. It takes care of all of the details around generating, tracking, and revoking access/refresh/id tokens. It also does so in a spec-compliant way, which means that we could quite easily add the ability for third party vendors to plug into our APIs in a secure, standards compliant way (OAuth). This could also be useful for having different first party client applications with different levels of permissions.
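For a rough sense of scale, standing up a basic oidc-provider instance looks something like the sketch below. The issuer URL, client registration, and secrets are all placeholders, and the exact configuration shape should be checked against the oidc-provider docs for whatever version we pin:

```javascript
const Provider = require("oidc-provider"); // npm: oidc-provider

// Placeholder issuer URL and client registration -- illustrative only.
const oidc = new Provider("http://localhost:3000", {
  clients: [
    {
      client_id: "bloom-public-app",
      client_secret: "placeholder-secret",
      redirect_uris: ["http://localhost:3001/callback"],
      grant_types: ["authorization_code", "refresh_token"],
    },
  ],
});

oidc.listen(3000, () => {
  // The discovery document (including the public signing keys) is
  // published automatically by the library.
  console.log(
    "OIDC provider at http://localhost:3000/.well-known/openid-configuration"
  );
});
```

The default in-memory adapter works for a POC; persistence (e.g. Redis) comes in when we need clustering or durable grants.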

[diagram: Option 2 architecture]

The main difference between this option and Option 1 is that it more or less requires a login flow with a redirect. Technically the OIDC spec does support a direct "username + password -> access token" flow like in the previous diagram (the Resource Owner Password Credentials, or ROPC, flow), but it is strongly discouraged by basically everyone in this space at this point, and was only included in the 2014 OIDC standards for legacy/backward compatibility.

In practice, assuming we don't want to go against the conventional wisdom on this, it means we'd need to implement so-called Identity Provider services (user login, registration, logout) as pages that live on the authentication server. Then, for initial login (or registration), the client app would redirect to these pages (in a pop-up or the main browser window; either would be fine), complete user authentication with a password or another method, and be redirected back to the client app with an access token & refresh token. Future access token grants could happen without user intervention. We could again have short-lived access tokens that API services independently validate, but the Node authentication server would also support validating an access token out of the box (so an API could delegate responsibility for verifying the access token to the auth server, albeit at the cost of another API call/dependency). The auth server also publishes its public keys by default, which solves some of the problems of pre-sharing public keys.

The oidc-provider library comes out of the box with a fairly complete implementation, and the fact that it is standards compliant means that we could leverage existing OIDC client libraries in any public clients (web apps or native apps, for example), rather than having to write our own logic for refreshing tokens, etc. However, we would have to set up a backend (the reference implementation uses Redis) for storing access & refresh token grants for persistence and clustered use. We also would need to set up the login pages/logic (which are hosted on the auth server). Still, I think the engineering involved here is only slightly different from Option 1, and it has the significant benefits of being battle-tested, standards compliant, and already set up to allow third-party data access to our system down the line.

Option 3

[diagram: Option 3 architecture]

This is similar to Option 2, but instead of integrating the OIDC server & identity provider into the same server, we use a standalone server like ORY Hydra to perform the OIDC flows (e.g. everything token related), and stand up our own server dedicated to the Identity Provider functionality (initially user + password auth). This has the advantage over Option 2 of being a better documented and seemingly more popular (8k vs. 1k GH stars) project than oidc-provider, with less custom code, although in both cases it's mostly just a bit of configuration. The disadvantage compared to Option 2 is that this involves 2 servers for authentication: the login/identity provider server (which renders the actual login/registration views) and the OIDC server (which takes care of tokens). ORY Hydra is designed for containerization and generally seems easy to deploy in the context of container-orchestrated infrastructure, but at the point the Bloom project is at right now (i.e. not having an existing orchestration service) it might be a bit more work to set up & monitor than the single-server option.

Other options

Use a service such as Auth0 (Okta and Firebase Auth would be other possibilities) to provide at least the ID Provider layer (we could then combine this with an OIDC server-in-a-box like Hydra, or a Node instance, or leverage their solutions for this). Depending on how widely used the platform became, this could obviously end up being somewhat expensive, as well as creating vendor lock-in, and generally making the platform a bit less OSS friendly. But it could be a quick & secure path to the desired functionality, so it's worth bringing up.

bk3c commented 4 years ago

Thanks for the super-thorough writeup, @bencpeters. I agree with you that Option 2 seems like the best path forward. I'm pretty unfamiliar with OIDC, but given that it seems to be a more standardized version of what I had previously been thinking, it seems like a very promising approach.

Unless you think there are other gotchas, I'd be in favor of going ahead and prototyping something out with oidc-provider, perhaps in combination with using Google as the IdP for @exygy.com accounts, which should give us solid security for admin stuff and let us punt on all of those backend pieces until the next step.

bk3c commented 4 years ago

The one part I'm not sure about in your Option 2 diagram is having the Auth UI (e.g. username and login box) be rendered by the auth service, rather than the apps themselves. My vision was that each app would have control over the UI presentation, and then just send the results to the auth service for validation and token issuance.

I guess we could do something different if that's not standards compliant or otherwise a bad idea, but my expectation is that the apps will likely have different login UIs, so by default I think it would be best to keep those UIs with their respective apps.

jaredcwhite commented 4 years ago

Yes, thanks @bencpeters for the writeup. Food for thought!

Maybe I'm being a total dunce here, but it's unclear why starting with a simple endpoint that returns a JWT with, say, a 12-hour expiration is a bad thing? At least for this initial go-round, logins will be used by end-users of the listing applications service, and from a UX perspective there possibly won't even be passwords involved: they'd just enter their email address and get a magic link, so they can review their application submissions. I'm not sure what the use case is for any additional complexity.

bencpeters commented 4 years ago

The one part I'm not sure about in your Option 2 diagram is having the Auth UI (e.g. username and login box) be rendered by the auth service, rather than the apps themselves. My vision was that each app would have control over the UI presentation, and then just send the results to the auth service for validation and token issuance.

I guess we could do something different if that's not standards compliant or otherwise a bad idea, but my expectation is that the apps will likely have different login UIs, so by default I think it would be best to keep those UIs with their respective apps.

This is exactly the core of the question. I agree that this was (is?) against my intuitive view of how the app would work, but it really does seem like the general security community is moving away from this type of flow.

Technically the Resource Owner Password Credentials (ROPC) flow in the OAuth spec supports this type of login, but a wide variety of sources, including the chair of the OIDC standards committee, have quite unequivocally said never to use it, and most consider it deprecated. The Node oidc-provider library doesn't even ship with a ROPC implementation as a supported flow, although it's extensible enough that we could write our own if desired.

The reasoning is:

Now, granted, most of these complaints apply more to larger organizations that have a wide mix of 3rd party and 1st party apps in their ecosystems, or more internal systems to interface with. But, the goal for Bloom is to eventually be a larger system, and honestly with the number of different types of users involved (different municipalities, end users, and developers), and range of shared data potential, having good identity federation practices could actually be an important architectural point in this system, so I think doing it "right" from the start is probably good.

Additionally, if we decide to federate identity with Google (as you mentioned above for the MVP ^), then this renders the whole point moot, since we'd use Google as the IdP anyway rather than our own username/pw.

Basically, we could enable an implementation with this username/pw flow, but it's against best practices, and the more I think about it, the more I think we can build a good UX without it. It also sets us up well for future developments like mobile apps, where this is considered an even more important best practice.

bencpeters commented 4 years ago

Maybe I'm being a total dunce here, but it's unclear why starting with a simple endpoint that returns a JWT with, say, a 12-hour expiration is a bad thing? At least for this initial go-round, logins will be used by end-users of the listing applications service, and from a UX perspective there possibly won't even be passwords involved: they'd just enter their email address and get a magic link, so they can review their application submissions. I'm not sure what the use case is for any additional complexity.

No, it's a valid question. We certainly could develop our own system. The issue is that at that point we're basically rolling our own security, and have to pay a lot more attention to the details to ensure it's actually secure. As with most things in the cyber-security realm, the details are often complicated and make a big difference in how secure a system actually is. To me it makes more sense to go with certified implementations of this stuff that have been tested in the wild, gone through with a fairly fine-toothed comb, and are based on best practices that top security experts have agreed on (in the form of standards). Incidentally, it also comes with the benefit of enabling much easier integration of our services with other providers if we have a standards-compliant implementation.

In this particular case, I'm also not sure we actually save much on the simplicity front. While the library is certainly far more complex than what we'd build for a simple token server, it's already built, and we just have to configure it appropriately, versus building our own system to sign tokens, manage invalidation, refreshes, sharing public keys, claims, etc. So while much of the OIDC spec is beyond the scope of what we're doing right now, it might be useful at some point in the future, and it doesn't really seem much harder to implement the limited functionality we want now in a way that's likely to be more secure and easier to extend with more functionality down the line if needed.

I'd feel differently about the trade-off if I thought implementing an OIDC provider server added a lot of engineering complexity vs. the "simple" token service, but I honestly don't believe that's the case (provided we use a certified OSS implementation - if we were rolling our own OIDC implementation it'd be a totally different trade-off).

Edit to add: The choice of exactly how we want to manage access tokens and persistent sessions is somewhat orthogonal to the provider. If we decide the UX we want is session-scoped access tokens (e.g. they last for up to 12 hours, but disappear and require a new login with every session - good for use on a public computer), that's totally doable with an OIDC server with minimal effort. We can do a 12-hour access token, but also offer longer-lived sessions in the form of refresh tokens for a better UX when we do want to let users avoid logging in again all the time.

software-project commented 4 years ago

Thanks @bencpeters for the thorough approach. oidc-provider looks good. I'm not super familiar with server-side JS, but isn't there something that comes 'out of the box', like a middleware? I know we have many decisions to make, but I found this: https://github.com/jaredhanson/passport We don't have a decision on the backend server yet, and it seems like every backend server has its own thing. Seems like Apollo Server has authentication built in. So maybe we should make that decision first? Unless of course we want to go for a services architecture, in which case this seems like the way to go.

bencpeters commented 4 years ago

@software-project My understanding, based on conversations with @bk3c, was that we were planning on a services architecture for the backend? If that's not the case, then this discussion definitely needs to be different...

Passport solves a different problem. We could definitely use passport as an OIDC client on different services that wanted to talk directly to the auth server, but it's not really designed to provide the tokens/auth server itself.

The idea here is to provide a centralized auth server that would take care of user auth & token management. In principle each service could then decide how to handle validating the token on their own (although we should probably have a standardized implementation). If we're not going for that architecture, it would be good to decide that before we build this service out...

bk3c commented 4 years ago

I'm marking this in-development, since it sounds like @bencpeters is in progress on an OIDC-based prototype.

I also got a chance to look through the node OIDC provider docs over the weekend a bit, and the one thing that I couldn't understand is how we would handle SSO? There isn't a use case on the public app side, but I definitely think we'd prefer it with Partners where feasible in order to reduce our security surface. If nothing else, we'd want to use SSO from Google for authn (but not authz) for Exygy admin accounts. @bencpeters Is that reasonable with the road we're going down?

bencpeters commented 4 years ago

I also got a chance to look through the node OIDC provider docs over the weekend a bit, and the one thing that I couldn't understand is how we would handle SSO? There isn't a use case on the public app side, but I definitely think we'd prefer it with Partners where feasible in order to reduce our security surface. If nothing else, we'd want to use SSO from Google for authn (but not authz) for Exygy admin accounts. @bencpeters Is that reasonable with the road we're going down?

Absolutely. There are actually a few different ways to do this.

Essentially, the OIDC provider is just managing the issuing of credentials (e.g. an access token) that allow access to the various services on our network. It is fairly agnostic as to how the actual authentication happens, as long as the flow is followed properly, leaving us free to implement a variety of login methods.

For any arbitrary SSO option we wanted, we could set up a client auth server that implements the actual SSO logic and then uses the Client Credentials grant type to get tokens from the auth server.
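For reference, a Client Credentials token request is just a form-encoded POST to the token endpoint (RFC 6749 §4.4). A sketch with placeholder values:

```javascript
// Build the form-encoded body for an OAuth 2.0 Client Credentials
// token request (RFC 6749 §4.4). The client ID/secret here are
// placeholders, not real Bloom credentials.
function clientCredentialsRequestBody(clientId, clientSecret) {
  return new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  }).toString();
}

// This body gets POSTed to the auth server's /token endpoint with
// Content-Type: application/x-www-form-urlencoded; the JSON response
// contains the access token.
```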

For any SSO option supported by Firebase (Google, Microsoft, Facebook, Twitter, Apple, SMS), it's even easier. We can set up the "interactions" of the auth server to authenticate using Firebase, then issue the tokens via our PKCE flow (this is my plan for the POC, but using Firebase email/pw). This would make adding any of these SSO options as easy as enabling them in the Firebase configuration.