airbnb / knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Apache License 2.0

Support for external user authentication #231

Closed: tkinz27 closed this issue 7 years ago

tkinz27 commented 7 years ago

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

I want to integrate the knowledge repo with our internal identity services (LDAP and SAML) so that we can control access to certain posts. We would still like all authenticated users to be able to discover the posts and read the TLDR field, which I think is how the private field works today.

I believe the expected configuration is that a reverse proxy sets some authentication headers based on the configuration file. However, from grepping through the code, it does not look like AUTH_GROUP_REQUEST_HEADER is used anywhere.

I am also a little worried about keeping the users/groups in the knowledge_repo database in sync with our LDAP system. Specifically, I'm worried that if a user is booted from a group, there is no good mechanism for updating the knowledge_repo server's view of the groups.
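One possible way to address the sync concern, sketched under loud assumptions: if the proxy sent the user's current directory groups in a header on every request (this sketch assumes a comma-separated value, which is not a documented knowledge repo format), the server could recompute membership each time, so someone removed from an LDAP group loses access on their next request. The `ldap:` prefix that marks directory-managed groups (so locally created groups are left alone) and all function names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GroupChanges:
    """Membership changes needed to bring local groups in line with LDAP."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)


def parse_group_header(value, separator=","):
    """Split a header like 'analysts, finance' into a set of group names.

    The comma-separated format is an assumption; a real proxy may differ.
    """
    if not value:
        return set()
    return {g.strip() for g in value.split(separator) if g.strip()}


def reconcile_groups(local_groups, directory_groups, managed_prefix="ldap:"):
    """Compute which directory-managed groups to add or remove for a user.

    Only local groups whose names start with `managed_prefix` are treated as
    owned by the directory; any other (locally created) group is untouched.
    """
    directory = {managed_prefix + g for g in directory_groups}
    managed_local = {g for g in local_groups if g.startswith(managed_prefix)}
    return GroupChanges(
        added=directory - managed_local,
        removed=managed_local - directory,
    )
```

For example, a user locally in `ldap:analysts` and `ldap:finance` whose header now reads `analysts, marketing` would gain `ldap:marketing` and lose `ldap:finance`, while a locally created group such as `ml-reading-group` stays as it is.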

I'm definitely willing to contribute some changes to work this out if you guys are interested; I just wanted to discuss some approaches before making a lot of changes. Personally, I would like to see something like the way JupyterHub works with auth plugins, giving each deployment of the knowledge repo an extension point to integrate with whatever AAA services are provided. I started looking at existing Flask extensions like Flask-Security, Flask-Principal, and Flask-Login, but integrating those seemed like a massive change to the architecture. I'm also not entirely convinced that managing auth through a proxy is insufficient.

One of our goals for the Knowledge Repo is to make our data science teams' reports discoverable. I really like that the TLDR is still viewable even though the post is private. This kinda sets up a gate: other teams can see that an analysis was done, but have to ask for approval to see it.

matthewwardrop commented 7 years ago

Hi @tkinz27!

Thanks for getting in touch, and for your interest in using the knowledge repo!

You're quite right... we don't currently use AUTH_GROUP_REQUEST_HEADER anywhere, and our current group mechanism is decoupled from LDAP in order to provide a more flexible approach to defining access roles. This allows you to define less corporate groups such as "people who have opted to join a particular internal learning course", or "people who are part of some cross-team endeavour"; and do so in real-time through the web app.

I don't have a lot of experience with the way JupyterHub manages authentication, but I'm definitely willing to work with you to scope out what that would look like for the knowledge repo. I have long thought it would be a good idea to support multiple authentication frameworks at some point; it's just that so far there have been more pressing things to work on.

I'm happy to schedule a Skype/Hangouts meeting to discuss your requirements and how best we might work together to integrate them :).

tkinz27 commented 7 years ago

Hey @matthewwardrop

First, thanks for the quick response. You guys are really responsive.

A Skype/Hangouts meeting would be great. I'm going to be out most of this week, but available next week if that works for you guys.

The JupyterHub model is essentially just a base-class interface. You write custom "Authenticators", which are just pip packages that inherit from the auth base class. In the configuration you specify which authenticator you're using, along with any settings that get passed to the derived authenticator.

> This allows you to define less corporate groups such as "people who have opted to join a particular internal learning course", or "people who are part of some cross-team endeavour"; and do so in real-time through the web app.

That is a use case we had not considered, but I definitely would not want to lose that capability; I could see it being useful for us too. However, we would probably need to lock down who is able to create those "meta" groups.

Look forward to working with you guys.

matthewwardrop commented 7 years ago

Let's schedule a time for next week :). Send me an email at matthew.wardrop@airbnb.com!

redyaffle commented 7 years ago

@tkinz27 @matthewwardrop Can you share your outcomes? We're looking to configure access with SAML and I'm not sure how to pass the right information through to Knowledge Repo.

Thank you!

matthewwardrop commented 7 years ago

Hi @redyaffle!

Authentication beyond placing the knowledge repo server behind a proxy (such as nginx) is not yet supported. If you do want to set it up behind a proxy, the proxy simply needs to populate the header named by the AUTH_USERNAME_REQUEST_HEADER server configuration variable.
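As a rough illustration of the server side of that arrangement, here is a plain WSGI sketch (not the knowledge repo's actual code) that reads the username from whichever header `AUTH_USERNAME_REQUEST_HEADER` names. The header value `X-Remote-User` and the function names are assumptions. Under WSGI (PEP 3333), a request header such as `X-Remote-User` appears in the environ as `HTTP_X_REMOTE_USER`.

```python
# Hypothetical server configuration; the header name is an assumption and must
# match whatever your proxy (e.g. nginx) actually sets.
CONFIG = {"AUTH_USERNAME_REQUEST_HEADER": "X-Remote-User"}


def environ_key(header_name):
    """Map an HTTP header name to its CGI-style WSGI environ key."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def proxy_username(environ, config=CONFIG):
    """Return the username the proxy placed in the configured header, if any."""
    value = environ.get(environ_key(config["AUTH_USERNAME_REQUEST_HEADER"]), "")
    return value.strip() or None


def app(environ, start_response):
    """Tiny WSGI app that refuses requests the proxy did not authenticate."""
    username = proxy_username(environ)
    if username is None:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"not authenticated\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"hello, {username}\n".encode()]
```

This is only safe if the proxy overwrites or strips any client-supplied copy of the header and the app cannot be reached except through the proxy; otherwise anyone can claim to be any user.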

Proper integration for authentication directly into the knowledge repository is planned, and at this stage (unless the work is done by others first) I'm planning to do the work by the end of June (my wife and kids are going on holiday in mid-June, so I'll have considerably more discretionary time). Most likely, the authentication integration will be built atop flask-login, and will be modular so that server instances can write their own authentication backends.

Will ping this thread once the work is done :).

jtv8 commented 7 years ago

Hi @matthewwardrop! I have a related requirement to this one, which is that we intend to store the knowledge repo as a private repo on a cloud git service (most likely Bitbucket) and serve the web app from a cloud container service (likely Heroku or Azure). I'm intending to add the ability to use Bitbucket's OAuth API to authenticate users and check that they have read access to the underlying repo before serving it up.

Are you still intending to work on the auth functionality in June? If so, it would be great to combine our efforts. If not, I'm happy to implement something in my fork.

matthewwardrop commented 7 years ago

Hi @zerogjoe,

I'm planning to add an abstract authentication API in the first half of this month (July); you can then use this to authenticate against any service you see fit :).

matthewwardrop commented 7 years ago

@sunilkpai (Moving this from #299 to the general issue tracker.) I have mostly implemented this feature locally and am sorting out some extra considerations (like keeping track of authors and users separately), but expect to land this by next week.

matthewwardrop commented 7 years ago

Fixed in #305. A specific KnowledgeAuthProvider has not been written for LDAP, but we welcome contributions; I don't have a way to test LDAP at this stage. I'll open a new issue for anyone wanting to contribute an adapter :).

gregorykeller commented 6 years ago

@matthewwardrop - I run product for a company called www.jumpcloud.com. We are a Directory (i.e. an IdP) which supports LDAP among other protocols. I'm on the thread here because my Success team has a backlog of customers/prospects requesting either SAML or LDAP auth into Airbnb from us (we support both protocols and act as the IdP). I am curious whether LDAP is even an option for Airbnb, as I cannot surface any docs on the subject (searching led me to this thread). We'd be willing to set you up with our service to test either LDAP or SAML against Airbnb if it will help our common customers. Any ack here on which auth schemes you support?

kidtronnix commented 4 years ago

:+1: for LDAP support. We use JumpCloud too! Configuring it for Airbnb's Airflow was a dream. Maybe we could use code from there to support this feature.

j-hartshorn commented 4 years ago

@kidtronnix were you successful in setting up the Knowledge Repo with JumpCloud? Could you provide some advice?