allenai / pawls

Software that makes labeling PDFs easy.
https://pawls.apps.allenai.org
Apache License 2.0

How to run with authentication / authorisation in production? #184

Closed. jordanparker6 closed this issue 2 years ago.

jordanparker6 commented 2 years ago

It would be great if the docs were a little clearer about running the docker-compose setup in production.

I have changed the build args and added environment variables (config below), but this doesn't seem to work.

Any idea how to run it with our own OAuth login, like your Skiff login?

```yaml
version: '3'
services:
    sonar:
        build: ./sonar
        depends_on:
            - api
            - ui
            - proxy
    api:
        build:
            context: ./api
        volumes:
            - ./api:/usr/local/src/skiff/app/api
            - ./skiff_files/apps/pawls:/skiff_files/apps/pawls
            - ./api/config/allowed_users_local_development.txt:/users/allowed.txt
        environment:
            # This ensures that errors are printed as they occur, which
            # makes debugging easier.
            - PYTHONUNBUFFERED=1
            - LOG_LEVEL=DEBUG
            - IN_PRODUCTION=prod
        command: ["main:app", "--host", "0.0.0.0", "--reload"]
    ui:
        build:
            context: ./ui
            args:
                NODE_ENV: production
                BABEL_ENV: production
            dockerfile: Dockerfile-local
        # We can't mount the entire UI directory, since JavaScript dependencies
        # (`node_modules`) live at that location.
        volumes:
            - ./ui/src:/usr/local/src/skiff/app/ui/src
            - ./ui/public:/usr/local/src/skiff/app/ui/public
            - ./ui/package.json:/usr/local/src/skiff/app/ui/package.json
            - ./ui/tsconfig.json:/usr/local/src/skiff/app/ui/tsconfig.json
            - ./ui/yarn.lock:/usr/local/src/skiff/app/ui/yarn.lock
    proxy:
        build:
            context: ./proxy
            args:
                CONF_FILE: prod.conf
        ports:
            - 8080:80
        depends_on:
            - api
            - ui

    # This service is optional!
    # It is not used during deployment, but simply runs a grobid service
    # which the CLI can use for pre-processing PDFs, using grobid to provide
    # the detailed token information.
    grobid:
        image: 'allenai/grobid:0.5.6-pdf-structure'
        ports:
            - '8070:8070'
            - '8071:8071'
```

codeviking commented 2 years ago

We don't use docker compose to run PAWLS in production. We deploy the application to a centrally managed GKE cluster. That cluster provides authentication via oauth2-proxy in coordination with the Ingress NGINX Controller for Kubernetes.

In your case, the application likely isn't working because toggling the environment variables only makes the API expect authentication. The API layer doesn't hand unauthenticated clients off to an OAuth provider; oauth2-proxy normally takes care of that.
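To make that failure mode concrete, here is a minimal Python sketch (the API is a Python app, judging by `PYTHONUNBUFFERED` and the `main:app` command). It assumes the identity arrives in an `X-Auth-Request-Email` header, which oauth2-proxy can set for the upstream; the names `user_from_headers`, `proxy_then_api` and `Forbidden` are hypothetical and are not PAWLS's actual code. The point is the division of labour: the proxy decides whether to send the browser to sign in, and the API only reads the header.

```python
from typing import Mapping, Optional, Tuple

# Assumed header name: oauth2-proxy can pass the signed-in user's email
# upstream in X-Auth-Request-Email (for example with --set-xauthrequest).
EMAIL_HEADER = "x-auth-request-email"


class Forbidden(Exception):
    """Raised when a request reaches the API without a usable identity."""


def user_from_headers(headers: Mapping[str, str]) -> str:
    """API side: trust the identity header that the proxy attached.

    There is no redirect to a login page here. Without a proxy in front,
    the header is never set, so requests are simply rejected.
    """
    # HTTP header names are case-insensitive, so compare them lowercased.
    normalised = {name.lower(): value for name, value in headers.items()}
    email: Optional[str] = normalised.get(EMAIL_HEADER)
    if not email or "@" not in email:
        raise Forbidden("no authenticated user")
    return email


def proxy_then_api(session_email: Optional[str]) -> Tuple[int, str]:
    """Toy model of oauth2-proxy sitting in front of the API.

    A browser without a session is sent to sign in with the OAuth
    provider; a signed-in one is forwarded with the identity header set.
    """
    if session_email is None:
        return 302, "redirect to OAuth sign-in"
    try:
        user = user_from_headers({"X-Auth-Request-Email": session_email})
    except Forbidden:
        return 403, "forbidden"
    return 200, f"hello {user}"
```

Calling `user_from_headers({})` raises `Forbidden`, which is roughly the situation when the docker-compose stack runs with authentication enabled but nothing like oauth2-proxy in front of it: requests are refused, and nobody is ever sent to a login page.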

The setup we use is fairly complicated and likely not a good fit for your use case. Here's an alternative that might be easier to set up:

  1. First, modify the code so that the user provides their email address, rather than reading it from an HTTP header (which is usually set by oauth2-proxy).
  2. Deploy PAWLS to a VM using docker compose.
  3. Deploy NGINX to that host as a reverse proxy (forwarding traffic to the PAWLS application). Use HTTP Basic Authentication to identify end users.
  4. Issue a certificate manually using Let's Encrypt and configure TLS in NGINX.
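Step 1 could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the allow-list format (one email per line, blank lines and `#` comments ignored) is assumed rather than taken from PAWLS. The allow-list idea mirrors the `allowed_users_local_development.txt` file mounted at `/users/allowed.txt` in the compose file above. NGINX Basic Auth (step 3) has already decided that the request may reach the app; the email itself is whatever the user typed.

```python
from typing import Iterable, Set


def load_allowed_users(lines: Iterable[str]) -> Set[str]:
    """Parse an allow-list: one email per line (assumed format).

    Blank lines and lines starting with '#' are skipped; entries are
    lowercased so that comparisons are case-insensitive.
    """
    allowed: Set[str] = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            allowed.add(entry)
    return allowed


def user_from_submission(claimed_email: str, allowed: Set[str]) -> str:
    """Hypothetical replacement for the header lookup.

    The email comes from the user (e.g. typed into a form in the UI)
    instead of a proxy-set header. Nothing ties it to the Basic Auth
    login, so a user could type someone else's allowed address.
    """
    email = claimed_email.strip().lower()
    if "@" not in email or email not in allowed:
        raise PermissionError(f"{claimed_email!r} is not an allowed user")
    return email
```

In the API container this might be fed with `load_allowed_users(open("/users/allowed.txt").read().splitlines())`, using the mount path from the compose file.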

The downside is that, after authenticating, you're relying on the honor system to prevent users from masquerading as one another. That might be an acceptable trade-off if you have a small number of users.

Hopefully this helps!

jordanparker6 commented 2 years ago

Thanks for the clarification!

codeviking commented 2 years ago

Sure thing. I'm going to close this, since I've added some details and a suggested path for hosting. Feel free to reopen it if you run into difficulties and I'll do my best to help!