edina / nbexchange

External exchange for nbgrader

Configuration Problems/Questions #148

Open cablesky opened 9 months ago

cablesky commented 9 months ago

I have started nbexchange in a Docker container. In the same Docker network, a JupyterHub 4 is running in Docker Swarm mode.

In the Dockerfile for Jupyter Lab, among other things, it states:

RUN pip install https://github.com/edina/nbexchange/archive/v1.3.0.tar.gz

RUN jupyter nbextension install --sys-prefix --py nbgrader
RUN jupyter nbextension enable --sys-prefix validate_assignment/main --section=tree
RUN jupyter serverextension enable --sys-prefix nbgrader.server_extensions.validate_assignment
RUN jupyter nbextension enable --sys-prefix assignment_list/main --section=tree
RUN jupyter serverextension enable --sys-prefix nbgrader.server_extensions.assignment_list

#------nbgrader teacher
RUN jupyter labextension enable --level=user nbgrader/formgrader
RUN jupyter labextension enable --level=user nbgrader/assignment-list
RUN jupyter labextension enable --level=user nbgrader/course-list
RUN jupyter labextension enable --level=user nbgrader/create-assignment
RUN jupyter labextension enable --level=user nbgrader/validate-assignment

Is it installed correctly?

In which directory is nbexchange_config.py stored?

We use LDAPAuthenticator - how should this be implemented in nbexchange_config.py?

Is there a command to check the configuration?

ykazakov commented 3 days ago

I can try to answer some of the questions, as I just managed to make nbexchange 1.4 work with JupyterHub 5, and some steps were far from trivial.

Is it installed correctly?

There are two things that should be running (in separate containers):

  1. The nbexchange service, as specified in the Dockerfile.
  2. The nbexchange plugins for nbgrader that communicate with the server.

The server does not require jupyterlab or jupyterhub to run. The commands for enabling extensions are meant for configuring the student and instructor notebook servers. On those, the built-in directory-based exchange is swapped for nbexchange in nbgrader_config.py.
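As a rough illustration of that swap, here is a minimal nbgrader_config.py sketch. The class paths are the ones I recall from the nbexchange README and are an assumption on my part; check them against the version you installed.

```python
# nbgrader_config.py (sketch), loaded by nbgrader inside the user servers
c = get_config()  # noqa: F821  (injected when the config file is loaded)

# Replace each built-in, directory-based exchange class with its
# nbexchange counterpart (class paths assumed from the nbexchange README).
c.ExchangeFactory.exchange = "nbexchange.plugin.Exchange"
c.ExchangeFactory.list = "nbexchange.plugin.ExchangeList"
c.ExchangeFactory.release_assignment = "nbexchange.plugin.ExchangeReleaseAssignment"
c.ExchangeFactory.fetch_assignment = "nbexchange.plugin.ExchangeFetchAssignment"
c.ExchangeFactory.submit = "nbexchange.plugin.ExchangeSubmit"
c.ExchangeFactory.collect = "nbexchange.plugin.ExchangeCollect"
c.ExchangeFactory.release_feedback = "nbexchange.plugin.ExchangeReleaseFeedback"
c.ExchangeFactory.fetch_feedback = "nbexchange.plugin.ExchangeFetchFeedback"
```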

In which directory is nbexchange_config.py stored?

It should be placed in the working directory of supervisord, as set by the directory key in supervisord.conf.

We use LDAPAuthenticator - how should this be implemented in nbexchange_config.py?

This part is the most complicated.

First, for simplicity I run nbexchange as a JupyterHub-managed service. That is, it is installed in the container that runs JupyterHub.

pip install https://github.com/edina/nbexchange/archive/v1.4.0.tar.gz

In jupyterhub_config.py add:

import os

c.JupyterHub.services = [
    {  # nbexchange service
        "name": "nbexchange",
        "url": "http://127.0.0.1:9000",
        "command": ["supervisord", "-n", "-c", "/usr/src/app"],
        "display": False,
        "environment": {
            "NBEX_BASE_STORE": os.environ["NBEX_BASE_STORE"],
            "NBEX_DB_URL": os.environ["NBEX_DB_URL"],
            "COURSE_ID": os.environ["COURSE_ID"],
        },
    },
]

Note that the relevant environment variables should also be passed to this service.

This service registers a proxy accessible at the endpoint /services/nbexchange on JupyterHub (the path is derived from the service name), to which all nbexchange requests from the user notebooks should now be sent. JupyterHub forwards these requests to the url specified in the service.
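JupyterHub exposes a managed service under /services/<name>/, so the address that clients need can be derived from the public Hub URL and the service name. A small sketch of that composition follows; the helper name, the host name and the example path are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin


def service_url(hub_url: str, service_name: str, path: str = "") -> str:
    """Build the URL under which JupyterHub proxies a managed service."""
    # JupyterHub routes /services/<name>/... to the "url" of the service entry.
    base = hub_url.rstrip("/") + "/services/" + service_name + "/"
    return urljoin(base, path.lstrip("/"))


if __name__ == "__main__":
    # Hypothetical Hub address and request path.
    print(service_url("http://hub:8000", "nbexchange", "/some/path"))
```

How the nbexchange plugin is pointed at this address depends on its configuration and is not shown here.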

To identify the users, some authentication information has to be provided with the requests as well. In nbexchange, authentication is currently implemented only via the NAAS_JWT cookie:

https://github.com/edina/nbexchange/blob/985796a553a56f4ee39f74f46ab8577a9c263728/nbexchange/plugin/exchange.py#L49

However, JupyterHub services are authenticated using tokens that are sent in request headers. The token (with the required subset of permissions) can either be created by the user from the Token menu of JupyterHub, or one can use the token of the user server, which is already stored in the environment variable JUPYTERHUB_API_TOKEN. This token usually has fewer permissions than the user token.

To send the token instead of the cookie with every user request, I had to override the function api_request of the Exchange class (in nbgrader_config.py as this part is only relevant to clients):

import os

def api_request(self, path, method="GET", *args, **kwargs):
    cookies = dict()
    headers = dict()

    # send the JupyterHub token instead of the NAAS_JWT cookie
    headers["Authorization"] = "Bearer " + os.environ["JUPYTERHUB_API_TOKEN"]

    # ...
    # The rest is left unchanged
    # ...

Exchange.api_request = api_request
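If the token is missing, the override above fails with a bare KeyError. A possible refinement, sketched here with a hypothetical helper name, builds the header in one place and reports the missing variable explicitly:

```python
import os


def hub_auth_headers(environ=os.environ):
    """Return the Authorization header carrying the JupyterHub token."""
    token = environ.get("JUPYTERHUB_API_TOKEN")
    if not token:
        # Outside a JupyterHub-spawned server this variable is normally unset.
        raise RuntimeError("JUPYTERHUB_API_TOKEN is not set")
    return {"Authorization": "Bearer " + token}
```

Inside api_request one could then write headers.update(hub_auth_headers()) instead of reading the environment directly.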

On the service side, I now provide an nbexchange_config.py that implements BaseUserHandler to retrieve the required user information from the sent token using the HubAuth class of JupyterHub:

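import os

from jupyterhub.services.auth import HubAuth
# import path as given in the nbexchange README; adjust if your version differs
from nbexchange.handlers.auth.user_handler import BaseUserHandler


class JupyterHubUserHandler(BaseUserHandler):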

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.hub_auth = HubAuth()
        self.course_id = os.environ["COURSE_ID"]

    def get_current_user(self, request):

        user_model = self.hub_auth.get_user(request)
        name = user_model.get("name")

        return {
            "name": name,
            "full_name": name,
            "course_id": self.course_id,
            "course_title": "cool course",
            "course_role": "Student",
            "org_id": 1,
            "cust_id": 2,
        }
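For the service to use such a handler, it has to be registered in the same configuration file. As far as I recall from the nbexchange README, this is done through the user_plugin_class option; treat the option name as an assumption and verify it for your version.

```python
# at the end of the nbexchange configuration file (sketch);
# JupyterHubUserHandler is the class defined earlier in the same file
c = get_config()  # noqa: F821  (injected when the config file is loaded)

# Option name assumed from the nbexchange README.
c.NbExchange.user_plugin_class = JupyterHubUserHandler  # noqa: F821
```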

However, the base permissions of the token JUPYTERHUB_API_TOKEN give access only to the basic user information. You can check this using the JupyterHub REST API by running the following from the user notebook:

curl -H "Authorization: token $JUPYTERHUB_API_TOKEN" $JUPYTERHUB_API_URL/user
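The same check can be done from a notebook cell in Python. This is a minimal sketch using only the standard library; the two function names are hypothetical, while the environment variables are the ones used in the curl command.

```python
import json
import os
import urllib.request


def build_user_request(api_url, token):
    """Prepare a GET request for the /user endpoint of the Hub REST API."""
    return urllib.request.Request(
        api_url.rstrip("/") + "/user",
        headers={"Authorization": f"token {token}"},
    )


def fetch_own_user_model(environ=os.environ):
    """Return the user model that the server token is allowed to see."""
    req = build_user_request(
        environ["JUPYTERHUB_API_URL"], environ["JUPYTERHUB_API_TOKEN"]
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Calling fetch_own_user_model() in a user server shows which keys of the user model are visible with the current permissions.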

To provide more meaningful values for "full_name", "course_role" and "cust_id", one has to:

  1. Obtain these values from the Authenticator, such as the LDAPAuthenticator mentioned above. During authentication, these values are stored in auth_state.
  2. Make sure that auth_state is saved in the user model.
  3. Add permissions for JUPYTERHUB_API_TOKEN to access auth_state.
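
The three steps above might translate into jupyterhub_config.py roughly as follows. This is a sketch under assumptions, not a tested configuration: the LDAP attribute names are hypothetical placeholders, and the scopes listed for the default roles should be compared with the documentation of your JupyterHub version.

```python
# jupyterhub_config.py (sketch)
c = get_config()  # noqa: F821  (injected when JupyterHub loads the file)

# Step 2: persist auth_state. JupyterHub encrypts it, so the
# JUPYTERHUB_CRYPT_KEY environment variable must be set for the Hub.
c.Authenticator.enable_auth_state = True

# Step 1 (assumption): LDAPAuthenticator can copy LDAP attributes into
# auth_state; the attribute names below are hypothetical placeholders.
# c.LDAPAuthenticator.auth_state_attributes = ["displayName", "uidNumber"]

# Step 3: let users, and the tokens of their servers, read their own auth_state.
c.JupyterHub.load_roles = [
    {"name": "user", "scopes": ["self", "admin:auth_state!user"]},
    {
        "name": "server",
        "scopes": [
            "users:activity!user",
            "access:servers!server",
            "admin:auth_state!user",
        ],
    },
]
```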

perllaghu commented 3 days ago

This is great, many thanks for the details.

Is there merit in creating a file describing different installation environments? [we don't use jupyterhub here, so do not have the wisdom you do....]

ykazakov commented 3 days ago

@perllaghu Many thanks for the nice plugin! Once it is up and running, it works surprisingly well!

It would be nice to move this information to dedicated docs. I can contribute the part on JupyterHub. It should also not be that difficult to publish the docs on Read the Docs, which would make the information even easier to search.

As you can see, there are a few things that could be improved regarding the integration with JupyterHub, e.g., allowing token-based authentication. I will create separate tickets to discuss them.

ykazakov commented 2 days ago

Before I forget, here are a few additional comments on the JupyterHub setup:

  1. If the nbexchange service runs in a privileged container (--privileged or user: root), one has to make sure that /dev/stdout and /dev/stderr are writable by the user of the supervisord command (see stdout_logfile and stderr_logfile in supervisord.conf). Otherwise one receives a rather obscure error:

    INFO spawnerr: unknown error making dispatchers for 'nbexchange': EACCES

    If using Jupyter Docker Stacks as the base image, it is sufficient to add the option CHOWN_EXTRA=/dev/stdout,/dev/stderr.

  2. Even after granting the user server permissions to access auth_state as described in steps 1-3 above, the user_model returned by HubAuth does not include auth_state. Looking at the sources of HubAuth, it becomes evident that this information is retrieved from the /user endpoint. In the linked discussion above, it is written that auth_state should instead be retrieved from the /users/{user} endpoint. To verify this, run the following command in the user notebook (after performing steps 1-3 above):

curl -H "Authorization: token $JUPYTERHUB_API_TOKEN" $JUPYTERHUB_API_URL/users/$JUPYTERHUB_USER

Given this observation, one has to make the following adjustments in nbexchange_config.py:

  1. Add a function for retrieving the user model from the /users/{user} endpoint using the JupyterHub REST API:

import os

import requests


def get_user(user, token):
    # query the Hub REST API for the user model (includes auth_state if the token may read it)
    r = requests.get(
        os.environ["JUPYTERHUB_API_URL"] + "/users/" + user,
        headers={
            "Authorization": f"token {token}",
        },
        timeout=10,
    )
    r.raise_for_status()
    return r.json()

  2. Use this function to retrieve auth_state inside the get_current_user function:

def get_current_user(self, request):

    # identify the user
    user_model = self.hub_auth.get_user(request)
    name = user_model.get("name")

    # retrieve the user auth_state
    token = self.hub_auth.get_token(request)
    user_model = get_user(name, token)
    auth_state = user_model.get("auth_state")

    # extract the required information from auth_state
    full_name = name
    course_role = "Student"
    cust_id = 0
    if auth_state:
        # use keys specific to the selected Authenticator class
        full_name = auth_state.get("full_name", full_name)
        cust_id = auth_state.get("user_id", cust_id)
        course_role = ...

    return {
        "name": name,
        "full_name": full_name,
        "course_id": self.course_id,
        "course_title": self.course_id, # TODO
        "course_role": course_role,
        "org_id": 1, # TODO
        "cust_id": cust_id,
    }