Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License

Implement authentication token for the API #1187

Open Woseseltops opened 3 months ago

Woseseltops commented 3 months ago

Related to #1152

Discussion with @susanodd today. We decided on these things:

Woseseltops commented 3 months ago

@susanodd, to make a view accessible as API endpoint:

susanodd commented 3 months ago

[SKIP, do it old school] @Woseseltops I'm having trouble creating the tokens. I need an appropriate example. 99 percent of the web documentation turns out to be about the REST framework, and the search results get worse when you add "without rest framework" or "not using rest framework". I tried adding Python 3.12 to the query, since it's not supported by the REST framework. I've been searching for 2 hours now.

(The Google search engine keeps altering what I'm trying to query and keeps looking for authentication with the REST framework. Is it owned by Google or something?)

I installed the "auth_token" module. There is lots of documentation, but lacking examples.

susanodd commented 3 months ago

[SKIP, do it old school] @Woseseltops this is separate from the previous paragraph. It won't migrate the database using the auth_token. (Same as when trying to migrate with the rest framework.) Does this not work with Python 3.12 either?

django.db.utils.DatabaseError: database disk image is malformed

susanodd commented 3 months ago

[SKIP, do it old school] @Woseseltops (read the above first). How to solve this? Can we put the IP address of UvA in allowed servers in the apache settings?

The alternative is to implement the tokens from scratch.

susanodd commented 3 months ago

[OLD SCHOOL]

The token will be created by hand using old methods of creating keys for this.

I'm working on the session variable for dark mode (#1166).

I'm wondering:

When we create such a token for the API, isn't the actual "token" going to be visible in the calls to signbank from outside of signbank? (Session communication with the server.) What's to stop spies from just listening in and grabbing the token?

This is what I was trying to ask on Friday @Woseseltops : is the "streaming" done on the (Rest) tokens done in order to make the communicated tokens not be "visible" in their entirety to aliens trying to monitor communication to Signbank?

(Or do we need to encrypt them as well? Like for passwords.)

(This is not a nonsense question, the Google analytics records the communication, see the old issue about website statistics.)
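
On the eavesdropping question: when the API is served over HTTPS, the whole request, headers included, is encrypted in transit, so a bearer token in the `Authorization` header is not readable by someone watching the network. A minimal client-side sketch, assuming the server is reachable over HTTPS; the helper name `build_auth_headers` and the example URL are hypothetical:

```python
from urllib.parse import urlsplit


def build_auth_headers(url, auth_token):
    """Return request headers carrying the token, but only for HTTPS URLs.

    Over plain HTTP the Authorization header would travel in clear text,
    so this helper refuses to build headers for such URLs.
    """
    if urlsplit(url).scheme != "https":
        raise ValueError(f"Refusing to send a token over a non-HTTPS URL: {url}")
    return {"Authorization": f"Bearer {auth_token}"}


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(build_auth_headers("https://signbank.example.org/api/", "abc123"))
```

This only protects the token in transit; anything that logs full request headers on either end would still see it.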

susanodd commented 3 months ago

I made a first go at this, by creating a class for the tokens and accessing them in the User Profile.

But I am not the right person to code the creation and employment of these tokens in the API communication.

@Woseseltops according to @Woseseltops has lots of experience using API tokens.

Woseseltops commented 2 months ago

Hey @susanodd , I read you were hoping I'd do this one! I'll start with how I would approach this one; perhaps once you see the design, you'll be faster actually implementing it, but we'll see :) .

Phase 1: have an auth token Django model. Once you create it, it automatically generates content like this:

import secrets
import string

def generate_auth_token(length=16):
    """Generate a random authentication token."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(secrets.choice(alphabet) for _ in range(length))

but it only stores the hash of this content

import hashlib

def hash_token(token):
    """Hash the token using SHA-256."""
    hash_object = hashlib.sha256(token.encode())
    return hash_object.hexdigest()

You can store this like a normal Django model field. Furthermore, this token model stores who generated it, its expiration date, and (like we discussed) the scope, ie which datasets.
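
A minimal sketch of how those pieces could fit together, written as a plain-Python dataclass rather than a real Django model so it runs on its own. The class name `TokenRecord`, the helper `create_token`, the field names, and the 90-day default validity are all assumptions for illustration, not the actual Signbank code:

```python
import hashlib
import secrets
import string
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def generate_auth_token(length=16):
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def hash_token(token):
    return hashlib.sha256(token.encode()).hexdigest()


@dataclass
class TokenRecord:
    """Stand-in for the proposed token model (hypothetical field names)."""
    token_hash: str          # only the hash is stored, never the token itself
    created_by: str          # who generated the token
    expires_at: datetime     # expiration date
    datasets: list = field(default_factory=list)  # scope: allowed datasets


def create_token(created_by, datasets, valid_days=90):
    """Return (plaintext, record); the plaintext is shown once and then discarded."""
    plaintext = generate_auth_token()
    record = TokenRecord(
        token_hash=hash_token(plaintext),
        created_by=created_by,
        expires_at=datetime.now(timezone.utc) + timedelta(days=valid_days),
        datasets=list(datasets),
    )
    return plaintext, record


if __name__ == "__main__":
    plaintext, record = create_token("some_user", ["NGT"])
    print("Show once to the user:", plaintext)
    print("Stored hash:", record.token_hash)
```

In a Django model the same fields would map onto a character field for the hash, a foreign key to the user, a datetime field, and a many-to-many relation to datasets.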

Phase 2: have a button on the user profile to generate such a token, and show the content only once.

Phase 3: add to the profile the number of active tokens for this user.

Phase 4: create a local script to send requests authenticated with the token. Should be something like this:

import requests

def make_api_call(url, auth_token):
    headers = {
        'Authorization': f'Bearer {auth_token}',
        'Content-Type': 'application/json'  # Modify according to your API requirements
    }

    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Raise an exception for HTTP errors (status codes 4xx, 5xx)
        return response.json()  # Assuming the response is in JSON format
    except requests.exceptions.RequestException as e:
        print(f"Error making API call: {e}")
        return None

# Example usage:
if __name__ == "__main__":
    url = 'https://api.example.com/data'  # Replace with your API endpoint
    auth_token = 'your_auth_token_here'  # Replace with your authentication token
    response_data = make_api_call(url, auth_token)

    if response_data:
        print("API call successful!")
        print("Response data:", response_data)
    else:
        print("API call failed.")

Phase 5: create a test API endpoint (view) with the decorator mentioned above, that as a first check verifies that a token with the same hash exists in our database.

auth_token = request.headers.get('Authorization', '').split('Bearer ')[-1]
hashed_token = hash_token(auth_token)

If this works, this check should be added to all views.
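
A sketch of what that first check could look like as framework-free functions. The names `extract_bearer_token` and `token_is_known` are hypothetical, and `stored_hashes` stands in for a database lookup. It uses `partition` instead of `split` so that a header without the `Bearer` prefix is rejected rather than treated as a token:

```python
import hashlib
import hmac


def hash_token(token):
    return hashlib.sha256(token.encode()).hexdigest()


def extract_bearer_token(authorization_header):
    """Return the token from 'Bearer <token>', or None if the header is malformed."""
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme != "Bearer" or not token.strip():
        return None
    return token.strip()


def token_is_known(authorization_header, stored_hashes):
    """Check whether the hash of the presented token matches any stored hash."""
    token = extract_bearer_token(authorization_header)
    if token is None:
        return False
    presented = hash_token(token)
    # compare_digest avoids leaking information through comparison timing.
    return any(hmac.compare_digest(presented, stored) for stored in stored_hashes)


if __name__ == "__main__":
    stored = [hash_token("secret")]
    print(token_is_known("Bearer secret", stored))
    print(token_is_known("Bearer wrong", stored))
```

In a view, the header value would come from `request.headers.get('Authorization', '')`, as in the two lines above.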

susanodd commented 2 months ago

@Woseseltops I figured out how to create the token only once. But at the moment, it's unclear whether the first two need to be methods in the model class SignbankToken. OO suggests yes.

Woseseltops commented 2 months ago

I also suggest yes, but it doesn't really matter :)

uklomp commented 2 months ago

Hi @Woseseltops , it's correct that you're waiting for Susan to implement this code, right? Or is this code for Gomer? In the meeting that I had with Susan, it seemed she was not sure about how to proceed. She knows, by the way, that I'm asking about this :)

Woseseltops commented 2 months ago

Hey @uklomp, either Susan or myself, depending on who has time earlier! Making the technical design would be the first step anyway; I shared it so Susan can take over if she wants.

susanodd commented 2 months ago

I'm working on this now.

@Woseseltops the test setup does not work. PyCharm runserver does not allow CORS. The request is being blocked.

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://localhost:8000//dictionary/api_create_gloss/1/. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 200.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://localhost:8000//dictionary/api_create_gloss/1/. (Reason: CORS request did not succeed). Status code: (null).

@Woseseltops after installing the module django-cors-headers it still gives this error.

susanodd commented 2 months ago

This is on signbank-dev

I forgot how I set the CORS on the other servers. It's still giving the error above.

@Woseseltops how to do this?

Woseseltops commented 2 months ago

Okay, annoying that the CORS module is not helping :( . I thought about this some more, and came to the conclusion that the Signbank API might not need to allow for CORS, as it will typically not be called from a browser. What if you do step 4 with Python, like the proposed example script above?

susanodd commented 2 months ago

Doesn't this defeat the purpose of using an API?

susanodd commented 2 months ago

This is working now.

susanodd commented 2 months ago

What is worrying about this: the token appears unencrypted in the header. (As shown above in the code.)

Do we need something like this for sign collect to restrict usage?

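# django-cors-headers 3.0+ requires a scheme on each origin; newer releases
# name this setting CORS_ALLOWED_ORIGINS (the old name remains as an alias).
CORS_ORIGIN_WHITELIST = (
    'https://google.com',
    'https://hostname.example.com',
    'http://localhost:8000',
    'http://127.0.0.1:9000',
)

susanodd commented 2 months ago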

Okay, annoying that the CORS module is not helping :( . I thought about this some more, and came to the conclusion that the Signbank API might not need to allow for CORS, as it will typically not be called from a browser. What if you do step 4 with Python, like the proposed example script above?

This initially had something to do with the order of declaration of the various parts in the settings file. The CORS declarations need to come before the Signbank declarations.
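
For reference, a settings fragment showing the ordering that the django-cors-headers documentation recommends. The thread does not show which exact entries had to move in Signbank's settings, so this is a sketch under that assumption; the Django entries are the usual defaults and Signbank's own apps and middleware are left out:

```python
# settings.py fragment (sketch, not the actual Signbank settings).
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "corsheaders",  # the django-cors-headers app
    # ... Signbank apps follow here ...
]

MIDDLEWARE = [
    # CorsMiddleware should sit as high as possible, in particular above
    # middleware that can generate responses, such as CommonMiddleware.
    "corsheaders.middleware.CorsMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... remaining middleware ...
]
```

If `CorsMiddleware` runs after a middleware that already produced the response, the CORS headers are never added, which matches the "header missing" error in the browser.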

@Jetske and @susanodd spent today getting the create gloss api call to work. Some idiosyncratic naming conventions.

What remains is to rewrite the token generation to allow multiple datasets per token for the user profile page. This requires a migration so it's not included on signbank-dev yet, since we had to generate it a couple of times.

The Wiki needs to be updated to show the usage of the tokens, including the syntax.

The code is in the registration model, except for the tokens themselves, which are in signbank models.

An expiry has not been implemented yet.

The tokens on signbank-dev can be experimented with, but they will need to be deleted before the new migration is applied, which allows multiple tokens per user and multiple datasets per token.

I guess it's possible to put the development migrations on signbank-dev. But then this database will need to be reverted back to an older database to undo them if they get changed again. (Remember signbank-dev is only for development. Remember to save your token, as you can only generate it once.)

susanodd commented 2 months ago

The newest code is on signbank-dev now.

susanodd commented 2 months ago

Oops

susanodd commented 2 months ago

@Woseseltops I added the ajax call for creating a gloss to the wiki. Only the first fields can be in the data for creation. (Dataset....Senses) The others need to be in an update once the gloss is created. See the example of creation in the wiki.

There is an example HTML file (without Django) in the pull request: virtual_machine_api_test_token.html (see the pull request files). You only need to put your own token in. It is set up for the test dataset on signbank-dev. The file needs to be opened with file:// in your own browser on your own computer.

@Jetske did the syntax for creating the senses. It's a list of lists without quotes around the list. https://github.com/Signbank/Global-signbank/issues/1207#issuecomment-2039409055

Perhaps that test html could be modified in such a way to be available in the wiki? Say with some text about the format for the senses. It would need to have an extra html input field for the token that only gets put into the url.

Woseseltops commented 1 month ago

Clear, thanks @susanodd . For completeness, I believe that Signbank's API will mostly be called from Python.

Now that the token is implemented, what endpoints already require authentication? If you could list it here, I'll make sure to add this info to the wiki somehow.

susanodd commented 1 month ago

Okay, great. The wiki got a bit out of hand as four of us have worked on it. Is it necessary to also add the API token to the read urls? It has only been added to the update ones. All of the functionality is multilingual now. But the ones that take the "headers" get the language there, if provided, otherwise English is the default. The original retrieval urls look at the language code provided by Django. (That was causing problems with the API token because it said the data in the ajax call was being fetched multiple times and it can only be fetched once. That is probably Django that does this. So I moved the language into the header.)

The wiki has become partly hard to navigate because it has been written over several months, as more functionality became available.

Woseseltops commented 1 month ago

Is it necessary to also add the API token to the read urls? It has only been added to the update ones.

I think so! Divya is using the getLexiconIdentifications end point, but she only ever gets [NGT] regardless of token.

susanodd commented 1 month ago

Okay.

I'm working on the update gloss to change lemma and annotation translations (#1243) at the moment.

susanodd commented 1 month ago

Is it necessary to also add the API token to the read urls? It has only been added to the update ones.

I think so! Divya is using the getLexiconIdentifications end point, but she only ever gets [NGT] regardless of token.

@Woseseltops Where is this endpoint? This does not appear anywhere in the signbank code.

I added the token to the retrieval functions, but not yet to "package" or "info". (I'm not sure how people are calling those, since they are the original urls from before we started making an API. The package and info take parameters in the URL as GET. The new ones are POST.)

Woseseltops commented 2 weeks ago

It was dictionary/info, it's the function called info() in dictionary/views.py. Right now, ELAN authenticates the 'browser' way with credentials. I suggest to keep supporting this authentication method for a while in addition to the (not yet implemented) token, at least for this end point, so we're not forcing the ELAN devs to make changes. If we really do want to stop supporting the old auth, we should discuss with Divya.

susanodd commented 2 weeks ago

We can do both. The other API urls that are new support both. If the token is not found, it checks the normal way. I can modify the original ones to do it this way. (This was needed to facilitate using the LANGUAGE to have them multilingual.) For info this will work, but the zip retrieval uses url parameters, not POST
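
A sketch of the "token first, otherwise the normal way" behaviour described here, as a framework-free function. The name `resolve_user` and the `owner_by_token_hash` mapping (standing in for a database lookup) are hypothetical:

```python
import hashlib


def hash_token(token):
    return hashlib.sha256(token.encode()).hexdigest()


def resolve_user(authorization_header, session_user, owner_by_token_hash):
    """Return the acting user: the token owner if a known token is sent,
    otherwise whoever is logged in through the normal session (may be None).
    """
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme == "Bearer" and token.strip():
        owner = owner_by_token_hash.get(hash_token(token.strip()))
        if owner is not None:
            return owner
    # No usable token: fall back to the normal (session/credentials) login.
    return session_user


if __name__ == "__main__":
    owners = {hash_token("t1"): "alice"}
    print(resolve_user("Bearer t1", None, owners))   # token wins
    print(resolve_user(None, "bob", owners))         # session fallback
```

For endpoints that take GET parameters, such as the zip retrieval, the same function could be used unchanged, since the token travels in the header rather than in the request body.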