danielplohmann / mcrit

The MinHash-based Code Relationship & Investigation Toolkit (MCRIT) is a framework created to simplify the application of the MinHash algorithm in the context of code similarity.
GNU General Public License v3.0

Add auth to API routes #52

Closed by yankovs 11 months ago

yankovs commented 12 months ago

Hey!

I'd like to restrict access to mcrit's API via auth. I've looked at the code, and it seems like the basic auth functionality is mostly implemented, but the routes themselves aren't protected (if I'm wrong, please tell me! :D). What do you think? I don't know Falcon that well, but from what I found online, writing middleware to enable auth doesn't look that difficult: https://falcon.readthedocs.io/en/stable/user/quickstart.html#a-more-complex-example

danielplohmann commented 12 months ago

Hi! Yes, I also think this should be easy to implement. Correct, right now all routes in the MCRIT backend service can be used without authentication. Currently, this is semi-addressed by the API pass-through in mcritweb, which requires token authentication. I will have time to work on MCRIT again starting next week and will implement some of the low-hanging-fruit issues like this one. :)

yankovs commented 12 months ago

Thank you very much! :) I have been wondering about an implementation detail of the whole token system. It seems like the token itself is just a random UUID that is MD5-hashed (https://github.com/fkie-cad/mcritweb/blob/265ccf4189a2aa80a79809c4c2509ae33822385d/mcritweb/views/authentication.py#L73). It seems weird to me since it doesn't entail any of the user details like the login user name. In contrast, something like JWT could have this info.

This makes me wonder how validating a token would work. Is it just a straight-up lookup in the SQLite DB?
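
For reference, the lookup-based scheme being asked about could look roughly like the sketch below. It is not taken from mcritweb: the table name `api_tokens`, its columns, and the helper names are hypothetical, and the exact way the UUID is fed into MD5 is an assumption. The point is only that an opaque token carries no user information, so validation has to map it back to a user through the database.

```python
import hashlib
import sqlite3
import uuid


def generate_token() -> str:
    """Return an opaque token: the MD5 hex digest of a random UUID.

    Assumption: the UUID's string form is hashed; mcritweb may differ in detail.
    """
    return hashlib.md5(str(uuid.uuid4()).encode("utf-8")).hexdigest()


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table mapping each token to the user it was issued to.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_tokens ("
        "token TEXT PRIMARY KEY, username TEXT NOT NULL)"
    )


def issue_token(conn: sqlite3.Connection, username: str) -> str:
    """Create a token for the given user and store the association."""
    token = generate_token()
    conn.execute(
        "INSERT INTO api_tokens (token, username) VALUES (?, ?)",
        (token, username),
    )
    conn.commit()
    return token


def lookup_user(conn: sqlite3.Connection, token: str):
    """Return the username for a token, or None if the token is unknown."""
    row = conn.execute(
        "SELECT username FROM api_tokens WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None
```

A JWT-style token would instead embed the user name in a signed payload, so the server could validate it without the `SELECT`; the trade-off is that revoking a single token then needs extra bookkeeping.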

danielplohmann commented 11 months ago

Hey!

I have used the AuthMiddleware as suggested to implement a basic authentication scheme for the backend server. The token can be set in the [McritConfig](https://github.com/danielplohmann/mcrit/blob/61ba87ad0189fde58022320bc5c6ede19aad1adf/mcrit/config/McritConfig.py#L19).
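
As a rough illustration of the idea (not the actual MCRIT code), a Falcon-style middleware that checks a single configured token might look like the sketch below. The class name `TokenAuthMiddleware`, the header name `X-Api-Token`, and the local `Unauthorized` exception are all hypothetical; in a real Falcon app one would raise `falcon.HTTPUnauthorized` instead, which is replaced here only to keep the sketch free of dependencies.

```python
import hmac


class Unauthorized(Exception):
    """Stand-in for falcon.HTTPUnauthorized (assumption, keeps the sketch stdlib-only)."""


class TokenAuthMiddleware:
    """Rejects every request that does not present the configured token."""

    # Hypothetical header name; the real service may use a different one.
    HEADER_NAME = "X-Api-Token"

    def __init__(self, expected_token: str):
        self._expected_token = expected_token

    def process_request(self, req, resp):
        # Falcon invokes process_request(req, resp) on each middleware
        # component before the request is routed to a resource.
        supplied = req.get_header(self.HEADER_NAME) or ""
        # Constant-time comparison avoids leaking token contents via timing.
        if not hmac.compare_digest(
            supplied.encode("utf-8"), self._expected_token.encode("utf-8")
        ):
            raise Unauthorized("missing or invalid API token")
```

With Falcon installed, such a component would typically be registered through `falcon.App(middleware=[TokenAuthMiddleware(token)])`, with `token` read from the configuration.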

> It seems weird to me since it doesn't entail any of the user details like the login user name. In contrast, something like JWT could have this info.

I know that it's possible to derive the validity of tokens from stored session information, but as long as this is not a high-load issue, I think doing DB lookups for authentication doesn't hurt that much (thus doing the actual lookup for user/token as you described). But I agree that there is always the option to move to other auth schemes when the need arises.