cldellow / datasette-ui-extras

Add editing UI and other power-user features to Datasette.
Apache License 2.0
12 stars 1 forks source link

add `current_actor()` SQLite function #50

Closed cldellow closed 1 year ago

cldellow commented 1 year ago

I'm not sure how achievable this is. #49 explores creating something like updatable views for the turker use case. It'd be handy to have a trigger that can set last_edited_by = current_actor(), where current_actor() is evaluated in the context of the current request to get the user's ID.

Can ... can we do that?

Simon mentioned maybe you can abuse asgi_wrapper to get access to the request: https://discord.com/channels/823971286308356157/823971286941302908/1070385463640215702

cldellow commented 1 year ago

Hmmm, this might work for read-only queries, but I think write queries are executed in another thread?

Maybe we could use datasette-rewrite-sql to rewrite the query to include a comment with the actor ID, and then, just before actually passing the SQL off to SQLite, we'd set a thread local.

cldellow commented 1 year ago

https://blog.sqreen.com/asyncio/ seems relevant

cldellow commented 1 year ago

Huh, would this also let you write views like:

CREATE row_level_security_view AS
SELECT * FROM data
WHERE approved_by = current_actor() OR current_actor() = 'superuser'

The view would only work in the context of Datasette, not via vanilla sqlite CLI.

cldellow commented 1 year ago

The request that actor_from_request is passed is a datasette thing: https://github.com/simonw/datasette/blob/0b4a28691468b5c758df74fa1d72a823813c96bf/datasette/utils/asgi.py#L56

We could mint one in order to call actor_from_request...but I wonder if plugins depend on getting the same request (eg, to cache a lookup, to log things).

The actual auth happens here: https://github.com/simonw/datasette/blob/0b4a28691468b5c758df74fa1d72a823813c96bf/datasette/app.py#L1481-L1522

Maybe we can write a hookwrapper for actor_from_request? See https://pluggy.readthedocs.io/en/stable/#wrappers

cldellow commented 1 year ago

We might also be able to have a hook that stashes the request in the current_task ? Then we could always inspect stashed_request.scope.actor.

cldellow commented 1 year ago

Stashing the request seems a little less skeezy, then we don't have to interrogate the result object and risk Datasette's logic changing.

cldellow commented 1 year ago

Even once we've stashed it, we have the problem that we're in some asyncio context, and the actual execution of the SQL may be in a different thread, e.g. for writes.

A not-very-good solution: in rewrite_sql replace all instances of current_user() with a literal value.

Ah, but this doesn't actually solve the case I care about, which is I want to be able to create views with RLS where clauses, and have triggers that do the right thing.

cldellow commented 1 year ago

Another oddness:

I did create view xxx as select current_user(), then went to http://localhost:8001/cooking/xxx

actor_from_request hadn't/didn't run before rewrite_sql...maybe we need something earlier?

cldellow commented 1 year ago

Alternate idea: datasette-rewrite-sql only patches the entry points like execute, execute_write, execute_write_many, execute_write_script.

This isn't actually what needs to be patched -- we need to patch execute_fn and execute_write_fn, to control setting the value right before we pass control to SQLite.

cldellow commented 1 year ago

I have a buggy proof of concept, it's incorrect under concurrent load.

I added an actor_from_request hook like:

@hookimpl
def actor_from_request(datasette, request):
    _id = None
    if 'x-me' in request.headers:
        _id = request.headers['x-me']
    return {'id': _id}

And then tested by having multiple bash windows open like:

while :; do curl -H "x-me: $$" http://localhost:8001/cooking/xxx.json?_shape=array; echo; done

I expect the output from each window to always show the same value, but instead, I get interleaved results.

This means my understanding of something is flawed. I wonder if it works with num_sql_threads=1? I stash the request on the asyncio.current_task(), but when we transition into SQLite I stash the ID in a global...that might be flawed.

cldellow commented 1 year ago

--config num_sql_threads=1 gives correct results, so yeah, a global is not an OK way to store this.

cldellow commented 1 year ago

Using a threading.local() resolves this

cldellow commented 1 year ago

This might be worth its own plugin, and we'll just depend on it.

cldellow commented 1 year ago

Released as https://github.com/cldellow/datasette-current-actor