Closed cldellow closed 1 year ago
Hmmm, this might work for read-only queries, but I think write queries are executed in another thread?
Maybe we could use datasette-rewrite-sql to rewrite the query to include a comment with the actor ID, and then, just before actually passing the SQL off to SQLite, we'd set a thread local.
https://blog.sqreen.com/asyncio/ seems relevant
Huh, would this also let you write views like:
CREATE row_level_security_view AS
SELECT * FROM data
WHERE approved_by = current_actor() OR current_actor() = 'superuser'
The view would only work in the context of Datasette, not via vanilla sqlite CLI.
The request that actor_from_request
is passed is a datasette thing: https://github.com/simonw/datasette/blob/0b4a28691468b5c758df74fa1d72a823813c96bf/datasette/utils/asgi.py#L56
We could mint one in order to call actor_from_request
...but I wonder if plugins depend on getting the same request (eg, to cache a lookup, to log things).
The actual auth happens here: https://github.com/simonw/datasette/blob/0b4a28691468b5c758df74fa1d72a823813c96bf/datasette/app.py#L1481-L1522
Maybe we can write a hookwrapper for actor_from_request? See https://pluggy.readthedocs.io/en/stable/#wrappers
We might also be able to have a hook that stashes the request in the current_task ? Then we could always inspect stashed_request.scope.actor.
Stashing the request seems a little less skeezy, then we don't have to interrogate the result object and risk Datasette's logic changing.
Even once we've stashed it, we have the problem that we're in some asyncio context, and the actual execution of the SQL may be in a different thread, e.g. for writes.
A not-very-good solution: in rewrite_sql replace all instances of current_user() with a literal value.
Ah, but this doesn't actually solve the case I care about, which is I want to be able to create views with RLS where clauses, and have triggers that do the right thing.
Another oddness:
I did create view xxx as select current_user()
, then went to http://localhost:8001/cooking/xxx
actor_from_request
hadn't/didn't run before rewrite_sql...maybe we need something earlier?
Alternate idea: datasette-rewrite-sql only patches the entry points like execute
, execute_write
, execute_write_many
, execute_write_script
.
This isn't actually what needs to be patched -- we need to patch execute_fn
and execute_write_fn
, to control setting the value right before we pass control to SQLite.
I have a buggy proof of concept, it's incorrect under concurrent load.
I added an actor_from_request hook like:
@hookimpl
def actor_from_request(datasette, request):
_id = None
if 'x-me' in request.headers:
_id = request.headers['x-me']
return {'id': _id}
And then tested by having multiple bash windows open like:
while :; do curl -H "x-me: $$" http://localhost:8001/cooking/xxx.json?_shape=array; echo; done
I expect the output from each window to always show the same value, but instead, I get interleaved results.
This means my understanding of something is flawed. I wonder if it works with num_sql_threads=1? I stash the request on the asyncio.current_task(), but when we transition into SQLite I stash the ID in a global...that might be flawed.
--config num_sql_threads=1
gives correct results, so yeah, a global is not an OK way to store this.
Using a threading.local()
resolves this
This might be worth its own plugin, and we'll just depend on it.
I'm not sure how achievable this is. #49 explores creating something like updatable views for the turker use case. It'd be handy to have a trigger that can set
last_edited_by = current_actor()
, wherecurrent_actor()
is evaluated in the context of the current request to get the user's ID.Can ... can we do that?
Simon mentioned maybe you can abuse asgi_wrapper to get access to the request: https://discord.com/channels/823971286308356157/823971286941302908/1070385463640215702