ckan / ideas

[DEPRECATED] Use the main CKAN repo Discussions instead:
https://github.com/ckan/ckan/discussions
40 stars 2 forks source link

context anti-patterns #53

Open wardi opened 10 years ago

wardi commented 10 years ago

The context dict that is passed to action functions when called locally or from controllers is often abused. I can see the argument for some kind of object that includes security-sensitive values like the user object or values like model that could let you mock out the real models for tests.

However, we have at least three classes of values that we shouldn't be passing as part of a "context":

  1. using context instead of normal parameters for calls from an action function to other parts of ckan. e.g. the 'prevent_packages_update' passed to group_dict save (which I'd fix this way: https://github.com/wardi/ckan/commit/303ab34354dafd214144ff49117df4403ce9174a )

    There's no reason at all I can see for this sort of use. it's a value used by the immediate function we're calling and only that function. That's what parameters are for. Python even lets you set a default and has conventions for documenting these things.

  2. using context to pass non-sensitive values many levels to where they are actually used. e.g. the revision_id/revision_date values that can be passed to package_show https://github.com/ckan/ckan/blob/master/ckan/controllers/package.py#L356-L372

    This is an actually-useful parameter that should be part of the package_show API! Why is it being hidden in the context object?

    When we use context to pass a value many layers deep it's very hard to reason about the behaviour of code without understanding everything above and below. Let's add revision_date to package_show's data_dict, and use a normal parameter on the way down.

  3. In some places because context is a mutable object it's also used to pass values up to callers. e.g. adding a package object for use by a the REST api controller after package_create: https://github.com/ckan/ckan/blob/master/ckan/logic/action/create.py#L197-201

    This sort of thing has led to a number of bugs myself and others have had to fix The tendency to reuse a context between many call_action calls makes this even worse. I had to help a co-workers debug an issue where some unrelated action calls would work when done in one order but not another that was caused by this.

    This also makes it really hard to reason about what an action call will do. You need to know everything that might have ever been put in a context object by anyone above, below or in code that happened to run before the code you're looking at.

My preference would be to make context an immutable object like a NamedTuple that can only ever have a few expected values: e.g. model, session, user. Because it's immutable you're forced by the language to create a new one any time you need to change something, and it's always safe to re-use from one call to the next.

Such an object could mostly maintain backwards compatibility by defining a __getattr__ method that returns the expected attributes.

rufuspollock commented 10 years ago

I'm +1 and i'm also not really sure why we use it for encapsulating model, session etc (why couldn't those be arguments to logic functions ...)

joetsoi commented 9 years ago

So I currently think contexts in their current state are pointless.

I'm assuming contexts only contain model, session and user. I get why context exists, it basically comes down to whether

There is no right answer, the guy who wrote flask wrote this a blog post detailing his hate for thread locals before he wrote flask, where the decision was made to use them, I know that David R is not a fan of them

The problem I think we have, is that ckan straddles both sides and we end up with a worse solution. Currently in ckan, we use the scoped_session, which is a threadlocal based setup for sqlalchemy, So unless theres is some additional voodoo(possibly vdm, i haven't looked at that code), then ckan.model.Session and context['session'] are currently always the same. This is incredibly confusing, especially to new ckan developers.

The way we supply user into the context is a pain in the backside, for example when writing tests. When you're writing a test using the factories and you're passing a user in as an argument, but you might have been given a user_dict by another action function in a test, so what do you use? the username or the user_dict? The answer is we have magic which takes a stab at figuring it out so the context['user'] is setup properly.

The problem is even with context, we're still using all the pylons g,c and whatever else everywhere anyway. So we have the problem of threadlocal objects everywhere in the code, but we still have to pass our session/user everywhere, (but not the request, we use pylons threadlocal request for that). Hence why I say contexts in their current state are pointless.

So some of our options could be.

I'm thinking currently that I like helpers get_current_username as it can be mocked out in tests and users of our code won't need to think of this crap and just call toolkit.get_current_username in their code, and I also like just letting them use model.Session because they won't sit their thinking 'wtf is this context thing' whenever they want to call an action function in their extension, but I haven't really thought any of this through fully.

TL;DR I'm sure I'll change my mind by next month, but generally, we should change context so that it's not baffling to the user. I think by hiding from the user entirely, but we could change it to a namedtuple and documenting what the hell it actually is, or just making them non confusing normal parameters

This post is too long now and i'm sure we won't have figure it out in a year anyway.

joetsoi commented 9 years ago

Dammit, I was just falling asleep when I thought "model and session should be optional parameters where their default arguments are ckan.model and can.model.session" so the caller doesn't have to specify them each time they use an action function."

So maybe I've changed my mind already

rossjones commented 6 years ago

I've just been bitten by this issue writing a paster command, because I forgot I need a new context dict each time I call a function returned from get_action. Specifically I was calling harvest_source_create/harvest_source_update and getting db conflicts because the id in the context was being re-used, rather than the data_dict I'd provided :( (FTR harvest_sources are packages)

Making the context immutable seems like the lowest impact solution (although I suspect it's used a lot in the logic layer when calling other functions). It's also likely to break some extensions that might rely on this (no specific example) - given it might be a breaking change, is it a C3 thing?