daisylb / bridgekeeper

Django permissions, but with QuerySets
https://bridgekeeper.leigh.party/
MIT License
41 stars 14 forks source link

Rule.filter() and distinct QuerySets #21

Open KyeRussell opened 4 years ago

KyeRussell commented 4 years ago

I suspect that this may be a case of user error, but I will raise it here anyway.

I have a custom Rule, where I am overriding query() (as expected). I just ran into a bit of an unexpected production bug where an UpdateView using Bridgekeeper's QuerySetPermissionMixin raised MultipleObjectsReturned. This was because the query built by my Rule() ended up returning duplicate records.

I could override get_queryset() on my view, however this is slightly verbose for what will likely be a widespread issue given the nature of my Rule. I would need to apply this verbose override to a lot of views (or things that eventually make use of QuerySet.get()).

Do you feel that a solution to this should be incorporated within Bridgekeeper itself? I am weary of the fact that DISTINCTis a potentially expensive operation that should not be blanket applied.

In the meantime, I have overridden filter() (which the documentation explicitly tells you not to do) and added a .distinct() there. In my circumstances, I was willing to take the performance hit, as non-distinct results are possible with every use of my Rule.

daisylb commented 4 years ago

Hm. That's a good question.

Without actually having seen your rule and the generated QuerySet object and resulting SQL, it sounds like this is one of those cases where our decades-long practice of shoehorning the relational model where it doesn't belong via ORMs is coming back to bite us!

My gut suspicion is that Bridgekeeper probably should apply DISTINCT in cases where a single result is expected (but not otherwise). I'd need to look it up and/or experiment (and these things vary between DBMSes anyway), but I would expect that since an UpdateView is going to be querying for an exact match on a candidate key, you'd have to work hard to produce any sort of meaningful performance degradation.

The other option is refactoring how relations work so that the generated SQL is more likely to use subqueries instead of JOINs, which should prevent duplicate rows, but that may require some API changes, and the recent addition of R objects complicates that further.

Can you share more about your custom Rule subclass?

philipstarkey commented 3 years ago

Hi Leigh,

I've recently started working with Kye and we've hit this issue again (but slightly differently). In this instance, Kye's override of filter() that added a distinct() call onto the queryset was ignored because the custom rule was used via bridgekeeper.rules.Relation. In this instance, the filter override is never called because bridgekeeper (for obvious reasons) calls the filter of the outer most Rule object. For the moment we're working around it by monkey-patching bridgekeeper.rules.Rule.filter at a global level to always add distinct(), but we'd very much like to work with you (and contribute our own development time if that would help) to create a more permanent solution.

To provide some more context to how we encounter this problem (names are made up but this is the general idea):

We have a model structure as follows, where A -> B indicates that model A has a foriegn key defined in it pointing to B (and A <- B indicates the reverse):

Item -> Parent <- UserToParentAssociation -> DjangoGroupModel
                                          -> CustomUserModel

We then have a custom Rule called HasParentModelPermission with a query method defined as:

    def query(self, user):
        return Q(
            userassociation__group__permissions__content_type=self.content_type,
            userassociation__group__permissions__codename=self.codename,
            userassociation__user=user,
            userassociation__is_active=True,
        )

where content_type and codename are passed into the Rule at instantiation time.

The permission itself is then defined as:

perms["items.view_item"] = Relation(
    "parent",
    HasParentModelPermission(
        content_type=ContentType.objects.get_for_model(models.Item),
        codename="view_item",
    ),
)

So we use Django groups as a way of defining a set of permissions, and associate a user with a group (and thus permissions) for a specific instance of a Parent model via a UserToParentAssociation model instance. The content_type stuff is there because we actually have several completely independent models like Item which have multiple instances attached to a Parent and we wanted to use the same rule.

Our issues (in this case) stem from the fact that multiple UserToParentAssociation instances can exist for a user/parent pairing.

I suspect this isn't just unique to this specific situation, but any rule check that flows through a many-to-many relationship that does not enforce unique constraints on the foreign keys in the through models? (although, our models are not actually using Django's m2m field for presumably historic reasons I'm not familiar with! But the way the foreign keys are defined I think effectively makes it a m2m relationship)

Hopefully this helps to provide some insight into the issue we ran into. Do you have any thoughts on how bridgekeeper could be adapted to accommodate our scenario?

P.S. It's probably worth mentioning we're still running on v0.8, but I don't think that R objects would make a difference in this scenario.

KyeRussell commented 3 years ago

Happy for Phil to speak on my behalf on this one. Cheers.