Multiple `CountOf`s can sometime collide and result in a heinously slow query

pmg103 commented 4 years ago

I think this is our basic underlying problem https://stackoverflow.com/questions/29195299/why-is-this-django-1-6-annotate-count-so-slow

It's almost like a sort of django/postgres mismatch rather than an SSM issue per se -- however SSM allows you to very easily end up with this situation with no obvious way out.

Possible solutions could be:

warn when multiple CountOfs are combined
error when multiple CountOfs are combined
implement countof by fetching a values_list of related IDs and calling len() on that list if there's more than one

(Something like a generic version of this:

class CountOfUsers(SerializationSpecPlugin):
    def modify_queryset(self, queryset):
        return queryset.prefetch_related(Prefetch(
            'users',
            queryset=ActivityUser.objects.only('id', 'activity_id'),
            to_attr='user_ids'
        ))

    def get_value(self, instance):
        return len(instance.user_ids)

# ... 

    serialization_spec = [
        # ...
        {'num_users': CountOfUsers()},

)

Or some combination of the above

j4mie commented 4 years ago

What about using the subquery approach in the answer to that SO question? That should work on any currently-supported version of Django.

I think the problem with trying to figure out a generic way to do this is that it's going to depend on the size of your N. If you have a small number of things in your main queryset you might even be quicker doing

    @zen_queries.queries_dangerously_enabled()
    def get_value(self, instance):
       return instance.whatever.count()

If your number of related items is big, I suspect your prefetch-then-len() is going to be pretty rubbish.

pmg103 commented 4 years ago

I agree. Could the subquery approach be implemented as a SerializationSpecPlugin? 🤔

j4mie commented 4 years ago

I don't see why not. If it isn't currently possible, it definitely should be.

RealOrangeOne commented 4 years ago

Subquery like that is reasonably simple to implement, however gets harder if you needed to count a nested relation, especially if you need the DISTINCT count

dabapps / django-rest-framework-serialization-spec

Multiple `CountOf`s can sometime collide and result in a heinously slow query #62