GeeWee / django-auto-prefetching

Automatic prefetching for Django
MIT License
231 stars 18 forks source link

Additional unnecessary prefetching occurs in some edge cases #27

Closed kmcswiney closed 3 years ago

kmcswiney commented 3 years ago

First of all thanks for this awesome package.

I noticed an issue whereby if your serializer does not make use of any Foreign Key relations in the underlying Django model, and those relations are not nullable, they will ALL be prefetched. This could cause a performance issue for people who want to use a serializer that does not render those relations as they will be prefetched even though they are never going to be serialized.

The root cause appears to be the behaviour specified by the Django docs:

There may be some situations where you wish to call select_related() with a lot of related objects, or where you don’t know all of the relations. In these cases it is possible to call select_related() with no arguments. This will follow all non-null foreign keys it can find - nullable foreign keys must be specified. This is not recommended in most cases as it is likely to make the underlying query more complex, and return more data, than is actually needed.

For example if we had this set of models and serializer:

class MyRelation(models.Model):
  pass

class MyModel(models.Model):
  textfield: models.TextField()
  relation = models.ForeignKey(MyRelation, on_delete=models.CASCADE)

class MySerializer(serializers.ModelSerializer):
  class Meta:
    model = MyModel
    fields = ('textfield',)

The link from MyModel to MyRelation is not specified for use in the serializer, but this package will prefetch it anyway. The reason is because in this section of code:

def prefetch(queryset, serializer: Type[ModelSerializer]):
    select_related, prefetch_related = _prefetch(serializer)
    try:
        return queryset.select_related(*select_related).prefetch_related(
            *prefetch_related
        )

In this scenario the select_related evaluates to an empty set, and as a result, queryset.select_related thinks you want it to automatically prefetch any non-nullable relationships.

I was going to submit a fix for this like so:

def prefetch(queryset, serializer: Type[ModelSerializer]):
    select_related, prefetch_related = _prefetch(serializer)
    try:        
        if select_related:
            queryset = queryset.select_related(*select_related)
        if prefetch_related:
            queryset = queryset.prefetch_related(*prefetch_related)
        return queryset

After re-running the tests, two tests fail:

FAIL: test_reverse_foreign_key_lookups (test_project.tests.tests.TestOneToMany)
AssertionError: 2 != 1 : 2 queries executed, 1 expected
Captured queries were:
1. SELECT "test_project_childb"."id", "test_project_childb"."childB_text", "test_project_childb"."parent_id" FROM "test_project_childb"
2. SELECT "test_project_toplevel"."id", "test_project_toplevel"."top_level_text" FROM "test_project_toplevel" WHERE "test_project_toplevel"."id" = 1 LIMIT 21
FAIL: test_it_prefetches_when_using_dotted_property_access (test_project.tests.tests.TestOneToMany)
AssertionError: 2 != 1 : 2 queries executed, 1 expected
Captured queries were:
1. SELECT "test_project_childb"."id", "test_project_childb"."childB_text", "test_project_childb"."parent_id" FROM "test_project_childb"
2. SELECT "test_project_toplevel"."id", "test_project_toplevel"."top_level_text" FROM "test_project_toplevel" WHERE "test_project_toplevel"."id" = 1 LIMIT 21

Both tests are related to the use of the ChildBSerializerWithDottedPropertyAccess. The failures are not directly caused by my change, but rather, they were previously passing for the wrong reason: the relations were only being prefetched by default after the package calls select_related with an empty set rather than with the name of the actual relation associated with that dotted property.

As a result I decided not to submit a PR and instead ask whether this was a known issue or intended behaviour. It seems like the dotted property access never actually worked as it was intended and the tests associated with this behaviour were really just passing by accident. But by fixing this the overzealous pre-fetching issue, we expose the issue with these dotted properties.

kmcswiney commented 3 years ago

Also I realise that since this edge case only happens when you have a serializer that doesn't specify any relations, the question might be asked why this package is needed at all. The reason for me is that I am working on a project whereby serializer fields can be dynamically changed by the user, using this example from the DRF docs which basically lets an API caller specify which fields they are interested in for each API call. The goal being if an API caller is not interested in a related field then they don't ask for it, and that relation is not loaded from the database. If they do want it then they ask for it, and then it would get prefetched.

GeeWee commented 3 years ago

Sorry for taking a little while to get around to this. I'm pretty sure that you're right that these tests passed unintentionally. I've merged your excellent PR and published a new release to PyPI! :tada:

kmcswiney commented 3 years ago

Awesome, thanks!