Large queries cause errors

beruic commented 5 years ago

I have a product where around 300k instances are created in an import.

Before importing (which I fear may fail as well), I perform an existence check against a single field in the form MyModel.objects.filter(my_field__in=new_values).exists() where new_values is a set of strings.

Here is the stack trace of the issue:

Traceback (most recent call last):
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/sql_server/pyodbc/base.py", line 536, in execute
    return self.cursor.execute(sql, params)
pyodbc.ProgrammingError: ('The SQL contains 6623 parameter markers, but 137695 parameters were supplied', 'HY000')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/jk/git/my-project/core/management/command_util.py", line 117, in wrapper
    return f(*args, **kwargs)
  File "/home/jk/git/my-project/core/management/command_util.py", line 145, in wrapper
    return f(*args, **options)
  File "/home/jk/git/my-project/core/management/command_util.py", line 117, in wrapper
    return f(*args, **kwargs)
  File "/home/jk/git/my-project/core/management/commands/my_import_command.py", line 159, in handle
    verify_only=verify_only,
  File "/home/jk/git/my-project/core/util.py", line 150, in import_and_verify_terms
    import_and_log_new_terms(user, model, new_terms[search_type], source_description)
  File "/home/jk/git/my-project/core/util.py", line 228, in import_and_log_new_terms
    if model.objects.filter(term__in=new_terms).exists():
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/query.py", line 766, in exists
    return self.query.has_results(using=self.db)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/sql/query.py", line 522, in has_results
    return compiler.has_results()
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1070, in has_results
    return bool(self.execute_sql(SINGLE))
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1100, in execute_sql
    cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 99, in execute
    return super().execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/sql_server/pyodbc/base.py", line 536, in execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: ('The SQL contains 6623 parameter markers, but 137695 parameters were supplied', 'HY000')

beruic commented 5 years ago

This issue seems similar to this one from the original repository: https://github.com/michiya/django-pyodbc-azure/issues/143

At least the hot-fix solution if the same: Chunk requests in chunks of 2000 and swallow the bad performance :)

etiennepouliot commented 4 years ago

Here a workaround :

new_values = ','.join(new_values)
MyModel.objects.extra(where=[f"my_field in ( select * FROM SPLIT_STRING('{new_values}'),',')"]).exists()

Basically what it does it send a comma seperated value and then resplit it to use it in a query.

etiennepouliot commented 4 years ago

Don't event need SPLIT_STRING...

new_values = str(new_values).replace( '[' , '(' ).replace( ']' ,')' )
MyModel.objects.extra(where=[f"my_field in {new_values}"])

etiennepouliot commented 4 years ago

The issue that I have with my previous solutions is that it only works on a case by case basis and it will not work if we use prefetch_related with more that 2100 related objects.

To fix this , create a lookup.py next to your models.py with :

from django.db.models.fields import Field from django.db.models.lookups import In
@Field.register_lookup
class In(In):
    lookup_name = 'in'

    def as_sql(self, compiler, connection):
        max_in_list_size = 2100
        if self.rhs_is_direct_value() and max_in_list_size and len(self.rhs) > max_in_list_size:
            return self.split_parameter_list_as_sql(compiler, connection)
        return super().as_sql(compiler, connection)

    def split_parameter_list_as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.batch_process_rhs(compiler, connection)
        in_clause_elements = f"( {lhs} IN ( SELECT * FROM SPLIT_STRING(%s)))"
        params = [','.join(map(str,rhs_params)) ]
        return in_clause_elements, params

make sure to import in your models.py file :

from .lookups import In

and voilà it will works!

etiennepouliot commented 4 years ago

Split_string was added in SQL 2016, for previous version this function works for me : https://stackoverflow.com/questions/10914576/t-sql-split-string

techknowfile commented 3 years ago

@etiennepouliot I'm trying your solution, but despite importing it into models.py, those functions in the IN lookup don't seem to be ran when I call prefetch_related. Is this not meant to work for that?

I'm using graphene django with gql_optimizer, which results in one giant query with 25000+ parameters

techknowfile commented 3 years ago

A solution has been implemented on the dev branch of the Microsoft fork. See my issue on that repo for details.

The solution uses the Microsoft recommended solution to large parameter lists by creating a TEMP TABLE and joining over that.

etiennepouliot commented 3 years ago

@etiennepouliot I'm trying your solution, but despite importing it into models.py, those functions in the IN lookup don't seem to be ran when I call prefetch_related. Is this not meant to work for that?

I'm using graphene django with gql_optimizer, which results in one giant query with 25000+ parameters

Indeed it's not working for prefetch_related this way. I ended up editing directly django/db/models/lookups.py in my project as I didn't know how to overwrite this file globally.

If anybody know how, I would appreciate. Maybe I need to dig this deeper.

ESSolutions / django-mssql-backend

Large queries cause errors #9