etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License
1.21k stars, 129 forks

django.db.utils.OperationalError: string is too long for tsvector #164

Closed. alexgtom closed this issue 7 years ago

alexgtom commented 8 years ago

Hi there. I have a feeling this is happening because of some very large text fields in my database. Is there a way to work around this?

I tried filtering out rows with large text fields by doing:

watson.register(MyModel.objects.extra(where=["CHAR_LENGTH(text) < 1048575"]))

but after running buildwatson, it still seems to try to index the whole table. I'm running PostgreSQL 9.5 with watson 1.2.1.

Traceback (most recent call last):
  File "manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/lib/python2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/lib/python2.7/site-packages/django/core/management/__init__.py", line 330, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/lib/python2.7/site-packages/django/core/management/base.py", line 393, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/lib/python2.7/site-packages/django/core/management/base.py", line 444, in execute
    output = self.handle(*args, **options)
  File "/lib/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
    return func(*args, **kwargs)
  File "/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 122, in handle
    refreshed_model_count += rebuild_index_for_model(model, engine_slug, verbosity)
  File "/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 54, in rebuild_index_for_model
    _bulk_save_search_entries(iter_search_entries())
  File "/lib/python2.7/site-packages/watson/search.py", line 197, in _bulk_save_search_entries
    SearchEntry.objects.bulk_create(search_entry_batch)
  File "/lib/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/lib/python2.7/site-packages/django/db/models/query.py", line 392, in bulk_create
    self._batched_insert(objs_without_pk, fields, batch_size)
  File "/lib/python2.7/site-packages/django/db/models/query.py", line 937, in _batched_insert
    using=self.db)
  File "/lib/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/lib/python2.7/site-packages/django/db/models/query.py", line 920, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 974, in execute_sql
    cursor.execute(sql, params)
  File "/lib/python2.7/site-packages/django/db/backends/utils.py", line 79, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/lib/python2.7/site-packages/django/db/utils.py", line 97, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
django.db.utils.OperationalError: string is too long for tsvector (3298404 bytes, max 1048575 bytes)
CONTEXT:  PL/pgSQL function watson_searchentry_trigger_handler() line 3 at assignment

etianen commented 8 years ago

Unfortunately, django-watson still indexes every instance of a registered model; the filter is only applied to the results at search time. This is because the signal handlers have no way of knowing whether an instance matches a DB constraint when deciding whether to index it on save.
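
The behaviour described above can be pictured with a toy model in plain Python. This is only an illustrative sketch, not django-watson's code: `ToyIndex`, `on_save`, `search`, and `is_live` are hypothetical names. It shows why a filter supplied at registration would not stop an oversized row from being indexed: the save hook stores every object, and the filter is consulted only when searching.

```python
class ToyIndex:
    """Toy stand-in for a search index; not django-watson's implementation."""

    def __init__(self, is_live):
        # is_live plays the role of the filter passed at registration.
        self.is_live = is_live
        self.entries = []

    def on_save(self, obj):
        # The save hook indexes unconditionally; it never consults is_live.
        self.entries.append(obj)

    def search(self, term):
        # The filter is applied only here, when results are read back.
        return [o for o in self.entries if term in o["text"] and self.is_live(o)]


index = ToyIndex(is_live=lambda o: len(o["text"]) < 20)
index.on_save({"id": 1, "text": "short note"})
index.on_save({"id": 2, "text": "a very long note that exceeds the limit"})

indexed_ids = [o["id"] for o in index.entries]       # every saved row is in the index
found_ids = [o["id"] for o in index.search("note")]  # the filter only narrows results
```

In this model the long row is still written to the index, which is the step that fails in the traceback above.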

You could exclude the problematic field from django-watson using exclude=("field_name",), or you could modify the table to have a smaller maximum field length.
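
As a rough illustration of the second suggestion, the sketch below caps text by its encoded size in plain Python. It is not part of django-watson: `truncate_utf8` and `TSVECTOR_MAX_BYTES` are hypothetical names, and the limit value is simply copied from the error message above. The limit in that message is expressed in bytes, while `CHAR_LENGTH` in the original filter counts characters, so multi-byte text can pass a character check and still be too large. Whether capping the input is enough depends on how PostgreSQL sizes the resulting tsvector, so treat the threshold as a starting point rather than a guarantee.

```python
TSVECTOR_MAX_BYTES = 1048575  # value quoted in the error message above


def truncate_utf8(text, max_bytes=TSVECTOR_MAX_BYTES):
    """Return text cut so that its UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# 600,000 two-byte characters: under the limit by character count,
# over it by byte count.
sample = "é" * 600000
capped = truncate_utf8(sample)
```

A helper like this could be applied wherever the text is prepared before saving or indexing; where exactly that hook lives is left open here.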
