Closed dustinrb closed 6 years ago
I missed the transaction.atomic()
decorator in buildwatson.py.
Is there a particular reason why this command is setup this way or can the decorator be removed? Would it be beneficial to add an option to enable/disable a single atomic transaction?
How do you mean, "the buildwatson management command times out"?
Database commands do not time out, normall.
On 28 November 2017 at 19:36, Dustin Broderick notifications@github.com wrote:
I missed the transaction.atomic() decorator in buildwatson.py.
Is there a particular reason why this command is setup this way or can the decorator be removed? Would it be beneficial to add an option to enable/disable a single atomic transaction?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/etianen/django-watson/issues/225#issuecomment-347639019, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJFCCCnkKfsEDz5_mcceZ5SFXtPPh8Mks5s7GDSgaJpZM4Qtvql .
I cannot remember the exact error, but I tried to build the index twice and Django threw an exception after a few hours of work. I can recreate the exception, but it will take me a few hours tomorrow.
I would expect if items are being processed in batches, they would show up continuously throughout the indexing process.
Removing the transaction.atomic()
makes buildwatson
behave as I would expect, though I'm not finished testing it (I can tell you tomorrow how it went).
I don't think that removing the transaction is a good idea.
If the command aborts half way through, you're going to have to run it again anyway. I don't think a transaction makes the command any more likely to fail half way through.
On 28 November 2017 at 20:50, Dustin Broderick notifications@github.com wrote:
Removing the transaction.atomic() makes buildwatson behave as I would expect, though I'm not finished testing it (I can tell you tomorrow how it went).
https://github.com/dustinrb/django-watson/tree/non-atomic http://url
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/etianen/django-watson/issues/225#issuecomment-347659151, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJFCEbHfCbbXMDsQvoeHgLnlxlo0thVks5s7HILgaJpZM4Qtvql .
I was not intending to use a partially indexed database for search. I tried indexing the my database with and without the transaction and it failed with a OperationalError: SSL SYSCALL error
if I had the transaction enabled (trace included below). It successfully indexed once I disable the transaction.
Error trace with the transaction:
Traceback (most recent call last):
File "./manage.py", line 14, in <module>
execute_from_command_line(sys.argv)
File "/usr/lib/python2.7/site-packages/django/core/management/__init__.py", line 354, in execute_from_command_line
utility.execute()
File "/usr/lib/python2.7/site-packages/django/core/management/__init__.py", line 346, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/lib/python2.7/site-packages/django/core/management/base.py", line 394, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/lib/python2.7/site-packages/django/core/management/base.py", line 445, in execute
output = self.handle(*args, **options)
File "/usr/lib/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 142, in handle
refreshed_model_count += rebuild_index_for_model(model, engine_slug, verbosity)
File "/usr/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 61, in rebuild_index_for_model
_bulk_save_search_entries(iter_search_entries())
File "/usr/lib/python2.7/site-packages/watson/search.py", line 206, in _bulk_save_search_entries
search_entry_batch = list(islice(search_entries, 0, batch_size))
File "/usr/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 40, in iter_search_entries
for search_entry in search_engine_._update_obj_index_iter(obj):
File "/usr/lib/python2.7/site-packages/watson/search.py", line 491, in _update_obj_index_iter
update_count = search_entries.update(**search_entry_data)
File "/usr/lib/python2.7/site-packages/django/db/models/query.py", line 563, in update
rows = query.get_compiler(self.db).execute_sql(CURSOR)
File "/usr/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
File "/usr/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
cursor.execute(sql, params)
File "/usr/lib/python2.7/site-packages/django/db/backends/utils.py", line 79, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/usr/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/usr/lib/python2.7/site-packages/django/db/utils.py", line 98, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/usr/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: SSL SYSCALL error: EOF detected
I think you should check your postgres settings. Look for any of the following being set:
statement_timeout idle_in_transaction_session_timeout lock_timeout
Are you using anything like pgbouncer?
Postgres is normally capable of keeping transactions open for months at a time.
On 30 November 2017 at 22:38, Dustin Broderick notifications@github.com wrote:
I was not intending to use a partially indexed database for search. I tried indexing the my database with and without the transaction and it failed with a OperationalError: SSL SYSCALL error if I had the transaction enabled (trace included below). It successfully indexed once I disable the transaction.
Error trace with the transaction:
Traceback (most recent call last): File "./manage.py", line 14, in
execute_from_command_line(sys.argv) File "/usr/lib/python2.7/site-packages/django/core/management/init.py", line 354, in execute_from_command_line utility.execute() File "/usr/lib/python2.7/site-packages/django/core/management/init.py", line 346, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/usr/lib/python2.7/site-packages/django/core/management/base.py", line 394, in run_from_argv self.execute(*args, cmd_options) File "/usr/lib/python2.7/site-packages/django/core/management/base.py", line 445, in execute output = self.handle(*args, *options) File "/usr/lib/python2.7/site-packages/django/utils/decorators.py", line 145, in inner return func(args, kwargs) File "/usr/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 142, in handle refreshed_model_count += rebuild_index_for_model(model, engine_slug, verbosity) File "/usr/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 61, in rebuild_index_for_model _bulk_save_search_entries(iter_search_entries()) File "/usr/lib/python2.7/site-packages/watson/search.py", line 206, in _bulk_save_search_entries search_entry_batch = list(islice(search_entries, 0, batch_size)) File "/usr/lib/python2.7/site-packages/watson/management/commands/buildwatson.py", line 40, in iter_search_entries for search_entry in searchengine._update_obj_index_iter(obj): File "/usr/lib/python2.7/site-packages/watson/search.py", line 491, in _update_obj_index_iter update_count = search_entries.update(**search_entry_data) File "/usr/lib/python2.7/site-packages/django/db/models/query.py", line 563, in update rows = query.get_compiler(self.db).execute_sql(CURSOR) File "/usr/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql cursor = super(SQLUpdateCompiler, self).execute_sql(result_type) File "/usr/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql cursor.execute(sql, params) File "/usr/lib/python2.7/site-packages/django/db/backends/utils.py", line 79, in execute return super(CursorDebugWrapper, self).execute(sql, params) File "/usr/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute return self.cursor.execute(sql, params) File "/usr/lib/python2.7/site-packages/django/db/utils.py", line 98, in exit six.reraise(dj_exc_type, dj_exc_value, traceback) File "/usr/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute return self.cursor.execute(sql, params) django.db.utils.OperationalError: SSL SYSCALL error: EOF detected — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/etianen/django-watson/issues/225#issuecomment-348343980, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJFCK3rVOgKTwl4I8T7Acy8h38m78eKks5s7y5cgaJpZM4Qtvql .
I looked at my PG configuration and statement_timeout, idle_in_transaction_session_timeout, and lock_timeout are all set to 0. I was using a direct connection to the database.
I looked at the usage statistic for the indexing processes while indexing and the RAM usage was steadily ticking up. I didn't get a chance to run until failure, but I suspect that the DB server might be running out of memory and killing the connection.
I created PR #229 which adds the option for a non-atomic buildwatson
. I see the value of indexing atomically, but I was only able to get my database to successfully finish after disabling atomicity.
The postgres docs seem to indicate that this doesn't happen...
...however, it seems to be happening to you, as well as various other people on the internet! 😛
...and I've observed similar behavior myself in the past, but assumed I was doing something dumb.
So I've merged your pull request. Thanks for taking the time to argue your position, and convince me!
I am trying to make a large table (>1 million objects) searchable; however, the
buildwatson
management command times out and no SearchEntries are created.It looks like Watson is configured to add entries in batches of 100 and
_build_save_search_entries
is iterating that way, but Postgres tries to add these in one über transaction. If I open a Django shell and runrebuild_index_for_model
manually, it works as expected adding entries over time.I'm suspecting this is an issue with Django forcing management commands to use a single DB transaction, but I'm not sure how to override this.
I'm using django-watson 1.4.2 on Django 1.8.18. I'm running on macOS High Sierra and Moby Linux