If our task consumer loses connection to the DB, it will continually try and fail to use that same connection, throwing either a OperationalError (the first time) or InterfaceError (every following time) when attempting to save a BackgroundTask to the database.
Description of the change
When a failed DB connection is caught, Django's close_old_connections() function is called which automatically closes any bad connection. When our system then tries to use the connection, a new connection is opened. The message is never acknowledged when a bad connection occurs, ensuring that it will be put back in the queue for processing after the SQS visibility timeout expires.
How someone else can test the change
I think the tests are clear enough regarding out handling of the errors. To replicate the error itself, open your Django shell and in another terminal, kill all DB connections:
sudo -u postgres psql -c "SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'cadasta' AND pid <> pg_backend_pid();"
You'll see that any attempt to query the DB raises the said errors. Calling the close_old_connections() then rectifies the issue. Example:
(env)vagrant@vagrant-ubuntu-trusty-64:/opt/cadasta/cadasta-platform/cadasta$ NODB=1 DJANGO_SETTINGS_MODULE=config.settings.dev_debug ./manage.py shell_plus
/opt/cadasta/env/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" i
nstead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>. """)
# Shell Plus Model Imports
from accounts.models import HistoricalUser, User, VerificationDevice
from allauth.socialaccount.models import SocialAccount, SocialApp, SocialTokenfrom rest_framework.authtoken.models import Token
from xforms.models import XFormSubmission
from geography.models import WorldBorder
from django.contrib.auth.models import Group, Permission
from party.models import HistoricalParty, HistoricalPartyRelationship, HistoricalTenureRelationship, Party, PartyRelationship, TenureRelationship
from allauth.account.models import EmailAddress, EmailConfirmation
from django.contrib.contenttypes.models import ContentTypefrom tasks.models import BackgroundTask, TaskResult
from resources.models import ContentObject, HistoricalContentObject, HistoricalResource, Resource, SpatialResourcefrom organization.models import HistoricalOrganization, HistoricalOrganizationRole, HistoricalProject, HistoricalProjectRole, Organization, OrganizationRole, Project, ProjectRole
Failed to import 'PolicyAuditLogEntry' from 'tutelary.models' reason: Module "tutelary.models" does not define a "PolicyAuditLogEntry" attribute/class
Failed to import 'RoleAuditLogEntry' from 'tutelary.models' reason: Module "tutelary.models" does not define a "RoleAuditLogEntry" attribute/class
from tutelary.models import PermissionSet, Policy, PolicyInstance, Role, RolePolicyAssign
from django.contrib.sessions.models import Session
from django.contrib.sites.models import Site
from spatial.models import HistoricalSpatialRelationship, HistoricalSpatialUnit, SpatialRelationship, SpatialUnit
from jsonattrs.models import Attribute, AttributeType, Schema
from questionnaires.models import HistoricalQuestion, HistoricalQuestionGroup, HistoricalQuestionOption, HistoricalQuestionnaire, Question, QuestionGroup, QuestionOption, Questionnaire
# Shell Plus Django Imports
from django.conf import settings
from django.core.cache import cache
from django.db import transaction
from django.utils import timezone
from django.urls import reverse
from django.contrib.auth import get_user_model
from django.db.models import Avg, Case, Count, F, Max, Min, Prefetch, Q, Sum, When, Exists, OuterRef, Subquery
Python 3.5.3 (default, Apr 22 2017, 00:00:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: BackgroundTask.objects.all()
Out[1]: <QuerySet []>
[1]+ Stopped NODB=1 DJANGO_SETTINGS_MODULE=config.settings.dev_debug ./manage.py shell_plus
(env)vagrant@vagrant-ubuntu-trusty-64:/opt/cadasta/cadasta-platform/cadasta$ sudo -u postgres psql -c "SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'cadasta' AND pid <> pg_backend_pid();"
pg_terminate_backend
----------------------
t
(1 row)
(env)vagrant@vagrant-ubuntu-trusty-64:/opt/cadasta/cadasta-platform/cadasta$ fg
NODB=1 DJANGO_SETTINGS_MODULE=config.settings.dev_debug ./manage.py shell_plus
In [2]: BackgroundTask.objects.all()
Out[2]: ---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
63 else:
---> 64 return self.cursor.execute(sql, params)
65
OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
OperationalError Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
393 if callable(meth):
394 return meth(obj, self, cycle)
--> 395 return _default_pprint(obj, self, cycle)
396 finally:
397 self.end_group()
/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
508 if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
509 # A user-provided repr. Find newlines and replace them with p.break_()
--> 510 _repr_pprint(obj, p, cycle)
511 return
512 p.begin_group(1, '<')
/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
699 """A pprint that just redirects to the normal repr function."""
700 # Find newlines and replace them with p.break_()
--> 701 output = repr(obj)
702 for idx,output_line in enumerate(output.splitlines()):
703 if idx:
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __repr__(self)
224
225 def __repr__(self):
--> 226 data = list(self[:REPR_OUTPUT_SIZE + 1])
227 if len(data) > REPR_OUTPUT_SIZE:
228 data[-1] = "...(remaining elements truncated)..."
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
248 - Responsible for turning the rows into model objects.
249 """
--> 250 self._fetch_all()
251 return iter(self._result_cache)
252
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in _fetch_all(self)
1116 def _fetch_all(self):
1117 if self._result_cache is None:
-> 1118 self._result_cache = list(self._iterable_class(self))
1119 if self._prefetch_related_lookups and not self._prefetch_done:
1120 self._prefetch_related_objects()
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
51 # Execute the query. This will also fill compiler.select, klass_info,
52 # and annotations.
---> 53 results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
54 select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
55 compiler.annotation_col_map)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch)
897 # silencing when dropping Python 2 compatibility.
898 pass
--> 899 raise original_exception
900
901 if result_type == CURSOR:
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch)
887 cursor = self.connection.cursor()
888 try:
--> 889 cursor.execute(sql, params)
890 except Exception as original_exception:
891 try:
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
77 start = time()
78 try:
---> 79 return super(CursorDebugWrapper, self).execute(sql, params)
80 finally:
81 stop = time()
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
62 return self.cursor.execute(sql)
63 else:
---> 64 return self.cursor.execute(sql, params)
65
66 def executemany(self, sql, param_list):
/opt/cadasta/env/lib/python3.5/site-packages/django/db/utils.py in __exit__(self, exc_type, exc_value, traceback)
92 if dj_exc_type not in (DataError, IntegrityError):
93 self.wrapper.errors_occurred = True
---> 94 six.reraise(dj_exc_type, dj_exc_value, traceback)
95
96 def __call__(self, func):
/opt/cadasta/env/lib/python3.5/site-packages/django/utils/six.py in reraise(tp, value, tb)
683 value = tp()
684 if value.__traceback__ is not tb:
--> 685 raise value.with_traceback(tb)
686 raise value
687
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
62 return self.cursor.execute(sql)
63 else:
---> 64 return self.cursor.execute(sql, params)
65
66 def executemany(self, sql, param_list):
OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
In [3]: BackgroundTask.objects.all()
Out[3]: ---------------------------------------------------------------------------
InterfaceError Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in _cursor(self, name)
230 with self.wrap_database_errors:
--> 231 return self._prepare_cursor(self.create_cursor(name))
232
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/postgresql/base.py in create_cursor(self, name)
219 else:
--> 220 cursor = self.connection.cursor()
221 cursor.tzinfo_factory = utc_tzinfo_factory if settings.USE_TZ else None
InterfaceError: connection already closed
The above exception was the direct cause of the following exception:
InterfaceError Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
393 if callable(meth):
394 return meth(obj, self, cycle)
--> 395 return _default_pprint(obj, self, cycle)
396 finally:
397 self.end_group()
/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
508 if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
509 # A user-provided repr. Find newlines and replace them with p.break_()
--> 510 _repr_pprint(obj, p, cycle)
511 return
512 p.begin_group(1, '<')
/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
699 """A pprint that just redirects to the normal repr function."""
700 # Find newlines and replace them with p.break_()
--> 701 output = repr(obj)
702 for idx,output_line in enumerate(output.splitlines()):
703 if idx:
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __repr__(self)
224
225 def __repr__(self):
--> 226 data = list(self[:REPR_OUTPUT_SIZE + 1])
227 if len(data) > REPR_OUTPUT_SIZE:
228 data[-1] = "...(remaining elements truncated)..."
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
248 - Responsible for turning the rows into model objects.
249 """
--> 250 self._fetch_all()
251 return iter(self._result_cache)
252
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in _fetch_all(self)
1116 def _fetch_all(self):
1117 if self._result_cache is None:
-> 1118 self._result_cache = list(self._iterable_class(self))
1119 if self._prefetch_related_lookups and not self._prefetch_done:
1120 self._prefetch_related_objects()
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
51 # Execute the query. This will also fill compiler.select, klass_info,
52 # and annotations.
---> 53 results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
54 select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
55 compiler.annotation_col_map)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch)
885 cursor = self.connection.chunked_cursor()
886 else:
--> 887 cursor = self.connection.cursor()
888 try:
889 cursor.execute(sql, params)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in cursor(self)
252 Creates a cursor, opening a connection if necessary.
253 """
--> 254 return self._cursor()
255
256 def commit(self):
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in _cursor(self, name)
229 self.ensure_connection()
230 with self.wrap_database_errors:
--> 231 return self._prepare_cursor(self.create_cursor(name))
232
233 def _commit(self):
/opt/cadasta/env/lib/python3.5/site-packages/django/db/utils.py in __exit__(self, exc_type, exc_value, traceback)
92 if dj_exc_type not in (DataError, IntegrityError):
93 self.wrapper.errors_occurred = True
---> 94 six.reraise(dj_exc_type, dj_exc_value, traceback)
95
96 def __call__(self, func):
/opt/cadasta/env/lib/python3.5/site-packages/django/utils/six.py in reraise(tp, value, tb)
683 value = tp()
684 if value.__traceback__ is not tb:
--> 685 raise value.with_traceback(tb)
686 raise value
687
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in _cursor(self, name)
229 self.ensure_connection()
230 with self.wrap_database_errors:
--> 231 return self._prepare_cursor(self.create_cursor(name))
232
233 def _commit(self):
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/postgresql/base.py in create_cursor(self, name)
218 cursor = self.connection.cursor(name, scrollable=False, withhold=self.connection.autocommit)
219 else:
--> 220 cursor = self.connection.cursor()
221 cursor.tzinfo_factory = utc_tzinfo_factory if settings.USE_TZ else None
222 return cursor
InterfaceError: connection already closed
In [4]: from django.db import close_old_connections
In [5]: close_old_connections()
In [6]: BackgroundTask.objects.all()
Out[6]: DEBUG:django.db.backends:(0.003) SELECT "tasks_backgroundtask"."id", "tasks_backgroundtask"."task_id", "tasks_backgroundtask"."type", "tasks_backgroundtask"."created_date", "tasks_backgroundtask"."last_updated", "tasks_backgroundtask"."creator_id", "tasks_backgroundtask"."input", "tasks_backgroundtask"."options", "tasks_backgroundtask"."related_content_type_id", "tasks_backgroundtask"."related_object_id", "tasks_backgroundtask"."parent_id", "tasks_backgroundtask"."root_id", "tasks_backgroundtask"."immutable" FROM "tasks_backgroundtask" ORDER BY "tasks_backgroundtask"."created_date" ASC LIMIT 21; args=()
<QuerySet []>
When should this PR be merged
Before end of sprint.
Risks
None foreseen, calling the DB connection cleanup command seems pretty inocuous.
Follow-up actions
[List any possible follow-up actions here; for instance, testing data
migrations, software that we need to install on staging and production
environments.]
Checklist (for reviewing)
General
Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
[ ] Review 1
[ ] Review 2
Is the PR labeled correctly? It should have the migration label if a new migration is added.
[ ] Review 1
[ ] Review 2
Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.
[ ] Review 1
[ ] Review 2
Functionality
Are all requirements met? Compare implemented functionality with the requirements specification.
[ ] Review 1
[ ] Review 2
Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.
[ ] Review 1
[ ] Review 2
Code
Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.
[ ] Review 1
[ ] Review 2
Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
[ ] Review 1
[ ] Review 2
Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.
[ ] Review 1
[ ] Review 2
Is the code documented sufficiently? Large and complex classes, functions or methods must be annotated with comments following our code-style guidelines.
[ ] Review 1
[ ] Review 2
Has the scalability of this change been evaluated?
[ ] Review 1
[ ] Review 2
Is there a maintenance plan in place?
[ ] Review 1
[ ] Review 2
Tests
Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
[ ] Review 1
[ ] Review 2
If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
[ ] Review 1
[ ] Review 2
If this is a new feature or a significant change to an existing feature? has the manual testing spreadsheet been updated with instructions for manual testing?
[ ] Review 1
[ ] Review 2
Security
Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.
[ ] Review 1
[ ] Review 2
Are all UI and API inputs run through forms or serializers?
[ ] Review 1
[ ] Review 2
Are all external inputs validated and sanitized appropriately?
[ ] Review 1
[ ] Review 2
Does all branching logic have a default case?
[ ] Review 1
[ ] Review 2
Does this solution handle outliers and edge cases gracefully?
[ ] Review 1
[ ] Review 2
Are all external communications secured and restricted to SSL?
[ ] Review 1
[ ] Review 2
Documentation
Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
[ ] Review 1
[ ] Review 2
Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
[ ] Review 1
[ ] Review 2
Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.
Proposed changes in this pull request
Why I made this change
If our task consumer loses connection to the DB, it will continually try and fail to use that same connection, throwing either a
OperationalError
(the first time) orInterfaceError
(every following time) when attempting to save aBackgroundTask
to the database.Description of the change
When a failed DB connection is caught, Django's
close_old_connections()
function is called which automatically closes any bad connection. When our system then tries to use the connection, a new connection is opened. The message is never acknowledged when a bad connection occurs, ensuring that it will be put back in the queue for processing after the SQS visibility timeout expires.How someone else can test the change
I think the tests are clear enough regarding out handling of the errors. To replicate the error itself, open your Django shell and in another terminal, kill all DB connections:
You'll see that any attempt to query the DB raises the said errors. Calling the
close_old_connections()
then rectifies the issue. Example:When should this PR be merged
Before end of sprint.
Risks
None foreseen, calling the DB connection cleanup command seems pretty inocuous.
Follow-up actions
[List any possible follow-up actions here; for instance, testing data migrations, software that we need to install on staging and production environments.]
Checklist (for reviewing)
General
Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
Is the PR labeled correctly? It should have the
migration
label if a new migration is added.Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.
Functionality
Are all requirements met? Compare implemented functionality with the requirements specification.
Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.
Code
Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.
Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.
Is the code documented sufficiently? Large and complex classes, functions or methods must be annotated with comments following our code-style guidelines.
Has the scalability of this change been evaluated?
Is there a maintenance plan in place?
Tests
Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
If this is a new feature or a significant change to an existing feature? has the manual testing spreadsheet been updated with instructions for manual testing?
Security
Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.
Are all UI and API inputs run through forms or serializers?
Are all external inputs validated and sanitized appropriately?
Does all branching logic have a default case?
Does this solution handle outliers and edge cases gracefully?
Are all external communications secured and restricted to SSL?
Documentation
Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.