Cadasta / cadasta-platform

[DEPRECATED] Main repository of the Cadasta platform. Technology to help communities document their land rights around the world.
https://demo.cadasta.org
GNU Affero General Public License v3.0
53 stars 81 forks source link

Reset DB connections if connection is lost #2041

Closed alukach closed 6 years ago

alukach commented 6 years ago

Proposed changes in this pull request

Why I made this change

If our task consumer loses connection to the DB, it will continually try and fail to use that same connection, throwing either a OperationalError (the first time) or InterfaceError (every following time) when attempting to save a BackgroundTask to the database.

Description of the change

When a failed DB connection is caught, Django's close_old_connections() function is called which automatically closes any bad connection. When our system then tries to use the connection, a new connection is opened. The message is never acknowledged when a bad connection occurs, ensuring that it will be put back in the queue for processing after the SQS visibility timeout expires.

How someone else can test the change

I think the tests are clear enough regarding out handling of the errors. To replicate the error itself, open your Django shell and in another terminal, kill all DB connections:

sudo -u postgres psql -c "SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'cadasta' AND pid <> pg_backend_pid();"

You'll see that any attempt to query the DB raises the said errors. Calling the close_old_connections() then rectifies the issue. Example:

(env)vagrant@vagrant-ubuntu-trusty-64:/opt/cadasta/cadasta-platform/cadasta$ NODB=1 DJANGO_SETTINGS_MODULE=config.settings.dev_debug ./manage.py shell_plus
/opt/cadasta/env/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" i
nstead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.  """)
# Shell Plus Model Imports
from accounts.models import HistoricalUser, User, VerificationDevice
from allauth.socialaccount.models import SocialAccount, SocialApp, SocialTokenfrom rest_framework.authtoken.models import Token
from xforms.models import XFormSubmission
from geography.models import WorldBorder
from django.contrib.auth.models import Group, Permission
from party.models import HistoricalParty, HistoricalPartyRelationship, HistoricalTenureRelationship, Party, PartyRelationship, TenureRelationship
from allauth.account.models import EmailAddress, EmailConfirmation
from django.contrib.contenttypes.models import ContentTypefrom tasks.models import BackgroundTask, TaskResult
from resources.models import ContentObject, HistoricalContentObject, HistoricalResource, Resource, SpatialResourcefrom organization.models import HistoricalOrganization, HistoricalOrganizationRole, HistoricalProject, HistoricalProjectRole, Organization, OrganizationRole, Project, ProjectRole
Failed to import 'PolicyAuditLogEntry' from 'tutelary.models' reason: Module "tutelary.models" does not define a "PolicyAuditLogEntry" attribute/class
Failed to import 'RoleAuditLogEntry' from 'tutelary.models' reason: Module "tutelary.models" does not define a "RoleAuditLogEntry" attribute/class
from tutelary.models import PermissionSet, Policy, PolicyInstance, Role, RolePolicyAssign
from django.contrib.sessions.models import Session
from django.contrib.sites.models import Site
from spatial.models import HistoricalSpatialRelationship, HistoricalSpatialUnit, SpatialRelationship, SpatialUnit
from jsonattrs.models import Attribute, AttributeType, Schema
from questionnaires.models import HistoricalQuestion, HistoricalQuestionGroup, HistoricalQuestionOption, HistoricalQuestionnaire, Question, QuestionGroup, QuestionOption, Questionnaire
# Shell Plus Django Imports
from django.conf import settings
from django.core.cache import cache
from django.db import transaction
from django.utils import timezone
from django.urls import reverse
from django.contrib.auth import get_user_model
from django.db.models import Avg, Case, Count, F, Max, Min, Prefetch, Q, Sum, When, Exists, OuterRef, Subquery
Python 3.5.3 (default, Apr 22 2017, 00:00:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: BackgroundTask.objects.all()
Out[1]: <QuerySet []>

[1]+  Stopped                 NODB=1 DJANGO_SETTINGS_MODULE=config.settings.dev_debug ./manage.py shell_plus
(env)vagrant@vagrant-ubuntu-trusty-64:/opt/cadasta/cadasta-platform/cadasta$ sudo -u postgres psql -c "SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'cadasta' AND pid <> pg_backend_pid();"
 pg_terminate_backend
----------------------
 t
(1 row)

(env)vagrant@vagrant-ubuntu-trusty-64:/opt/cadasta/cadasta-platform/cadasta$ fg
NODB=1 DJANGO_SETTINGS_MODULE=config.settings.dev_debug ./manage.py shell_plus
In [2]: BackgroundTask.objects.all()
Out[2]: ---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
     63             else:
---> 64                 return self.cursor.execute(sql, params)
     65

OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    393                             if callable(meth):
    394                                 return meth(obj, self, cycle)
--> 395             return _default_pprint(obj, self, cycle)
    396         finally:
    397             self.end_group()

/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    508     if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
    509         # A user-provided repr. Find newlines and replace them with p.break_()
--> 510         _repr_pprint(obj, p, cycle)
    511         return
    512     p.begin_group(1, '<')

/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    699     """A pprint that just redirects to the normal repr function."""
    700     # Find newlines and replace them with p.break_()
--> 701     output = repr(obj)
    702     for idx,output_line in enumerate(output.splitlines()):
    703         if idx:

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __repr__(self)
    224
    225     def __repr__(self):
--> 226         data = list(self[:REPR_OUTPUT_SIZE + 1])
    227         if len(data) > REPR_OUTPUT_SIZE:
    228             data[-1] = "...(remaining elements truncated)..."

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
    248                - Responsible for turning the rows into model objects.
    249         """
--> 250         self._fetch_all()
    251         return iter(self._result_cache)
    252

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in _fetch_all(self)
   1116     def _fetch_all(self):
   1117         if self._result_cache is None:
-> 1118             self._result_cache = list(self._iterable_class(self))
   1119         if self._prefetch_related_lookups and not self._prefetch_done:
   1120             self._prefetch_related_objects()

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
     51         # Execute the query. This will also fill compiler.select, klass_info,
     52         # and annotations.
---> 53         results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
     54         select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
     55                                                   compiler.annotation_col_map)

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch)
    897                 # silencing when dropping Python 2 compatibility.
    898                 pass
--> 899             raise original_exception
    900
    901         if result_type == CURSOR:

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch)
    887             cursor = self.connection.cursor()
    888         try:
--> 889             cursor.execute(sql, params)
    890         except Exception as original_exception:
    891             try:

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
     77         start = time()
     78         try:
---> 79             return super(CursorDebugWrapper, self).execute(sql, params)
     80         finally:
     81             stop = time()

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
     62                 return self.cursor.execute(sql)
     63             else:
---> 64                 return self.cursor.execute(sql, params)
     65
     66     def executemany(self, sql, param_list):

/opt/cadasta/env/lib/python3.5/site-packages/django/db/utils.py in __exit__(self, exc_type, exc_value, traceback)
     92                 if dj_exc_type not in (DataError, IntegrityError):
     93                     self.wrapper.errors_occurred = True
---> 94                 six.reraise(dj_exc_type, dj_exc_value, traceback)
     95
     96     def __call__(self, func):

/opt/cadasta/env/lib/python3.5/site-packages/django/utils/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/utils.py in execute(self, sql, params)
     62                 return self.cursor.execute(sql)
     63             else:
---> 64                 return self.cursor.execute(sql, params)
     65
     66     def executemany(self, sql, param_list):

OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

In [3]: BackgroundTask.objects.all()
Out[3]: ---------------------------------------------------------------------------
InterfaceError                            Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in _cursor(self, name)
    230         with self.wrap_database_errors:
--> 231             return self._prepare_cursor(self.create_cursor(name))
    232

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/postgresql/base.py in create_cursor(self, name)
    219         else:
--> 220             cursor = self.connection.cursor()
    221         cursor.tzinfo_factory = utc_tzinfo_factory if settings.USE_TZ else None

InterfaceError: connection already closed

The above exception was the direct cause of the following exception:

InterfaceError                            Traceback (most recent call last)
/opt/cadasta/env/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    393                             if callable(meth):
    394                                 return meth(obj, self, cycle)
--> 395             return _default_pprint(obj, self, cycle)
    396         finally:
    397             self.end_group()

/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    508     if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
    509         # A user-provided repr. Find newlines and replace them with p.break_()
--> 510         _repr_pprint(obj, p, cycle)
    511         return
    512     p.begin_group(1, '<')

/opt/cadasta/env/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    699     """A pprint that just redirects to the normal repr function."""
    700     # Find newlines and replace them with p.break_()
--> 701     output = repr(obj)
    702     for idx,output_line in enumerate(output.splitlines()):
    703         if idx:

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __repr__(self)
    224
    225     def __repr__(self):
--> 226         data = list(self[:REPR_OUTPUT_SIZE + 1])
    227         if len(data) > REPR_OUTPUT_SIZE:
    228             data[-1] = "...(remaining elements truncated)..."

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
    248                - Responsible for turning the rows into model objects.
    249         """
--> 250         self._fetch_all()
    251         return iter(self._result_cache)
    252

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in _fetch_all(self)
   1116     def _fetch_all(self):
   1117         if self._result_cache is None:
-> 1118             self._result_cache = list(self._iterable_class(self))
   1119         if self._prefetch_related_lookups and not self._prefetch_done:
   1120             self._prefetch_related_objects()

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/query.py in __iter__(self)
     51         # Execute the query. This will also fill compiler.select, klass_info,
     52         # and annotations.
---> 53         results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
     54         select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
     55                                                   compiler.annotation_col_map)

/opt/cadasta/env/lib/python3.5/site-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch)
    885             cursor = self.connection.chunked_cursor()
    886         else:
--> 887             cursor = self.connection.cursor()
    888         try:
    889             cursor.execute(sql, params)

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in cursor(self)
    252         Creates a cursor, opening a connection if necessary.
    253         """
--> 254         return self._cursor()
    255
    256     def commit(self):

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in _cursor(self, name)
    229         self.ensure_connection()
    230         with self.wrap_database_errors:
--> 231             return self._prepare_cursor(self.create_cursor(name))
    232
    233     def _commit(self):

/opt/cadasta/env/lib/python3.5/site-packages/django/db/utils.py in __exit__(self, exc_type, exc_value, traceback)
     92                 if dj_exc_type not in (DataError, IntegrityError):
     93                     self.wrapper.errors_occurred = True
---> 94                 six.reraise(dj_exc_type, dj_exc_value, traceback)
     95
     96     def __call__(self, func):

/opt/cadasta/env/lib/python3.5/site-packages/django/utils/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/base/base.py in _cursor(self, name)
    229         self.ensure_connection()
    230         with self.wrap_database_errors:
--> 231             return self._prepare_cursor(self.create_cursor(name))
    232
    233     def _commit(self):

/opt/cadasta/env/lib/python3.5/site-packages/django/db/backends/postgresql/base.py in create_cursor(self, name)
    218             cursor = self.connection.cursor(name, scrollable=False, withhold=self.connection.autocommit)
    219         else:
--> 220             cursor = self.connection.cursor()
    221         cursor.tzinfo_factory = utc_tzinfo_factory if settings.USE_TZ else None
    222         return cursor

InterfaceError: connection already closed

In [4]: from django.db import close_old_connections

In [5]: close_old_connections()

In [6]: BackgroundTask.objects.all()
Out[6]: DEBUG:django.db.backends:(0.003) SELECT "tasks_backgroundtask"."id", "tasks_backgroundtask"."task_id", "tasks_backgroundtask"."type", "tasks_backgroundtask"."created_date", "tasks_backgroundtask"."last_updated", "tasks_backgroundtask"."creator_id", "tasks_backgroundtask"."input", "tasks_backgroundtask"."options", "tasks_backgroundtask"."related_content_type_id", "tasks_backgroundtask"."related_object_id", "tasks_backgroundtask"."parent_id", "tasks_backgroundtask"."root_id", "tasks_backgroundtask"."immutable" FROM "tasks_backgroundtask" ORDER BY "tasks_backgroundtask"."created_date" ASC LIMIT 21; args=()
<QuerySet []>

When should this PR be merged

Before end of sprint.

Risks

None foreseen, calling the DB connection cleanup command seems pretty inocuous.

Follow-up actions

[List any possible follow-up actions here; for instance, testing data migrations, software that we need to install on staging and production environments.]

Checklist (for reviewing)

General

Is this PR explained thoroughly? All code changes must be accounted for in the PR description.

Is the PR labeled correctly? It should have the migration label if a new migration is added.

Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.

Functionality

Are all requirements met? Compare implemented functionality with the requirements specification.

Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.

Code

Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.

Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.

Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.

Is the code documented sufficiently? Large and complex classes, functions or methods must be annotated with comments following our code-style guidelines.

Has the scalability of this change been evaluated?

Is there a maintenance plan in place?

Tests

Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.

If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.

If this is a new feature or a significant change to an existing feature? has the manual testing spreadsheet been updated with instructions for manual testing?

Security

Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.

Are all UI and API inputs run through forms or serializers?

Are all external inputs validated and sanitized appropriately?

Does all branching logic have a default case?

Does this solution handle outliers and edge cases gracefully?

Are all external communications secured and restricted to SSL?

Documentation

Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.

Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.

Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.