ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.
Other
14.01k stars 3.42k forks source link

/api/v2/job_events/1234/ throws 500 error #14609

Open jplitza opened 12 months ago

jplitza commented 12 months ago

Please confirm the following

Bug Summary

Fetching any single job_event from the API throws a 500 Internal Server Error. To be precise:

{"detail":"A server error has occurred."}

AWX version

23.3.1

Select the relevant components

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

Fetch any single job_event from the API.

Expected results

Be provided with a response as it is contained e.g. in the /api/v2/jobs/1234/job_events/ endpoint (which actually contains a link to this very API URL for every single job event).

Actual results

{"detail":"A server error has occurred."}

Additional information

Log from the awx-web container:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/generics.py", line 332, in dispatch
    return super(APIView, self).dispatch(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/generics.py", line 208, in get
    return self.retrieve(request, *args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/mixins.py", line 54, in retrieve
    instance = self.get_object()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/rest_framework/generics.py", line 83, in get_object
    queryset = self.filter_queryset(self.get_queryset())
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/generics.py", line 365, in get_queryset
    qs = self.model._default_manager
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/views/__init__.py", line 3545, in model
    if self.is_partitioned:
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/views/__init__.py", line 3541, in is_partitioned
    return int(self.kwargs['pk']) > unpartitioned_event_horizon(models.JobEvent)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/api/views/__init__.py", line 133, in unpartitioned_event_horizon
    cursor.execute(f'SELECT MAX(id) FROM _unpartitioned_{cls._meta.db_table}')
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.InternalError: current transaction is aborted, commands ignored until end of transaction block

The Postgres log contains the real error:

LOG:  execute <unnamed>: BEGIN
LOG:  statement: SELECT MAX(id) FROM _unpartitioned_main_jobevent
ERROR:  relation "_unpartitioned_main_jobevent" does not exist at character 21
STATEMENT:  SELECT MAX(id) FROM _unpartitioned_main_jobevent
LOG:  statement: SELECT MAX(id) FROM _unpartitioned_main_jobevent
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  SELECT MAX(id) FROM _unpartitioned_main_jobevent
LOG:  execute <unnamed>: ROLLBACK
jplitza commented 12 months ago

I guess this is caused by unpartitioned_event_horizon(): https://github.com/ansible/awx/blob/fc0b58fd0472b732f36702431896d6d00d78dd19/awx/api/views/__init__.py#L130-L136 Maybe the error that Django throws here changed and the except ProgrammingError doesn't catch that anymore?

Or rather, Django seems to internally repeat the query (see extended Postgres log). The first try might have resulted in a ProgrammingError, but the second results in a different reaction from Postgres (because the transaction has to be rolled back first) and hence throws a different exception.

jplitza commented 12 months ago

Actually, the catching of the ProgrammingError works as intended. The only problem is that at some earlier point, a transaction was started, which isn't rolled back when the statement fails.

Inserting a transaction.rollback() in the except block doesn't work either, because:

django.db.transaction.TransactionManagementError: This is forbidden when an 'atomic' block is active.

So as a workaround, I now first check if the table exists:

def unpartitioned_event_horizon(cls):
    if f'_unpartitioned_{cls._meta.db_table}' not in connection.introspection.table_names():
        return 0
    with connection.cursor() as cursor:
        cursor.execute(f'SELECT MAX(id) FROM _unpartitioned_{cls._meta.db_table}')
        return cursor.fetchone()[0] or -1

I don't know whether this actually works when the unpartitioned table is present, but it works for me. :wink:

fosterseth commented 11 months ago

@AlanCoding is suggestion we modify the select statement to do the table introspection

e.g.

select case when EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'params')
then (select par_val from params where par_name='DBALIAS') else (select 'NOTABLE') end as DBAlias;
AlanCoding commented 11 months ago

This is a high priority item, I will try to look at it today.