codecov / self-hosted

Example of how to setup Codecov with docker compose

Database inconsistency between worker and api image for 2024.5.1 release? #40

Open trevjonez opened 4 months ago

trevjonez commented 4 months ago

After updating my self-hosted instance and struggling through getting migrations to run, I am left with the following errors:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.UndefinedColumn: column feature_flags.rollout_identifier does not exist
LINE 1: ...e_flags"."platform", "feature_flags"."is_active", "feature_f...
                                                             ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/worker/tasks/base.py", line 277, in run
    return self.run_impl(db_session, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/worker/tasks/upload.py", line 302, in run_impl
    return self.run_impl_within_lock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/worker/tasks/upload.py", line 489, in run_impl_within_lock
    commit_report = async_to_sync(report_service.initialize_and_save_report)(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 277, in __call__
    return call_result.result()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 353, in main_wrap
    result = await self.awaitable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/worker/services/report/__init__.py", line 289, in initialize_and_save_report
    report = await self.create_new_report_for_commit(commit)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/worker/services/report/__init__.py", line 757, in create_new_report_for_commit
    await CARRYFORWARD_BASE_SEARCH_RANGE_BY_OWNER.check_value_async(
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 479, in __call__
    ret: _R = await loop.run_in_executor(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/asgiref/current_thread_executor.py", line 40, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 538, in thread_handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/shared/rollouts/__init__.py", line 158, in check_value_async
    return self.check_value(identifier, default)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/shared/rollouts/__init__.py", line 152, in check_value
    self._fetch_and_set_from_db()
  File "/usr/local/lib/python3.12/site-packages/cachetools/func.py", line 67, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/shared/rollouts/__init__.py", line 233, in _fetch_and_set_from_db
    new_feature_flag = FeatureFlag.objects.filter(pk=self.name).first()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 1057, in first
    for obj in queryset[:1]:
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 398, in __iter__
    self._fetch_all()
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 1881, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.12/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.ProgrammingError: column feature_flags.rollout_identifier does not exist
LINE 1: ...e_flags"."platform", "feature_flags"."is_active", "feature_f...
                                                             ^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/celery/app/trace.py", line 477, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/celery/app/trace.py", line 760, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/worker/tasks/base.py", line 270, in run
    with TimeseriesTimer(
  File "/worker/helpers/telemetry.py", line 183, in __exit__
    self.metric_context.log_simple_metric(self.name, delta.total_seconds())
  File "/worker/helpers/telemetry.py", line 128, in log_simple_metric
    PgSimpleMetric.objects.create(
  File "/usr/local/lib/python3.12/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 658, in create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python3.12/site-packages/shared/django_apps/pg_telemetry/models.py", line 28, in save
    super().save(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 1020, in _save_table
    results = self._do_insert(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/base.py", line 1061, in _do_insert
    return manager._insert(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.12/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.InternalError: current transaction is aborted, commands ignored until end of transaction block

After digging around for a bit I found this commit that renamed the column: https://github.com/codecov/shared/commit/d86d4667805723264205bda88f39f4a3ced2f45c#diff-552087a68d49f71285c998a00138656ec63de61039c6bbac698e935c4d9d40d1

Then, looking at my own DB tables, I can see the migration was run.

But it seems the 2024.5.1 docker images for worker and api may have been built using different versions of shared. The worker repo isn't git tagged, so I can really only guess which commit the image was built from:

worker: https://github.com/codecov/worker/blob/5c7c8927010514c2e53f1b14728a57de9be3d102/requirements.txt#L375
api: https://github.com/codecov/codecov-api/blob/self-hosted-24.5.1/requirements.txt#L405

trevjonez commented 4 months ago

I may try deploying against the latest rolling docker tag to see if that unblocks me.

But the release process seems potentially flawed if it allows images to go out without being in lockstep on critical shared code, so I wanted to raise the question directly.

trevjonez commented 4 months ago

The rolling version published to Docker Hub on May 8th seems to be working, and things seem to be flowing through the system correctly again.

chaseconey commented 4 months ago

I am also seeing this behavior on my end. @trevjonez, what exactly did you end up pinning to?

trevjonez commented 4 months ago

> I am also seeing this behavior on my end. @trevjonez, what exactly did you end up pinning to?

I used the rolling tag from the May 8th build. I pulled it and retagged it to suit my needs.

chaseconey commented 4 months ago

Interesting. If I use 24.5.1 for everything and the latest rolling for the worker, I am getting the same issue.

trevjonez commented 4 months ago

Did you verify that all your migrations were run? I've had to run them manually from the api container every time I update. This round it had issues, and I had to edit the migration table and run it again to get it to go through, then finally undo the edit. There was a ton of trial and error over about an hour, so I can't say exactly which steps I took.

I believe it was some error about an out of order dependent migration.

chaseconey commented 4 months ago

I do see some odd messages popping up from the api:

django.db.migrations.exceptions.InconsistentMigrationHistory: Migration user_measurements.0001_initial is applied before its dependency codecov_auth.0054_update_owners_column_defaults on database 'default'.

I am a little hesitant to start messing with migrations - 🤞 someone from the team can chime in 😬 .
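
For context on that error: Django raises InconsistentMigrationHistory when a migration is recorded as applied while one of its declared dependencies is not. The following is a minimal sketch of that kind of check, not Django's actual implementation. The function name `find_inconsistencies` and the hand-written `applied` and `dependencies` data are illustrative assumptions; only the two migration names come from the message above.

```python
def find_inconsistencies(applied, dependencies):
    """Return (migration, missing_dependency) pairs.

    applied: set of (app, name) tuples recorded as applied.
    dependencies: dict mapping (app, name) to a list of (app, name) parents.
    """
    problems = []
    for migration in sorted(applied):
        for parent in dependencies.get(migration, []):
            if parent not in applied:
                problems.append((migration, parent))
    return problems


# Hypothetical data mirroring the error message in this thread.
applied = {("user_measurements", "0001_initial")}
dependencies = {
    ("user_measurements", "0001_initial"): [
        ("codecov_auth", "0054_update_owners_column_defaults"),
    ],
}

problems = find_inconsistencies(applied, dependencies)
for migration, parent in problems:
    print(f"{migration[0]}.{migration[1]} is applied before {parent[0]}.{parent[1]}")
```

Once the dependency is also recorded as applied, the same check returns an empty list, which is the state the workarounds in this thread are trying to reach.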

trevjonez commented 4 months ago

Yeah, that was the one I hit as well. I believe what I did was rename the entry to 0001_initial_temp, then run the migrations again, which gave a new error (something like it couldn't modify user_measurements because the change had actually already been applied). I then put it back as 0001_initial and ran the migrations again.

chaseconey commented 4 months ago

What worked for me:

('user_measurements', '0001_initial');

nikolaik commented 4 months ago

What worked for me:

  • Delete the latest migration entry in the django_migrations table (user_measurements)
  • Re-run migrations
    • This should introduce a few new entries and then say something like "user_measurements already exists"
  • Re-add the original errored entry at the end:
('user_measurements', '0001_initial');
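
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure for a real Codecov database: it uses an in-memory SQLite table as a stand-in for the Postgres django_migrations table (Django's standard columns are id, app, name, applied), and the helper names `remove_record` and `restore_record` are hypothetical. Against Postgres via psycopg2 the placeholders would be `%s` rather than `?`, and taking a backup first would be prudent.

```python
import sqlite3

APP, NAME = "user_measurements", "0001_initial"


def remove_record(conn, app, name):
    """Step 1: delete the row so Django no longer treats the migration as applied."""
    cur = conn.execute(
        "DELETE FROM django_migrations WHERE app = ? AND name = ?", (app, name)
    )
    return cur.rowcount


def restore_record(conn, app, name):
    """Step 3: re-add the row after the other migrations have been applied."""
    conn.execute(
        "INSERT INTO django_migrations (app, name, applied) "
        "VALUES (?, ?, CURRENT_TIMESTAMP)",
        (app, name),
    )


# In-memory stand-in for the real database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TIMESTAMP)"
)
conn.execute(
    "INSERT INTO django_migrations (app, name, applied) "
    "VALUES (?, ?, CURRENT_TIMESTAMP)",
    (APP, NAME),
)

removed = remove_record(conn, APP, NAME)
# Step 2 happens outside this script: re-run migrations from the api container.
restore_record(conn, APP, NAME)
rows = conn.execute("SELECT app, name FROM django_migrations").fetchall()
print(removed, rows)
```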

The image docker.io/codecov/self-hosted-worker:24.5 uses a version of codecov shared that seems to predate this change: https://github.com/codecov/shared/pull/205/commits/d86d4667805723264205bda88f39f4a3ced2f45c

The worker is crashing when it runs an SQL query with a field that does not exist, though the migration (with the same name) for it has run. I'm assuming the migration with the same name was run by self-hosted-api, but with the correct migration content.

Looking at the worker image, I notice the field name rollout_identifier instead of rollout_universe in the migration:

docker run -it --entrypoint cat docker.io/codecov/self-hosted-worker:24.5.1 /usr/local/lib/python3.12/site-packages/shared/django_apps/rollouts/migrations/0005_featureflag_is_active_featureflag_platform_and_more.py

requirements.in contains a reference to the right commit for the shared lib though:

$ docker run -it --entrypoint cat docker.io/codecov/self-hosted-worker:24.5.1 /worker/requirements.in
https://github.com/codecov/shared/archive/148b7ae3a6d4cdfc554ba9ca8b911c13e82d77b8.tar.gz#egg=shared
<snip>

Is this a packaging issue in your image build?
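
A minimal sketch of the comparison discussed in this thread: extracting the shared commit pinned in each image's requirements and checking that the two match. It assumes the pin uses the archive URL form shown in requirements.in above; `shared_commit` and `same_shared_pin` are hypothetical helper names, and the all-zero SHA is a placeholder rather than a real pin. Note that this only compares declared pins. As the output above suggests, the package actually installed in the image can still differ from what requirements.in declares.

```python
import re

# Matches the archive URL form of the requirements.in line quoted in this thread.
SHARED_PIN = re.compile(r"github\.com/codecov/shared/archive/([0-9a-f]{40})\.tar\.gz")


def shared_commit(requirements_text):
    """Return the pinned shared commit SHA, or None when no pin is found."""
    match = SHARED_PIN.search(requirements_text)
    return match.group(1) if match else None


def same_shared_pin(worker_requirements, api_requirements):
    """True only when both files pin shared to the same commit."""
    worker_sha = shared_commit(worker_requirements)
    api_sha = shared_commit(api_requirements)
    return worker_sha is not None and worker_sha == api_sha


# Line taken from the worker image's requirements.in, as quoted in this thread.
worker_line = (
    "https://github.com/codecov/shared/archive/"
    "148b7ae3a6d4cdfc554ba9ca8b911c13e82d77b8.tar.gz#egg=shared"
)
# Placeholder SHA, not a real pin: it only exercises the mismatch path.
placeholder_line = (
    "https://github.com/codecov/shared/archive/" + "0" * 40 + ".tar.gz#egg=shared"
)

print(shared_commit(worker_line))
print(same_shared_pin(worker_line, placeholder_line))
```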

nikolaik commented 4 months ago

Pinning at codecov/self-hosted-worker:rolling@sha256:e287e7e4557fdb1d79dbee1849aa2e84ee442d48b778df6b28ae16a5dcc94d48 seems to work well for now.