banzaicloud / banzai-charts

Curated list of Banzai Cloud Helm charts used by the Pipeline Platform
Apache License 2.0
367 stars 278 forks source link

[Cadence Bug] missing setup schema version tables in schema setup job #1308

Open longquanzheng opened 2 years ago

longquanzheng commented 2 years ago

Describe the bug

https://github.com/banzaicloud/banzai-charts/blob/bc6a0cbfb4b58361865059df4f9b29a87a57a3c6/cadence/templates/server-job.yaml#L81

In the schema setup job, it creates a database only. this is not enough. The below step should also be executed

./cadence-sql-tool --ep $SQL_HOST_ADDR -p $port --plugin mysql --db cadence setup-schema -v 0.0 -- this sets up just the schema version tables with initial version of 0.0

See docs Steps to reproduce the issue: https://github.com/uber/cadence/blob/master/tools/sql/README.md

I haven’t run it myself, but the issue is reported by users

Expected behavior

The setup schema command should be run to create schema version tables Screenshots

Additional context

https://uber-cadence.slack.com/archives/CKBQ9LTTM/p1636700389022300?thread_ts=1636527868.004900&channel=CKBQ9LTTM&message_ts=1636700389.022300

longquanzheng commented 2 years ago

When trying to open a PR to fix that, I found that the command is existing: https://github.com/banzaicloud/banzai-charts/blob/bc6a0cbfb4b58361865059df4f9b29a87a57a3c6/cadence/templates/server-job.yaml#L135

It's in containers section instead of initContainers. Would that be a problem? @pregnor

pregnor commented 2 years ago

Thank you for your patience regarding the delay of this issue, I'm a bit overwhelmed lately on multiple fronts.

So if I understand everything correctly, there needs to be a database setup before the first Cadence usage and in case of schema version changes there also needs to be schema migration on said database (migration description provided by the Cadence server).


Kubernetes context description (from my point of view) to be on the same page and notice it if we aren't

There are switches to control this functionality in the chart (schema.setup.enabled, schema.update.enabled), turning it on/off.

We initiate 2 different one-time jobs to do these things.

The first one is for creating the database using the tool and also setting up the default schema using version v0.0, this is the one you have found above (here, line 1-174).

The second job does the schema migration from v0.0 to the latest available schema (same template, line 175-296).

As far as I understand the second job executes your sql tool to run through all the migration versions.

The reason it is designed this way is to provide different jobs for initial setup and migration-only (which may occur/be required on already existing database).

The jobs are actually "one-time-pods" to do a task once per initiation as opposed to running persistently doing things/handling stuff.


Answering the issue

I believe in case of the first job the schema setup "step" execution is rightly in the containers section, because it has to be executed once all the checks and the database creation is done.

Containers would be started concurrently during the Kubernetes execution inside a single job, so we must guard the dependency of the schema-setup step on the create-database step by making sure the create-database step is executed and finished first, before schema-setup would even start, otherwise it would be a race condition, because schema-setup would have no functioning/complete database to work on yet.

I also read the Slack conversation and I couldn't distill what is wrong with the chart exactly (whether it is the enablement flags or the split of the two jobs or the job execution order or the behavior during execution), I myself couldn't notice any unwanted or incorrect behavior of the chart based on the conversation (and judging by your last message you concluded the same) so if there is still something wrong with the charts, feel free to let me know, I don't want to dismiss any problems, I just can't quite get what is wrong with the current charts and their behavior if anything is.

pregnor commented 2 years ago

Hey @longquanzheng,

Is there any action to take on our side regarding the charts and this issue? (see my last comment above)