Apicurio / apicurio-registry

An API/Schema registry - stores APIs and Schemas.
https://www.apicur.io/registry/
Apache License 2.0
596 stars 265 forks source link

3.0.1 database upgrade failure #5356

Open wolfchimneyrock opened 2 days ago

wolfchimneyrock commented 2 days ago

Description

Registry Version: 3.0.0.M5 -> 3.0.1 upgrade Persistence type: postgresql

We perform database schema upgrades with a special 'bootstrap' instance of apicurio which has admin privileges to our postgres database (and can thus create tables, etc) whereas the client serving apicurio instances run with either read-only or read-write (but not db/table admin) access.

typically the bootstrap process starts an admin instance of apicurio, and then waits until /health/ready reports that the storage layer is ready, then stops the admin apicurio (assuming the database was successfully upgraded)

this worked for all of the 2.x upgrades we handled.

with 3.0.x upgrade from db version 100 to 101, it looks like the /health/ready endpoint reports the storage is available before the database schema change is finished - resulting in further failures to start apicurio 3.0.1. rolling back to 3.0.0M5 works fine, since the new tables are ignored.

Steps to Reproduce

start apicurio 3.0.1 with a db version 100 poll /health/ready until response code 200 returned terminate apicurio using SIGTERM

Expected vs Actual Behaviour

expected: database is fully upgraded before /health/ready reports 200 code. actual: database upgrade is still in progress when /health/ready reports 200.

Alternatively, the database upgrade could be idempotent and could resume a partial interrupted application.

Logs

WARN  [io.qua.run.con.DeprecatedRuntimePropertiesRecorder] (main) The 'quarkus.log.console.color' config property is deprecated and should not be used anymore
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) SqlRegistryStorage constructed successfully.
INFO  [io.api.reg.sto.imp.sql.RegistryDatasourceProducer] (main) Using postgresql SQL storage.
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) Checking to see if the DB is initialized.
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) Database was already initialized, skipping.
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) Checking to see if the DB is up-to-date.
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) Build's DB version is 101
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) Old database version detected, upgrading.
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main) Upgrading the Apicurio Hub API database.
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main)      Database type: postgresql
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main)      From Version:  100
INFO  [io.api.reg.sto.imp.sql.SqlRegistryStorage] (main)      To Version:    101
ERROR [io.qua.run.Application] (main) Failed to start application (with profile [prod]): java.lang.RuntimeException: Failed to start quarkus
        at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
        at io.quarkus.runtime.Application.start(Application.java:101)
        at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:111)
        at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
        at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
        at io.quarkus.runtime.Quarkus.run(Quarkus.java:124)
        at io.quarkus.runner.GeneratedMain.main(Unknown Source)
Caused by: io.apicurio.registry.storage.impl.sql.jdb.RuntimeSqlException: org.postgresql.util.PSQLException: ERROR: relation "outbox" already exists
        at io.apicurio.registry.storage.impl.sql.jdb.UpdateImpl.execute(UpdateImpl.java:28)
        at io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage.lambda$upgradeDatabaseRaw$3(AbstractSqlRegistryStorage.java:344)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage.upgradeDatabaseRaw(AbstractSqlRegistryStorage.java:337)
        at io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage.lambda$initialize$0(AbstractSqlRegistryStorage.java:234)
        at io.apicurio.registry.storage.impl.sql.AbstractHandleFactory.withHandle(AbstractHandleFactory.java:45)
        at io.apicurio.registry.storage.impl.sql.AbstractHandleFactory.withHandleNoException(AbstractHandleFactory.java:98)
        at io.apicurio.registry.storage.impl.sql.HandleFactoryProducer_ProducerMethod_produceHandleFactory_NU1A3ot6FkfWU5W0o_6TCeQ4KO4_ClientProxy.withHandleNoException(Unknown Source)
        at io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage.initialize(AbstractSqlRegistryStorage.java:223)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_Subclass.initialize$$superforward(Unknown Source)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_Subclass$$function$$110.apply(Unknown Source)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:73)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext$NextAroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:97)
        at io.apicurio.common.apps.logging.LoggingInterceptor.logMethodEntry(LoggingInterceptor.java:53)
        at io.apicurio.common.apps.logging.LoggingInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:70)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext$NextAroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:97)
        at io.apicurio.registry.metrics.health.readiness.PersistenceTimeoutReadinessInterceptor.intercept(PersistenceTimeoutReadinessInterceptor.java:29)
        at io.apicurio.registry.metrics.health.readiness.PersistenceTimeoutReadinessInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:70)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext$NextAroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:97)
        at io.apicurio.registry.metrics.health.liveness.PersistenceExceptionLivenessInterceptor.intercept(PersistenceExceptionLivenessInterceptor.java:25)
        at io.apicurio.registry.metrics.health.liveness.PersistenceExceptionLivenessInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:70)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:62)
        at io.apicurio.registry.metrics.StorageMetricsInterceptor.intercept(StorageMetricsInterceptor.java:41)
        at io.apicurio.registry.metrics.StorageMetricsInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
        at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_Subclass.initialize(Unknown Source)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage.initialize(SqlRegistryStorage.java:35)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_Subclass.initialize$$superforward(Unknown Source)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_Subclass$$function$$109.apply(Unknown Source)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:73)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext$NextAroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:97)
        at io.apicurio.common.apps.logging.LoggingInterceptor.logMethodEntry(LoggingInterceptor.java:53)
        at io.apicurio.common.apps.logging.LoggingInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:70)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext$NextAroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:97)
        at io.apicurio.registry.metrics.health.readiness.PersistenceTimeoutReadinessInterceptor.intercept(PersistenceTimeoutReadinessInterceptor.java:29)
        at io.apicurio.registry.metrics.health.readiness.PersistenceTimeoutReadinessInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:70)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext$NextAroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:97)
        at io.apicurio.registry.metrics.health.liveness.PersistenceExceptionLivenessInterceptor.intercept(PersistenceExceptionLivenessInterceptor.java:25)
        at io.apicurio.registry.metrics.health.liveness.PersistenceExceptionLivenessInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:70)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:62)
        at io.apicurio.registry.metrics.StorageMetricsInterceptor.intercept(StorageMetricsInterceptor.java:41)
        at io.apicurio.registry.metrics.StorageMetricsInterceptor_Bean.intercept(Unknown Source)
        at io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
        at io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
        at io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_Subclass.initialize(Unknown Source)
        at io.apicurio.registry.storage.impl.sql.SqlRegistryStorage_ClientProxy.initialize(Unknown Source)
        at io.apicurio.registry.storage.RegistryStorageProducer.raw(RegistryStorageProducer.java:94)
        at io.apicurio.registry.storage.RegistryStorageProducer.current(RegistryStorageProducer.java:51)
        at io.apicurio.registry.storage.RegistryStorageProducer_ProducerMethod_current_KUuvkPhY_4l3q6EULlF1GPXbKes_Bean.doCreate(Unknown Source)
        at io.apicurio.registry.storage.RegistryStorageProducer_ProducerMethod_current_KUuvkPhY_4l3q6EULlF1GPXbKes_Bean.create(Unknown Source)
        at io.apicurio.registry.storage.RegistryStorageProducer_ProducerMethod_current_KUuvkPhY_4l3q6EULlF1GPXbKes_Bean.create(Unknown Source)
        at io.quarkus.arc.impl.AbstractSharedContext.createInstanceHandle(AbstractSharedContext.java:119)
        at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:38)
        at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:35)
        at io.quarkus.arc.generator.Default_jakarta_enterprise_context_ApplicationScoped_ContextInstances.c120(Unknown Source)
        at io.quarkus.arc.generator.Default_jakarta_enterprise_context_ApplicationScoped_ContextInstances.computeIfAbsent(Unknown Source)
        at io.quarkus.arc.impl.AbstractSharedContext.get(AbstractSharedContext.java:35)
        at io.quarkus.arc.impl.ClientProxies.getApplicationScopedDelegate(ClientProxies.java:21)
        at io.apicurio.registry.storage.RegistryStorageProducer_ProducerMethod_current_KUuvkPhY_4l3q6EULlF1GPXbKes_ClientProxy.arc$delegate(Unknown Source)
                at io.apicurio.registry.storage.RegistryStorageProducer_ProducerMethod_current_KUuvkPhY_4l3q6EULlF1GPXbKes_ClientProxy.isReady(Unknown Source)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup.onStart(DynamicConfigStartup.java:46)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup_Bean.doCreate(Unknown Source)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup_Bean.create(Unknown Source)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup_Bean.create(Unknown Source)
        at io.quarkus.arc.impl.AbstractSharedContext.createInstanceHandle(AbstractSharedContext.java:119)
        at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:38)
        at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:35)
        at io.quarkus.arc.generator.Default_jakarta_enterprise_context_ApplicationScoped_ContextInstances.c55(Unknown Source)
        at io.quarkus.arc.generator.Default_jakarta_enterprise_context_ApplicationScoped_ContextInstances.computeIfAbsent(Unknown Source)
        at io.quarkus.arc.impl.AbstractSharedContext.get(AbstractSharedContext.java:35)
        at io.quarkus.arc.impl.ClientProxies.getApplicationScopedDelegate(ClientProxies.java:21)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup_ClientProxy.arc$delegate(Unknown Source)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup_ClientProxy.arc_contextualInstance(Unknown Source)
        at io.apicurio.common.apps.config.impl.DynamicConfigStartup_Observer_Synthetic_NzGHHybqmtc7Ct8_H1RkCbxYXQo.notify(Unknown Source)
        at io.quarkus.arc.impl.EventImpl$Notifier.notifyObservers(EventImpl.java:351)
        at io.quarkus.arc.impl.EventImpl$Notifier.notify(EventImpl.java:333)
        at io.quarkus.arc.impl.EventImpl.fire(EventImpl.java:80)
        at io.quarkus.arc.runtime.ArcRecorder.fireLifecycleEvent(ArcRecorder.java:155)
        at io.quarkus.arc.runtime.ArcRecorder.handleLifecycleEvents(ArcRecorder.java:106)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy_0(Unknown Source)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy(Unknown Source)
        ... 7 more
Caused by: org.postgresql.util.PSQLException: ERROR: relation "outbox" already exists
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2412)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:371)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:502)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:419)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:194)
        at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:155)
        at io.agroal.pool.wrapper.PreparedStatementWrapper.executeUpdate(PreparedStatementWrapper.java:90)
        at io.apicurio.registry.storage.impl.sql.jdb.UpdateImpl.execute(UpdateImpl.java:26)
        ... 102 more
apicurio-bot[bot] commented 2 days ago

Thank you for reporting an issue!

Pinging @jsenko to respond or triage.

wolfchimneyrock commented 2 days ago

additionally, SIGTERM is meant to be a "soft" shutdown - maybe if a database upgrade is in process and it receives SIGTERM, it will wait until the upgrade is complete? or maybe a different signal SIGUSR1 etc... could be used?

carlesarnal commented 1 day ago

We've had a bug in our upgrade scripts that is now fixed on main. We will do a 3.0.2 release today that fixes the issue.