Open wolfchimneyrock opened 2 days ago
Thank you for reporting an issue!
Pinging @jsenko to respond or triage.
Additionally, SIGTERM is meant to be a "soft" shutdown. If a database upgrade is in progress when the process receives SIGTERM, could it wait until the upgrade is complete before exiting? Or could a different signal, such as SIGUSR1, be used?
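To illustrate the "wait until the upgrade is complete" idea, here is a minimal sketch of a handler that records SIGTERM during a critical section and exits only once that section ends. It is written in Python purely for illustration; the class and method names are hypothetical and this is not how the registry itself handles signals.

```python
import signal
from contextlib import contextmanager


class GracefulShutdown:
    """Defers a SIGTERM-triggered exit while an upgrade section is running."""

    def __init__(self):
        self.in_upgrade = False
        self.stop_requested = False

    def on_sigterm(self, signum, frame):
        # Remember the request; exit immediately only when no upgrade is running.
        self.stop_requested = True
        if not self.in_upgrade:
            raise SystemExit(0)

    @contextmanager
    def upgrade_section(self):
        self.in_upgrade = True
        try:
            yield
        finally:
            self.in_upgrade = False
        # Reached only if the body finished normally: honour a deferred SIGTERM.
        if self.stop_requested:
            raise SystemExit(0)

    def install(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.on_sigterm)
```

With this shape, a SIGTERM that arrives inside `with shutdown.upgrade_section(): ...` lets the body run to the end and the process exits right afterwards.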
We've had a bug in our upgrade scripts that is now fixed on main. We will do a 3.0.2 release today that fixes the issue.
Description
Registry Version: 3.0.0.M5 -> 3.0.1 upgrade
Persistence type: postgresql
We perform database schema upgrades with a special 'bootstrap' instance of Apicurio that has admin privileges on our Postgres database (and can thus create tables, etc.), whereas the client-serving Apicurio instances run with either read-only or read-write (but not db/table admin) access.
Typically, the bootstrap process starts an admin instance of Apicurio, waits until /health/ready reports that the storage layer is ready, and then stops the admin instance (assuming the database was successfully upgraded).
This worked for all of the 2.x upgrades we handled.
With the 3.0.x upgrade from db version 100 to 101, it looks like the /health/ready endpoint reports that the storage is available before the database schema change has finished, resulting in further failures to start Apicurio 3.0.1. Rolling back to 3.0.0.M5 works fine, since the new tables are ignored.
Steps to Reproduce
1. Start Apicurio 3.0.1 with a database at db version 100.
2. Poll /health/ready until response code 200 is returned.
3. Terminate Apicurio using SIGTERM.
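The steps above can be sketched as a small script. This is only an approximation of the bootstrap process described in this issue, not our actual tooling: the function names are hypothetical, and the launch command and URL passed to `bootstrap` are placeholders you would supply yourself.

```python
import signal
import subprocess
import time
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status code for `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the service is not ready yet
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(url, timeout_s=300.0, interval_s=2.0, probe=None):
    """Poll `url` until it answers 200; return True on success, False on timeout."""
    probe = probe or http_status
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe(url) == 200:
            return True
        time.sleep(interval_s)
    return False


def bootstrap(command, ready_url):
    """Start the admin instance, wait for readiness, then stop it with SIGTERM."""
    proc = subprocess.Popen(command)
    try:
        ready = wait_until_ready(ready_url)
    finally:
        proc.send_signal(signal.SIGTERM)
        proc.wait()
    return ready
```

The `send_signal` call runs as soon as the first 200 is seen, which is exactly the window in which the upgrade described here can be cut short.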
Expected vs Actual Behaviour
Expected: the database is fully upgraded before /health/ready reports a 200 code.
Actual: the database upgrade is still in progress when /health/ready reports 200.
Alternatively, the database upgrade could be idempotent, so that a partially applied, interrupted upgrade can be resumed.
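As a rough illustration of what a resumable upgrade could look like, the sketch below applies each version step and its version bump in a single transaction, so an interrupted step rolls back and is retried on the next start. It uses SQLite from the Python standard library only so that it runs on its own (PostgreSQL also supports transactional DDL). The `schema_meta` and `example_items` tables, the step contents, and the `fail_at` parameter are all hypothetical and are not Apicurio's real schema or upgrade scripts.

```python
import sqlite3

# Hypothetical upgrade steps: target version -> statements that produce it.
UPGRADES = {
    101: ["CREATE TABLE example_items (id INTEGER PRIMARY KEY, payload TEXT)"],
    102: ["ALTER TABLE example_items ADD COLUMN created_at TEXT"],
}


def new_db():
    """Create an in-memory database that starts at version 100."""
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE schema_meta (version INTEGER NOT NULL)")
    conn.execute("INSERT INTO schema_meta (version) VALUES (100)")
    return conn


def current_version(conn):
    return conn.execute("SELECT version FROM schema_meta").fetchone()[0]


def upgrade(conn, fail_at=None):
    """Apply pending steps in order; `fail_at` simulates an interruption."""
    for target in sorted(UPGRADES):
        if target <= current_version(conn):
            continue  # already applied by an earlier run
        conn.execute("BEGIN")
        try:
            for stmt in UPGRADES[target]:
                conn.execute(stmt)
            if fail_at == target:
                raise RuntimeError("simulated interruption")
            # The version bump commits together with the schema change.
            conn.execute("UPDATE schema_meta SET version = ?", (target,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
```

Calling `upgrade(conn)` again after a failed run picks up at the first step whose version was never committed, and calling it on a fully upgraded database does nothing.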
Logs