department-of-veterans-affairs / abd-vro

To get Veterans benefits in minutes, VRO software uses health evidence data to help fast track disability claims.
Other
18 stars 6 forks source link

Error in container logs on starting postgres service #3057

Closed msnwatson closed 1 month ago

msnwatson commented 2 months ago

User Story

There is a bug which is causing errors to be emitted during startup of the Postgres service container. It seems to be related to the creation of the domain_cc schema.

This or something related is causing a failure of deployments of the db-init container in LHDI environments.

What is the consequence of this bug for VRO and partners?

Acceptance Criteria

  1. Postgres container starts up without any errors
  2. Container healthchecks CI step passes
  3. Deployments of db-init are able to complete successfully
nelsestu commented 2 months ago

The problem that is occurring and reoccurring in each environment, stems from how the flyway user was setup initially. Remember that the flyway user creates the vro database and everything that exists within it, so why can't we automate the fix? Because the flyway user is how we connect to the database in the first place. We could add a script that is just for reference, but we don't want to automate the execution of it because we can't connect as flyway user and then grant flyway user additional permissions. Instead we need to use the systemadmin user credentials to connect to the RDS instance in each environment and then run the following:

grant create on database vro to vro_prod_admin_user;

Seeing that the flyway user for prod contains the word prod, I imagine that each environment will have a similar user. The best way that I've found to identify which users map to which use cases is to look at the kubernetes secretes via Lens, and find the one named vro-db. This should exist in each environment. Then through lens you can expand the vro-db secret to find the individual values that are stored there. DB_FLYWAYUSER_NAME is gonna be the user name you want to use in place of vro_prod_admin_user. In the same secret, you'll find the DB_SUPERUSER_NAME, and DB_SUPERUSER_PASS values as well. Click the eye icon in lens to see the actual value. You'll login to the RDS instance using these SUPERUSER secrets, and you'll run the grant statement above replacing that flyway username for each environment.

Now that you know what to do and why, the final piece is to share how you can connect to the database. For this, we have the dev-tools image which was mostly created to solve this specific use case and is therefore intended for making ssh connections from our local environments. First, verify that a dev-tools image is available, and deploy one from the internal github actions if it is not. Once you have that shell session, you can use psql to connect to the database. psql to what? you might ask, and you'll see that i've included an example command below for reference, but the gist is, you need the hostname, the superuser name and the superuser password to connect via psql. You'll have the superuser name and password by following the previously outlined steps and looking at the vro-db secret. To get the hostname, you'll want to find a secret named rds-db. That will contain the driver's connection string format, but you can extract the hostname portion of that to use as the -h in the psql command:

psql -h abd-vro-prod-dbinstance.abc123.us-gov-west-1.rds.amazonaws.com -d vro -U vro_prod_super_user
chengjie8 commented 2 months ago

db-init migration issues in qa environment has been resolved and the job was executed successfully in qa

image.png