NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0
EPIC: Reconfiguring Metacat on Startup #1638

Open taojing2002 opened 1 year ago

taojing2002 commented 1 year ago

THIS IS AN EPIC Related Issues:

================================================

Before we introduced the metacat-site.properties file, we used the following properties in the metacat.properties file to force operators to go through the admin pages:

configutil.propertiesConfigured=false
configutil.authConfigured=false
configutil.skinsConfigured=true
configutil.databaseConfigured=false
configutil.solrserverConfigured=false
configutil.dataoneConfigured=false
configutil.ezidConfigured=bypassed
configutil.quotaConfigured=bypassed

If the value of any of these properties is neither bypassed nor true, Metacat cannot function, so operators have to go through the admin pages. This approach is sometimes overkill when we do NOT need to change anything during a re-deployment.
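To make the gating rule concrete, here is a minimal sketch of the check described above. It is an illustration only, not Metacat's actual code: the function name `unconfigured_sections` and the dictionary representation of the properties are assumptions; the property names and values are the ones listed above.

```python
# Illustrative sketch only; not Metacat's actual implementation.
# Values that let Metacat start without a visit to the admin pages.
CONFIGURED_VALUES = {"true", "bypassed"}


def unconfigured_sections(props):
    """Return the configutil.*Configured keys that still need admin attention."""
    return sorted(
        key
        for key, value in props.items()
        if key.startswith("configutil.")
        and key.endswith("Configured")
        and value.strip().lower() not in CONFIGURED_VALUES
    )


# The values shipped in metacat.properties, as listed above.
shipped = {
    "configutil.propertiesConfigured": "false",
    "configutil.authConfigured": "false",
    "configutil.skinsConfigured": "true",
    "configutil.databaseConfigured": "false",
    "configutil.solrserverConfigured": "false",
    "configutil.dataoneConfigured": "false",
    "configutil.ezidConfigured": "bypassed",
    "configutil.quotaConfigured": "bypassed",
}

pending = unconfigured_sections(shipped)
print(pending)
```

With the shipped values, five sections are reported as pending, which is why a fresh deployment always sends the operator to the admin pages.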

Now, those properties are stored in the metacat-site.properties file. Once their values have been set to true or bypassed in the metacat-site.properties file, Metacat can function without going through the admin pages after a new deployment. That may lead Metacat to skip a critical update process.

We need a way to signal/force operators to run the admin pages when it is necessary.

artntek commented 1 year ago

I've given this some thought, and considered the idea of adding a flag to force specific config updates, but the question remains as to when we would remove those flags? So, the only viable solution I can come up with is to have the properties code always read from and (importantly) write to metacat.properties (instead of site props) when reading/writing the configutil.* properties. (We already do this for the application.sitePropertiesDir property, for obvious reasons)

That way, the old functionality is retained (i.e. every time metacat is redeployed, those settings get deleted and the admin has to re-configure).
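The routing rule proposed above could look roughly like this. This is a hedged sketch, not the real properties code: the function name `target_file` is hypothetical, and it assumes that every other property continues to live in the site properties file.

```python
# Hypothetical helper; names are illustrative, not Metacat's API.
DEFAULT_FILE = "metacat.properties"
SITE_FILE = "metacat-site.properties"


def target_file(key):
    """Pick the file in which a property is stored under the proposal.

    configutil.* properties (and application.sitePropertiesDir, which is
    already handled this way) go to metacat.properties, so they are reset
    whenever a redeployment overwrites that file.
    Assumption: all other properties stay in the site properties file.
    """
    if key.startswith("configutil.") or key == "application.sitePropertiesDir":
        return DEFAULT_FILE
    return SITE_FILE


print(target_file("configutil.databaseConfigured"))
print(target_file("some.other.property"))
```

Because metacat.properties is replaced on every redeployment, anything routed there reverts to its shipped value, which is exactly the behavior the proposal wants to restore.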

mbjones commented 1 year ago

Let's discuss this further. I think a diagram of the startup sequence would be helpful in the two main deployment scenarios.

artntek commented 1 year ago

We discussed in the backend developers' meeting on 6/13/23.

In summary, @artntek will:

1. As a stopgap ONLY, change the code to write the above properties back to metacat.properties instead of metacat-site.properties, along with the following properties used by ESS-DIVE:

   configutil.upgrade.status=
   configutil.upgrade.database.status=
   configutil.upgrade.java.status=
   configutil.upgrade.solr.status=

2. Summarize our discussion below, and identify some concrete future actions, so we have a strategy for resolving this longer-term, without needing to write to metacat.properties (which is what we've been trying to avoid)

artntek commented 1 year ago

Summary of 6/13/23 Backend Dev Meeting Discussion

CONTEXT

A. In legacy deployments, there have been 3 main scenarios requiring admin intervention. These are the scenarios that have required the use of the properties listed above:

  1. Changes to the Metacat version (jar file version) requiring a different data structure (schema changes)

    • It's important for people to have backups before doing upgrades. Having them click “upgrade” on admin page helps remind them of this - we won’t be able to do that in k8s
  2. Changes to properties (requiring providing a value for a new property)

    • How do we ensure these are set by k8s operator?
      • Suggestion: add sensible defaults to properties, but configure so feature is disabled by default. If admin wants to enable it, they then need to validate the default properties or provide new ones.
  3. Changes that require java code to run - eg changing solr schema, or migrating data from local hard drive into hashstore etc.

    • These are traditionally done by built-in Java code in legacy deployments, but could be in a sidecar process in k8s.
    • Could we package the java and use the same shared code for both? eg have a jar file dependency for legacy upgrades, and run that jar in a container for sidecar k8s upgrades, for example?

B. The following properties are used primarily by ESS-DIVE for their Helm deployment.

They are read by a script directly from the metacat.properties text file in order to determine whether upgrades have been carried out successfully, before starting the app:

    configutil.upgrade.status=
    configutil.upgrade.database.status=
    configutil.upgrade.java.status=
    configutil.upgrade.solr.status=
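For illustration, a script of the kind described above might parse the text file as follows. This is a sketch under stated assumptions, not the ESS-DIVE script: the function names are hypothetical, the parser handles only simple key=value lines (no line continuations or ':' separators), and the value that marks a finished upgrade is not given in this thread, so it is passed in as a parameter.

```python
# Illustrative sketch; not the actual ESS-DIVE script.
UPGRADE_KEYS = (
    "configutil.upgrade.status",
    "configutil.upgrade.database.status",
    "configutil.upgrade.java.status",
    "configutil.upgrade.solr.status",
)


def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def upgrades_finished(text, done_value):
    """True only if every upgrade status property equals done_value.

    done_value is an assumption supplied by the caller; the real status
    values are not specified here.
    """
    props = read_properties(text)
    return all(props.get(key, "") == done_value for key in UPGRADE_KEYS)
```

A caller would read the metacat.properties file into a string, call `upgrades_finished(text, done_value)`, and delay starting the app until it returns True.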

artntek commented 1 year ago

INCREMENTAL GOALS

  1. Determine which of the properties listed above can be removed completely, along with the accompanying code (eg can we remove skins from metacat, now that we have metacatui?)

    • Note ESS-DIVE still relies on skins in some way - needs further investigation
  2. Determine how we can either eliminate the need for each of the remaining properties, or eliminate the need for them to be written to metacat.properties instead of metacat-site.properties

    • see suggestions under "CONTEXT, Section A", above
    • also see SPECIFIC VERSIONING EXAMPLE below.
  3. Remove the need for the ESS-DIVE-related properties (listed in "CONTEXT, Section B", above) by adding liveness & readiness probes to check when upgrades are finished.

artntek commented 1 year ago

SPECIFIC VERSIONING EXAMPLE

Scenario:

Current solution:

NOTE that in legacy installations, Metacat can be started without a database, and then the database can be configured in the admin pages in order to complete the setup. This is different from k8s deployments, which expect the database to be present at first startup.

  1. All the configutil.* properties default to false for each deployment (because metacat.properties is always overwritten).

  2. This forces the operator to visit the admin page, where they first set up the database connection details, thus enabling the next step...

  3. They click "Configure Database", which causes the code to check the app version (from metacat.properties) against the database version (from the db_version table - now that the app has access)

  4. The app then sees the need for DB upgrade, and presents the operator with an "upgrade" button.

  5. Manual intervention (i.e. clicking the button) runs the upgrade

New solution would instead:

  1. Check the app/jar version (2.19) against the DB version (2.18) upon startup

    • Causality issue: how would we make this check in legacy deployments, where the DB connection details have not yet been set??
  2. If an upgrade is needed, set configutil.databaseConfigured=false in metacat-site.properties. (Note this is different from before, because the value is written in response to the code actually having identified a need for upgrade.)

  3. The resulting action is:

    • legacy deployments: configutil.databaseConfigured=false forces the operator to visit the database configuration admin page and click the "upgrade" button to run the upgrade, as before.
    • k8s deployments: configutil.databaseConfigured=false simply causes the upgrade to run automatically.
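The proposed startup check can be sketched as follows. This is a rough illustration under assumptions, not a design decision: the function names and action labels are hypothetical, versions are assumed to be dotted integers such as 2.19, and the database version is assumed to be readable at startup (which sidesteps the causality issue noted above for legacy deployments).

```python
# Illustrative sketch of the proposed startup check; names are hypothetical.
def parse_version(version):
    """Turn '2.19' into (2, 19) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def plan_startup(app_version, db_version, is_k8s):
    """Return (property_updates, action) for the proposed startup check.

    Assumes db_version is already known, i.e. the DB connection details
    have been set before this check runs.
    """
    if parse_version(app_version) <= parse_version(db_version):
        return {}, "start"
    # An upgrade is needed: record that in metacat-site.properties.
    updates = {"configutil.databaseConfigured": "false"}
    # k8s runs the upgrade automatically; legacy waits for the operator
    # to click "upgrade" on the database configuration admin page.
    action = "auto_upgrade" if is_k8s else "operator_upgrade"
    return updates, action


print(plan_startup("2.19", "2.18", is_k8s=True))
print(plan_startup("2.19", "2.18", is_k8s=False))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "2.9" sorts after "2.19", which would hide a needed upgrade.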

artntek commented 1 year ago

@mbjones and @taojing2002 - I've summarized above what I remembered from our discussion (from here onwards). I'd love to hear your feedback and corrections, and I'd particularly appreciate @mbjones editing the SPECIFIC VERSIONING EXAMPLE section, since I'm not sure I've done it justice. Thanks!

artntek commented 11 months ago

This is now resolved by a workaround (see #1664), but it needs a longer-term solution that cleans up these properties and obviates the need for writing to metacat.properties instead of metacat-site.properties. Repeating this section from above, for clarity:

STILL TO DO