goldmansachs / obevo

Obevo is a database deployment tool that handles enterprise scale schemas and complexity
Apache License 2.0
236 stars 56 forks source link

Wrong table's schema introspecting query for Postgres DB #307

Open marciomed1 opened 1 year ago

marciomed1 commented 1 year ago

For bugs/errors, please provide the following if feasible:

With Postgres, seems there's a query trying to fetch tables and it's schemas with a wrong join, which returns far many results than needed causing deployments to last for more than hour, making it impractical to use. Also, this query is executed many times which seems not needed as there wasn't any table changes yet.
This wrong join may cause other issues.

The query we see while waiting the migration is:

SELECT
  current_database()::information_schema.sql_identifier AS TABLE_CATALOG,
  nc.nspname::information_schema.sql_identifier AS TABLE_SCHEMA,
  c.relname::information_schema.sql_identifier AS TABLE_NAME,
  c.*
FROM
  pg_catalog.pg_class c
  INNER JOIN pg_catalog.pg_namespace nc
    ON  c.relowner = nc.nspowner
WHERE
  c.relkind IN ('r', 'v')
;

The join is done by owner, so the query returns every schema of the same owner than the table one instead of the table's schema itself.

To fetch the table's schema, the join should be ON c.relnamespace = nc.oid.

Deploy any migration in a Postgres database and watch the executed queries.
Create a DB with a hundred of schemas and thousands of tables, and measure how long it takes to complete.

No exceptions, just a very long time waiting to deploy. The stage where Obevo spends most of the time is:

11:04:19.917 [main] DEBUG com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl - Starting query for DB metadata for artifactexecution/null//
11:04:33.071 [main] DEBUG com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl - Ending query for DB metadata for artifactexecution/null//
11:04:33.074 [main] DEBUG com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl - Starting query for DB metadata for artifactexecutionattr/null//
11:04:46.195 [main] DEBUG com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl - Ending query for DB metadata for artifactexecutionattr/null//
11:04:46.201 [main] DEBUG com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl - Starting query for DB metadata for artifactexecution/null//
11:05:00.014 [main] DEBUG com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl - Ending query for DB metadata for artifactexecution/null//
11:05:00.015 [main] DEBUG com.gs.obevo.db.impl.core.jdbc.JdbcHelper - Executing query on connection[2135889206: SELECT * FROM logging.artifactexecutionattr 
11:05:00.017 [main] DEBUG com.gs.obevo.db.impl.core.jdbc.JdbcHelper - Executing query on connection[2135889206: SELECT * FROM logging.artifactexecution WHERE dbschema = 'logging'
11:05:00.028 [main] DEBUG com.gs.obevo.db.impl.core.jdbc.JdbcHelper - Executing update on connection[1982703147: SET search_path TO logging with args: []

As you can see, it's expending around 15s on query for DB metadata. This seems to be done for every schema in the DB, regardless the ones specified in system-config.xml. Also, seems to me the query will always return the same results, as no migration has been executed yet.
15s per schema having a hundred of schemas is too much time. Most of this schemas are not managed by Obevo.

The latest released one.

marciomed1 commented 1 year ago

Seems the query comes from the SchemaCrawler library, not directly from Obevo.

sualeh commented 1 year ago

@marciomed1 - there is a fix for this issue in the SchemaCrawler 16.19.7.