Closed AnanaMJ closed 5 years ago
More details about the issue:
It looks like hibernate resolves the wrong schema when initially validating tables.
This happens when using both update
and validate
for spring.jpa.hibernate.ddl-auto
.
There are two databases which are supposed to have the same tables with the same schema.
schema_1
and schema_2
are both created in a MySQL instance, with no tables (hibernate is supposed to create the tables).
If hibernate runs the first time with spring.jpa.properties.hibernate.default_schema: schema_1
, then the tables are created there. Then, if hibernate runs the second time with spring.jpa.properties.hibernate.default_schema: schema_2
, hibernate validates schema_1
's tables, and tries to insert data into schema_2
's tables, which are not yet created, resulting in an error after all data has been replicated by Circus Train:
INFO org.hibernate.Version:54 - HHH000412: Hibernate Core {4.3.11.Final}
INFO org.hibernate.cfg.Environment:239 - HHH000206: hibernate.properties not found
INFO org.hibernate.cfg.Environment:346 - HHH000021: Bytecode provider name : javassist
INFO org.hibernate.annotations.common.Version:66 - HCANN000001: Hibernate Commons Annotations {4.0.5.Final}
INFO org.hibernate.dialect.Dialect:145 - HHH000400: Using dialect: org.hibernate.dialect.MySQL5Dialect
INFO org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory:47 - HHH000397: Using ASTQueryTranslatorFactory
INFO org.hibernate.tool.hbm2ddl.SchemaUpdate:207 - HHH000228: Running hbm2ddl schema update
INFO org.hibernate.tool.hbm2ddl.SchemaUpdate:218 - HHH000102: Fetching database metadata
INFO org.hibernate.tool.hbm2ddl.SchemaUpdate:230 - HHH000396: Updating schema
INFO org.hibernate.tool.hbm2ddl.TableMetadata:65 - HHH000261: Table found: custom_schema_1.audit_revision
INFO org.hibernate.tool.hbm2ddl.TableMetadata:66 - HHH000037: Columns: [id, timestamp]
INFO org.hibernate.tool.hbm2ddl.TableMetadata:68 - HHH000108: Foreign keys: []
INFO org.hibernate.tool.hbm2ddl.TableMetadata:69 - HHH000126: Indexes: [primary]
INFO org.hibernate.tool.hbm2ddl.TableMetadata:65 - HHH000261: Table found: custom_schema_1.legacy_replica_path
INFO org.hibernate.tool.hbm2ddl.TableMetadata:66 - HHH000037: Columns: [metastore_database_name, path, event_id, creation_timestamp, path_event_id, id, metastore_table_name]
INFO org.hibernate.tool.hbm2ddl.TableMetadata:68 - HHH000108: Foreign keys: []
INFO org.hibernate.tool.hbm2ddl.TableMetadata:69 - HHH000126: Indexes: [primary]
INFO org.hibernate.tool.hbm2ddl.SchemaUpdate:267 - HHH000232: Schema update complete
...
INFO com.hotels.bdp.circustrain.core.replica.HousekeepingCleanupLocationManager:61 - Scheduling old replica data for deletion for event ctp-20190503t100231.183z-cuppcmtd: s3://hcom-data-lab-bdp-applications/bdp_home/usa_names_no_filter_copy/ctp-20190503t095740.735z-qzbljkyl/ctp-20180907t104444.255z-uynosgiq/4b32923f-057e-4e3f-a789-bc607a3676f3
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper:144 - SQL Error: 1146, SQLState: 42S02
ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper:146 - Table 'custom_schema_2.legacy_replica_path' doesn't exist
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper:232 - SQL Warning Code: 1146, SQLState: 42S02
WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper:233 - Table 'custom_schema_2.legacy_replica_path' doesn't exist
ERROR com.hotels.bdp.circustrain.core.Locomotive:121 - Failed to replicate: shared-metastore:bdp.usa_names_no_filter to local-metastore:bdp_home.usa_names_no_filter_copy.
com.hotels.bdp.circustrain.api.CircusTrainException: Unable to replicate
However, if hibernate runs the first time with spring.jpa.properties.hibernate.default_schema: schema_2
(before the tables in schema_1
are created by hibernate), then the tables are created correctly and the job runs successfully.
As an easy workaround, the Circus Train job can be run with the following line added to the configuration:
spring.jpa.hibernate.ddl-auto: create
The first run creates the tables in schema_2
and the Circus Train job is then able to schedule paths to the housekeeping database for later use.
Then, the added configuration line should be removed and the other subsequent runs will use the correct database (schema_2
) to schedule paths for housekeeping, but still use the wrong one (schema_1
) to validate the schema.
The configuration like with create
should be removed because, if kept for subsequent runs, hibernate will drop the existing table and create a new one each time.
A fix for this is upgrading to Hibernate 5.1.0.Final
.
We had to remove the @Audit
annotations for legacy_replica_path
, because hibernate was trying to re-create foreign keys and failing.
One problem with the upgrade, however, is that when running with spring.jpa.properties.hibernate.hbm2ddl.auto: update
, even after tables are created, the following warnings come up (for each table that it validates):
INFO org.hibernate.Version:37 - HHH000412: Hibernate Core {5.1.0.Final}
INFO org.hibernate.cfg.Environment:213 - HHH000206: hibernate.properties not found
INFO org.hibernate.cfg.Environment:317 - HHH000021: Bytecode provider name : javassist
INFO org.hibernate.annotations.common.Version:66 - HCANN000001: Hibernate Commons Annotations {5.0.1.Final}
WARN org.hibernate.orm.deprecation:786 - HHH90000006: Attempted to specify unsupported NamingStrategy via setting [hibernate.ejb.naming_strategy]; NamingStrategy has been removed in favor of the split ImplicitNamingStrategy and PhysicalNamingStrategy; use [hibernate.implicit_naming_strategy] or [hibernate.physical_naming_strategy], respectively, instead.
INFO com.zaxxer.hikari.HikariDataSource:95 - HikariPool-1 - Starting...
INFO com.zaxxer.hikari.HikariDataSource:107 - HikariPool-1 - Start completed.
INFO org.hibernate.dialect.Dialect:156 - HHH000400: Using dialect: org.hibernate.dialect.MySQL5Dialect
INFO org.hibernate.envers.boot.internal.EnversServiceImpl:91 - Envers integration enabled? : true
INFO org.hibernate.tool.schema.extract.internal.InformationExtractorJdbcDatabaseMetaDataImpl:365 - HHH000262: Table not found: hibernate_sequence
DEBUG org.hibernate.SQL:92 - create table hibernate_sequence (next_val bigint)
WARN org.hibernate.tool.schema.internal.ExceptionHandlerLoggedImpl:27 - GenerationTarget encountered exception accepting command : Unable to execute command [create table hibernate_sequence (next_val bigint)]
org.hibernate.tool.schema.spi.CommandAcceptanceException: Unable to execute command [create table hibernate_sequence (next_val bigint)]
at org.hibernate.tool.schema.internal.exec.GenerationTargetToDatabase.accept(GenerationTargetToDatabase.java:63)
at org.hibernate.tool.schema.internal.SchemaMigratorImpl.applySqlString(SchemaMigratorImpl.java:567)
at org.hibernate.tool.schema.internal.SchemaMigratorImpl.applySqlStrings(SchemaMigratorImpl.java:551)
at org.hibernate.tool.schema.internal.SchemaMigratorImpl.createTable(SchemaMigratorImpl.java:339)
at org.hibernate.tool.schema.internal.SchemaMigratorImpl.performMigration(SchemaMigratorImpl.java:257)
at org.hibernate.tool.schema.internal.SchemaMigratorImpl.doMigration(SchemaMigratorImpl.java:137)
at org.hibernate.tool.schema.internal.SchemaMigratorImpl.doMigration(SchemaMigratorImpl.java:110)
at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.performDatabaseAction(SchemaManagementToolCoordinator.java:176)
at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.process(SchemaManagementToolCoordinator.java:64)
at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:458)
at org.hibernate.boot.internal.SessionFactoryBuilderImpl.build(SessionFactoryBuilderImpl.java:465)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.build(EntityManagerFactoryBuilderImpl.java:881)
at org.springframework.orm.jpa.vendor.SpringHibernateJpaPersistenceProvider.createContainerEntityManagerFactory(SpringHibernateJpaPersistenceProvider.java:60)
at org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean.createNativeEntityManagerFactory(LocalContainerEntityManagerFactoryBean.java:340)
at org.springframework.orm.jpa.AbstractEntityManagerFactoryBean.afterPropertiesSet(AbstractEntityManagerFactoryBean.java:319)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1641)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1578)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:545)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1054)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:829)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:538)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:760)
at org.springframework.boot.SpringApplication.createAndRefreshContext(SpringApplication.java:360)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:306)
at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.sql.SQLSyntaxErrorException: Table 'hibernate_sequence' already exists
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:782)
at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:666)
at com.zaxxer.hikari.pool.ProxyStatement.execute(ProxyStatement.java:98)
at com.zaxxer.hikari.pool.HikariProxyStatement.execute(HikariProxyStatement.java)
at org.hibernate.tool.schema.internal.exec.GenerationTargetToDatabase.accept(GenerationTargetToDatabase.java:51)
... 35 more
DEBUG org.hibernate.SQL:92 - insert into hibernate_sequence values ( 1 )
INFO org.hibernate.tool.schema.extract.internal.InformationExtractorJdbcDatabaseMetaDataImpl:365 - HHH000262: Table not found: hibernate_sequence
These are just warnings, so Circus Train runs successfully (and it doesn't delete data in the housekeeping tables in between runs), but the warnings are not valid. The tables do exist.
It also tries to drop and add a constraint, and I'm not sure what's happening with that:
INFO org.hibernate.tool.schema.extract.internal.InformationExtractorJdbcDatabaseMetaDataImpl:365 - HHH000262: Table not found: legacy_replica_path
DEBUG org.hibernate.SQL:92 - alter table legacy_replica_path drop constraint UKtlauv1txmi2vux3fqp1y1ueav
DEBUG org.hibernate.SQL:92 - alter table legacy_replica_path add constraint UKtlauv1txmi2vux3fqp1y1ueav unique (path, creation_timestamp)
INFO org.hibernate.tool.schema.extract.internal.InformationExtractorJdbcDatabaseMetaDataImpl:365 - HHH000262: Table not found: legacy_replica_path
After further investigations into the AWS documentation, we discovered that a MariaDB driver should be used with MySQL instance on EMR or Amazon RDS for MySQL.
Trying different MySQL driver versions didn't work, and it looks like MariaDB driver (1.8.0 - for Java 7) solves our issues:
with a MariaDB driver, when running the released version on Circus Train (14.0.1) this issue disappears:
If hibernate runs the first time with spring.jpa.properties.hibernate.default_schema: schema_1, then the tables are created there. Then, if hibernate runs the second time with spring.jpa.properties.hibernate.default_schema: schema_2, hibernate validates schema_1's tables, and tries to insert data into schema_2's tables, which are not yet created, resulting in an error after all data has been replicated by Circus Train
with a MariaDB driver and Circus Train with hibernate 5, no incorrect warnings about missing tables come up. It looks like the tables are correctly identified from the schema.
The changes were as following:
mariadb-java-client-1.8.0.jar
to the Circus Train classpathhousekeeping:
data-source:
driver-class-name: com.mysql.cj.jdbc.Driver
to
housekeeping:
data-source:
driver-class-name: org.mariadb.jdbc.Driver
The MariaDB driver is also what AWS recommends for Aurora(MySQL). See https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Connecting.html
After looking into the hive-site.xml
file on an EMR, we also noticed that the driver used is MariaDB:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.mariadb.jdbc.Driver</value>
</property>
which is used for
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true</value>
</property>
The jar it currently uses seems to be mariadb-java-client-2.2.1.jar
, which is fairly recent.
Steps for reproducing:
schema_1
(by configuring housekeeping to haveschema-name: schema_1
schema-name: schema_2
Expected outcome: In the second step, Circus Train should use
schema_2
to schedule paths for housekeeping.Current outcome: In the second step, Circus Train uses
schema_1
to find tablesand then uses
schema_2
to schedule paths for deletion: