I discovered an issue with changing the default Housekeeping schema name from `circus_train` to anything else.
Test command:
```
/opt/circus-train/bin/circus-train.sh --config=/mnt/circus-train/jobs/test.yml --modules=replication --housekeeping.data-source.username=$HK_USER --housekeeping.data-source.password=$HK_PASSWORD
```
Test YAML file:
```
instance:
  name: jetstream-ct
  home: /mnt/circus-train
logging:
  config: file:${instance.home}/conf/log4j.xml
housekeeping:
  schema-name: housekeeping
  data-source:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://host:3306/housekeeping
copier-options:
  tmp-dir: "hdfs:///tmp/dsp-lab-jetstream-ct/"
  file-attribute: "replication, blocksize, user, group, permission, checksumtype"
  canned-acl: "bucket-owner-full-control"
  region: "us-west-2"
  max-maps: 50
  s3-server-side-encryption: true
  copier-factory-class: "com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpCopierFactory"
source-catalog:
  name: "mauihdp"
  hive-metastore-uris: "thrift://host:9083"
replica-catalog:
  name: "apiary-lab-hms-us-west-2"
  hive-metastore-uris: "thrift://host.lcl:9083"
table-replications:
  -
    source-table:
      database-name: "dm"
      table-name: "ar_typ_dim"
      generate-partition-filter: true
    replica-table:
      database-name: "dm"
      table-name: "ar_typ_dim"
      table-location: "s3://bucket/ar_typ_dim"
sns-event-listener:
  region: "us-west-2"
  topic: "arn:aws:sns:us-west-2:<topic>"
  subject: "CircusTrainStatus"
  headers:
    requestId: "api13cda12a-18f8-11e9-9be1-abf8ece1bc04"
    databaseTableName: "dm.ar_typ_dim"
    route: "route"
```
As shown above, housekeeping has been reconfigured to use a schema named `housekeeping` in a MySQL server.
When this schema does not exist, Circus Train fails with the following error:
```
19/01/15 11:16:13 ERROR boot.SpringApplication: Application startup failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private javax.sql.DataSource org.springframework.boot.autoconfigure.orm.jpa.JpaBaseConfiguration.dataSource; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'housekeepingDataSource' defined in class path resource [com/hotels/housekeeping/HousekeepingConfiguration.class]: Initialization of bean failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dataSourceInitializer': Invocation of init method failed; nested exception is org.springframework.jdbc.datasource.init.ScriptStatementFailedException: Failed to execute SQL script statement #1 of class path resource [schema.sql]: CREATE SCHEMA IF NOT EXISTS circus_train; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Access denied for user 'jetstream_rw'@'%' to database 'circus_train'
```
[ct-failed.txt](https://github.com/HotelsDotCom/circus-train/files/2761166/ct-failed.txt)
After creating the `circus_train` schema, the above Circus Train command starts to work. It then uses the configured `housekeeping` schema rather than the newly created `circus_train` schema, which stays empty:
```
mysql> use circus_train;
Database changed
mysql> show tables;
Empty set (0.00 sec)
mysql> use housekeeping;
Database changed
mysql> show tables;
+------------------------+
| Tables_in_housekeeping |
+------------------------+
| audit_revision         |
| legacy_replica_path    |
+------------------------+
2 rows in set (0.01 sec)
```
It appears that the initial schema setup ignores the configured `housekeeping.schema-name` value: the failing statement from `schema.sql` is `CREATE SCHEMA IF NOT EXISTS circus_train`, whatever schema name is configured. Once the `circus_train` schema exists, initialization gets past that statement and the rest of the housekeeping code uses the configured schema correctly.
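
To illustrate the behaviour I would expect, here is a minimal sketch in which the init statement is derived from the configured schema name instead of being hard-coded. `SchemaInitSketch` and `createSchemaStatement` are hypothetical names and are not part of Circus Train or the housekeeping library; the real initializer runs `schema.sql` from the classpath, and the fallback to `circus_train` and the identifier check are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Circus Train code: builds the schema-creation
// statement from the configured housekeeping.schema-name value.
public final class SchemaInitSketch {

    // Assumed default, taken from the schema name seen in the error above.
    private static final String DEFAULT_SCHEMA = "circus_train";

    // Assumption: only plain unquoted identifiers are accepted, so the
    // configured value cannot inject extra SQL into the statement.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private SchemaInitSketch() {
    }

    public static String createSchemaStatement(String configuredSchemaName) {
        // Fall back to the default only when nothing is configured.
        String name = (configuredSchemaName == null || configuredSchemaName.isEmpty())
                ? DEFAULT_SCHEMA
                : configuredSchemaName;
        if (!SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsupported schema name: " + name);
        }
        return "CREATE SCHEMA IF NOT EXISTS " + name;
    }

    public static void main(String[] args) {
        // With the YAML above, the configured name is "housekeeping".
        System.out.println(createSchemaStatement("housekeeping"));
        // With no schema-name configured, the default is used.
        System.out.println(createSchemaStatement(null));
    }
}
```

With the configuration from this report, the first call yields a statement that targets `housekeeping`, so the `jetstream_rw` user would not need any access to a `circus_train` schema.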