apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[Bug] [db init] Got error Caused by: java.sql.SQLSyntaxErrorException: Duplicate column name 'operator' when upgrade to 3.2.2 from 3.2.1 on EKS #16366

Open ahululu opened 1 month ago

ahululu commented 1 month ago


What happened

Environment: EKS 1.29

When upgrading from 3.2.1 to 3.2.2 (by replacing the Helm image), the pod dolphinscheduler-db-init-job-xxxx fails with the following error:

xxx  INFO 8 --- [           main] o.a.d.common.sql.SqlScriptRunner         : Execute sql: DROP TABLE IF EXISTS `t_ds_relation_project_worker_group`; success
xxx  INFO 8 --- [           main] o.a.d.common.sql.SqlScriptRunner         : Execute sql: CREATE TABLE `t_ds_relation_project_worker_group` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key',
  `project_code` bigint(20) NOT NULL COMMENT 'project code',
  `worker_group` varchar(255) DEFAULT NULL COMMENT 'worker group',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  `update_time` datetime DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`),
  UNIQUE KEY unique_project_worker_group(project_code,worker_group)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE = utf8_bin; success
xxxx ERROR 8 --- [           main] o.a.d.t.datasource.upgrader.UpgradeDao   : Execute ddl file failed, meet an unknown exception, schemaDir:  3.2.2_schema, ddlScript: dolphinscheduler_ddl.sql

java.sql.SQLSyntaxErrorException: Duplicate column name 'operator'
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:763) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:648) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.zaxxer.hikari.pool.ProxyStatement.execute(ProxyStatement.java:94) ~[HikariCP-4.0.3.jar:na]
    at com.zaxxer.hikari.pool.HikariProxyStatement.execute(HikariProxyStatement.java) ~[HikariCP-4.0.3.jar:na]
    at org.apache.dolphinscheduler.common.sql.SqlScriptRunner.execute(SqlScriptRunner.java:58) ~[dolphinscheduler-common-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.upgrader.UpgradeDao.upgradeDolphinSchedulerDDL(UpgradeDao.java:154) [dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.upgrader.UpgradeDao.upgradeDolphinScheduler(UpgradeDao.java:89) [dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.DolphinSchedulerManager.upgradeDolphinScheduler(DolphinSchedulerManager.java:111) [dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.UpgradeDolphinScheduler$UpgradeRunner.run(UpgradeDolphinScheduler.java:53) [dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:755) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:315) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1295) [spring-boot-2.7.3.jar:2.7.3]
    at org.apache.dolphinscheduler.tools.datasource.UpgradeDolphinScheduler.main(UpgradeDolphinScheduler.java:36) [dolphinscheduler-tools-3.2.2.jar:3.2.2]

xxxxx  INFO 8 --- [           main] ConditionEvaluationReportLoggingListener :

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
xxxx ERROR 8 --- [           main] o.s.boot.SpringApplication               : Application run failed

java.lang.IllegalStateException: Failed to execute CommandLineRunner
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:774) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:755) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:315) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306) [spring-boot-2.7.3.jar:2.7.3]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1295) [spring-boot-2.7.3.jar:2.7.3]
    at org.apache.dolphinscheduler.tools.datasource.UpgradeDolphinScheduler.main(UpgradeDolphinScheduler.java:36) [dolphinscheduler-tools-3.2.2.jar:3.2.2]
Caused by: java.lang.RuntimeException: Execute ddl file failed, meet an unknown exception
    at org.apache.dolphinscheduler.tools.datasource.upgrader.UpgradeDao.upgradeDolphinSchedulerDDL(UpgradeDao.java:162) ~[dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.upgrader.UpgradeDao.upgradeDolphinScheduler(UpgradeDao.java:89) ~[dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.DolphinSchedulerManager.upgradeDolphinScheduler(DolphinSchedulerManager.java:111) ~[dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.UpgradeDolphinScheduler$UpgradeRunner.run(UpgradeDolphinScheduler.java:53) ~[dolphinscheduler-tools-3.2.2.jar:3.2.2]
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:771) [spring-boot-2.7.3.jar:2.7.3]
    ... 5 common frames omitted
Caused by: java.sql.SQLSyntaxErrorException: Duplicate column name 'operator'
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:763) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:648) ~[mysql-connector-j-8.0.32.jar:8.0.32]
    at com.zaxxer.hikari.pool.ProxyStatement.execute(ProxyStatement.java:94) ~[HikariCP-4.0.3.jar:na]
    at com.zaxxer.hikari.pool.HikariProxyStatement.execute(HikariProxyStatement.java) ~[HikariCP-4.0.3.jar:na]
    at org.apache.dolphinscheduler.common.sql.SqlScriptRunner.execute(SqlScriptRunner.java:58) ~[dolphinscheduler-common-3.2.2.jar:3.2.2]
    at org.apache.dolphinscheduler.tools.datasource.upgrader.UpgradeDao.upgradeDolphinSchedulerDDL(UpgradeDao.java:154) ~[dolphinscheduler-tools-3.2.2.jar:3.2.2]
    ... 9 common frames omitted

xxxx  INFO 8 --- [           main] com.zaxxer.hikari.HikariDataSource       : DolphinScheduler - Shutdown initiated...
xxx  INFO 8 --- [           main] com.zaxxer.hikari.HikariDataSource       : DolphinScheduler - Shutdown completed.

What you expected to happen

I don't think the db init scripts are compatible with this kind of image upgrade.

How to reproduce

Deploy 3.2.1 with the Helm chart, then upgrade to 3.2.2.

Anything else

No response

Version

3.2.x


ahululu commented 1 month ago

The Helm deployment may have been upgraded from 3.2.1 to 3.2.2 once before. That upgrade ran into a problem, so I used helm rollback, but the rollback does not revert the database. My current workaround is to manually drop the operator column and then re-run this file by hand: https://github.com/apache/dolphinscheduler/blob/a5061eb3518fd9ea5db4a85892da61781baac04c/dolphinscheduler-dao/src/main/resources/sql/upgrade/3.2.2_schema/mysql/dolphinscheduler_ddl.sql

ahululu commented 1 month ago

> The Helm deployment may have been upgraded from 3.2.1 to 3.2.2 once before. That upgrade ran into a problem, so I used helm rollback, but the rollback does not revert the database. My current workaround is to manually drop the operator column and then re-run this file by hand: https://github.com/apache/dolphinscheduler/blob/a5061eb3518fd9ea5db4a85892da61781baac04c/dolphinscheduler-dao/src/main/resources/sql/upgrade/3.2.2_schema/mysql/dolphinscheduler_ddl.sql

But I'm not sure that's the right thing to do.

SbloodyS commented 1 month ago

> The Helm deployment may have been upgraded from 3.2.1 to 3.2.2 once before. That upgrade ran into a problem, so I used helm rollback, but the rollback does not revert the database. My current workaround is to manually drop the operator column and then re-run this file by hand: https://github.com/apache/dolphinscheduler/blob/a5061eb3518fd9ea5db4a85892da61781baac04c/dolphinscheduler-dao/src/main/resources/sql/upgrade/3.2.2_schema/mysql/dolphinscheduler_ddl.sql

Yes, you can try this manually.
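
For readers applying the same workaround, here is a minimal sketch of the check it implies: find which columns the upgrade script adds that a previous, rolled-back upgrade already left in the database, and generate the matching DROP statements. The information_schema.COLUMNS query is standard MySQL. The table name `t_example` and both helper functions are hypothetical; the real table is whichever one the linked dolphinscheduler_ddl.sql alters. Dropping a column discards its data, so back up the database first.

```python
# Standard MySQL query: list tables in the current schema that already
# contain a column named 'operator'. Run it with any MySQL client.
FIND_COLUMN_SQL = (
    "SELECT TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS "
    "WHERE TABLE_SCHEMA = DATABASE() AND COLUMN_NAME = 'operator'"
)


def leftover_columns(existing, to_add):
    """Return the (table, column) pairs the script wants to add but that already exist.

    existing: (table, column) rows as returned by FIND_COLUMN_SQL.
    to_add:   (table, column) pairs read by hand from the upgrade DDL.
    Matching is exact; adjust if your server treats names case-insensitively.
    """
    existing_set = set(existing)
    return [pair for pair in to_add if pair in existing_set]


def drop_statements(pairs):
    """Build one ALTER TABLE ... DROP COLUMN statement per leftover column."""
    return [f"ALTER TABLE `{table}` DROP COLUMN `{column}`;" for table, column in pairs]


# Hypothetical data for illustration only.
existing = [("t_example", "id"), ("t_example", "operator")]
to_add = [("t_example", "operator")]

for statement in drop_statements(leftover_columns(existing, to_add)):
    print(statement)
```

Once the leftover columns are dropped, the 3.2.2 DDL file can be re-run from the start, as described above.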

ahululu commented 1 month ago

After a few days of use following the upgrade, I found a new issue: the api pod cannot resolve the alert pod's hostname and reports the following error: `Caused by: java.net.UnknownHostException: dolphinscheduler-alert-xxxxxx-xxx`. My temporary workaround is to add an entry for the alert pod to /etc/hosts. However, this is fragile: if the api pod restarts, the alert function becomes unavailable again. Does anyone know how to fix it?

SbloodyS commented 1 month ago

cc @Gallardot

Gallardot commented 1 month ago

> After a few days of use following the upgrade, I found a new issue: the api pod cannot resolve the alert pod's hostname and reports the following error:
>
> `Caused by: java.net.UnknownHostException: dolphinscheduler-alert-xxxxxx-xxx`
>
> My temporary workaround is to add an entry for the alert pod to /etc/hosts.
>
> However, this is fragile: if the api pod restarts, the alert function becomes unavailable again. Does anyone know how to fix it?

I need more error logs
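
Alongside the logs, it may help to confirm whether the name resolves at all from inside the api pod, since UnknownHostException means the JVM's hostname lookup failed. The following is a minimal sketch of such a check, assuming a Python interpreter is available in the container (it may not be). The function name `resolves` is illustrative, and the second hostname is the redacted placeholder from the error message, not a real pod name.

```python
import socket


def resolves(hostname):
    """Return True if the local resolver maps hostname to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example, no DNS record).
        return False


# Replace the placeholder with the actual alert pod or Service name.
for name in ["localhost", "dolphinscheduler-alert-xxxxxx-xxx"]:
    print(name, "resolves" if resolves(name) else "does NOT resolve")
```

If the name resolves only while the /etc/hosts entry is present, the failure is in cluster name resolution rather than in the alert server itself.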

ahululu commented 1 month ago

> > After a few days of use following the upgrade, I found a new issue: the api pod cannot resolve the alert pod's hostname and reports the following error: `Caused by: java.net.UnknownHostException: dolphinscheduler-alert-xxxxxx-xxx`. My temporary workaround is to add an entry for the alert pod to /etc/hosts. However, this is fragile: if the api pod restarts, the alert function becomes unavailable again. Does anyone know how to fix it?
>
> I need more error logs

It's the same problem as issue #16405.