apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[Bug] [DataQuality] The data quality task is abnormal #11134

Closed ulnit closed 11 months ago

ulnit commented 2 years ago

What happened

The error message is:

Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO t_ds_dq_execute_result ("rule_type","rule_name","process_definition_id","process_instance_id","task_instance_id","statistics_value","comparison_value","comparison_type","check_type","threshold","operator","failure_strategy","error_output_path","create_time","update_time") VALUES (0,'(null_check)',0,796,859,0,0,2,0,0,3,0,'hdfs://mycluster:8020/user/dolphinscheduler/data_quality_error_data/0_796_dq002','2022-07-21 05:20:53','2022-07-21 05:20:53') was aborted: ERROR: column "create_time" is of type timestamp without time zone but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 337
  Call getNextException to see other errors in the batch.
    at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:165)
    at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:52)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:559)
    at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:887)
    at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:910)
    at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1649)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:713)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:868)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:867)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1011)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1011)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2268)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.postgresql.util.PSQLException: ERROR: column "create_time" is of type timestamp without time zone but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 337
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
    ... 18 more

What you expected to happen

Data quality task nodes should run normally, and their results should be inserted into the database without errors.

How to reproduce

Deploy and use DolphinScheduler normally; the error reproduces when a data quality task runs.

Anything else

No response

Version

3.0.0-beta-2

github-actions[bot] commented 2 years ago

Thank you for your feedback. We have received your issue; please wait patiently for a reply.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has had no activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 2 years ago

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

aiwhj commented 1 year ago

The problem still exists in 3.1.4.

himper commented 1 year ago

The problem still exists in 3.1.7.

23/08/03 14:17:12 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
[INFO] 2023-08-03 14:17:14.660 +0800 - -> 23/08/03 14:17:14 INFO JDBCRDD: closed connection
23/08/03 14:17:14 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 1669 bytes result sent to driver
23/08/03 14:17:14 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 1604 ms on localhost (executor driver) (1/1)
23/08/03 14:17:14 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
23/08/03 14:17:14 INFO DAGScheduler: ShuffleMapStage 2 (save at JdbcWriter.java:86) finished in 1.690 s
23/08/03 14:17:14 INFO DAGScheduler: looking for newly runnable stages
23/08/03 14:17:14 INFO DAGScheduler: running: Set()
23/08/03 14:17:14 INFO DAGScheduler: waiting: Set(ResultStage 3)
23/08/03 14:17:14 INFO DAGScheduler: failed: Set()
23/08/03 14:17:14 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[16] at save at JdbcWriter.java:86), which has no missing parents
23/08/03 14:17:14 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 36.6 KB, free 116.9 MB)
23/08/03 14:17:14 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 15.8 KB, free 116.9 MB)
23/08/03 14:17:14 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on dolphinscheduler-worker-1.dolphinscheduler-worker-headless.dolphinscheduler.svc.cluster.local:36171 (size: 15.8 KB, free: 116.9 MB)
23/08/03 14:17:14 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
23/08/03 14:17:14 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[16] at save at JdbcWriter.java:86) (first 15 tasks are for partitions Vector(0))
23/08/03 14:17:14 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
23/08/03 14:17:14 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, executor driver, partition 0, ANY, 7767 bytes)
23/08/03 14:17:14 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
[INFO] 2023-08-03 14:17:15.737 +0800 - -> 23/08/03 14:17:14 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
23/08/03 14:17:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
23/08/03 14:17:14 INFO CodeGenerator: Code generated in 98.387902 ms
23/08/03 14:17:15 INFO CodeGenerator: Code generated in 201.736892 ms
23/08/03 14:17:15 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO t_ds_dq_execute_result ("rule_type","rule_name","process_definition_id","process_instance_id","task_instance_id","statistics_value","comparison_value","comparison_type","check_type","threshold","operator","failure_strategy","error_output_path","create_time","update_time") VALUES (3,'(multi_table_value_comparison)',0,22,24,1691033463,1691033476,0,0,0,0,0,'s3a://dolphinscheduler/user/root/data_quality_error_data/0_22_tag_quality','2023-08-03 14:16:03','2023-08-03 14:16:03') was aborted: ERROR: column "create_time" is of type timestamp without time zone but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 337
  Call getNextException to see other errors in the batch.
    at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:165)
    at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:52)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2367)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:560)
    at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:887)
    at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:910)
    at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1663)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:676)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:838)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:838)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2107)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2107)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.postgresql.util.PSQLException: ERROR: column "create_time" is of type timestamp without time zone but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 337
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
    ... 19 more
23/08/03 14:17:15 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.sql.BatchUpdateException: Batch entry 0 INSERT INTO t_ds_dq_execute_result ("rule_type","rule_name","process_definition_id","process_instance_id","task_instance_id","statistics_value","comparison_value","comparison_type","check_type","threshold","operator","failure_strategy","error_output_path","create_time","update_time") VALUES (3,'(multi_table_value_comparison)',0,22,24,1691033463,1691033476,0,0,0,0,0,'s3a://dolphinscheduler/user/root/data_quality_error_data/0_22_tag_quality','2023-08-03 14:16:03','2023-08-03 14:16:03') was aborted: ERROR: column "create_time" is of type timestamp without time zone but expression is of type character varying

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has had no activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 11 months ago

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

wangbowen1024 commented 9 months ago

You can try adding stringtype=unspecified to the PostgreSQL JDBC URL, for example: jdbc:postgresql://localhost:5432/databaseName?stringtype=unspecified
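
Some background on why this may help: according to the PostgreSQL JDBC driver documentation, stringtype=unspecified makes the driver send parameters bound with setString() as untyped values, so the server infers the target type (here, timestamp) instead of rejecting a varchar. The traces above suggest the Spark JDBC writer binds create_time and update_time as strings. Below is a minimal sketch of adding the parameter to an existing URL without disturbing other query parameters. The helper name with_stringtype_unspecified is hypothetical, and where the URL is configured depends on your DolphinScheduler deployment, which this thread does not specify.

```python
def with_stringtype_unspecified(jdbc_url: str) -> str:
    """Return jdbc_url with stringtype=unspecified appended.

    Hypothetical helper for illustration only. If the URL already sets
    a stringtype parameter, it is returned unchanged.
    """
    # Split "jdbc:postgresql://host:port/db?a=b&c=d" into base and query string.
    base, _, query = jdbc_url.partition("?")
    params = [p for p in query.split("&") if p]

    # Leave the URL alone if the caller already chose a stringtype.
    if any(p.split("=", 1)[0] == "stringtype" for p in params):
        return jdbc_url

    params.append("stringtype=unspecified")
    return base + "?" + "&".join(params)


if __name__ == "__main__":
    print(with_stringtype_unspecified("jdbc:postgresql://localhost:5432/databaseName"))
```

Note that this changes how the driver binds every string parameter on that connection, so it is worth checking that other queries using the same datasource still behave as expected.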