datavane / tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI
https://tis.pub
Apache License 2.0

Version 4.0.0, MSSql-->doris: scheduled sync tasks all fail when multiple tasks run at the same time. #394

Open zzxVv opened 3 days ago

zzxVv commented 3 days ago

When multiple tasks run at the same time, for example at 11:30, every sync fails. PowerJob and TIS are both deployed in the same k8s cluster. The specific error reported by PowerJob is: com.qlangtech.tis.lang.TisException: maxRetry:1,url:http://tis-console-cluster-svc.default:8080/tjs/config/config.ajax?action=fullbuild_workflow_action&event_submit_do_initialize_trigger_task=true

baisui1981 commented 3 days ago

Could you provide the full exception stack trace?

zzxVv commented 3 days ago

Here is the detailed exception log from powerjob-worker:

2024-11-19 15:00:13.737  INFO 8 --- [b-worker-core-0] t.p.w.background.WorkerHealthReporter    : [WorkerHealthReporter] report health status,appId:1,appName:yygs,isOverload:false,maxLightweightTaskNum:50,currentLightweightTaskNum:1,maxHeavyweightTaskNum:12,currentHeavyweightTaskNum:0
2024-11-19 15:00:15.027  WARN 8 --- [task-execute-33] t.p.w.c.t.task.light.LightTaskTracker    : [TaskTracker-739135350395371776] process failed !

com.qlangtech.tis.lang.TisException: maxRetry:1,url:http://tis-console-cluster-svc.default:8080/tjs/config/config.ajax?action=fullbuild_workflow_action&event_submit_do_initialize_trigger_task=true
        at com.qlangtech.tis.lang.TisException.create(TisException.java:165) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.lang.TisException.create(TisException.java:154) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.ConfigFileContext.processContent(ConfigFileContext.java:133) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.HttpUtils.process(HttpUtils.java:133) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.HttpUtils.post(HttpUtils.java:129) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.HttpUtils.post(HttpUtils.java:139) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.HttpUtils.soapRemote(HttpUtils.java:179) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.HttpUtils.soapRemote(HttpUtils.java:166) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.exec.IExecChainContext.triggerNewTask(IExecChainContext.java:131) ~[tis-plugin-4.0.0.jar!/:na]
        at com.qlangtech.tis.datax.powerjob.TISInitializeProcessor.process(TISInitializeProcessor.java:64) ~[classes!/:na]
        at tech.powerjob.worker.core.tracker.task.light.LightTaskTracker.processTask(LightTaskTracker.java:211) ~[powerjob-worker-4.3.6.jar!/:4.3.6]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_292]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_292]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_292]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_292]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_292]
        at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[na:1.8.0_292]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_292]
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_292]
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_292]
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_292]
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) ~[na:1.8.0_292]
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) ~[na:1.8.0_292]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) ~[na:1.8.0_292]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[na:1.8.0_292]
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_292]
        at com.qlangtech.tis.manage.common.ConfigFileContext.getNetInputStream(ConfigFileContext.java:179) ~[tis-manage-pojo-4.0.0.jar!/:na]
        at com.qlangtech.tis.manage.common.ConfigFileContext.processContent(ConfigFileContext.java:117) ~[tis-manage-pojo-4.0.0.jar!/:na]
        ... 12 common frames omitted

2024-11-19 15:00:15.027  INFO 8 --- [task-execute-33] t.p.w.c.t.task.light.LightTaskTracker    : [TaskTracker-739135350395371776] task complete ! create time:1731999600022,queue time:1,use time:15004,result:ProcessResult(success=false, msg=com.qlangtech.tis.lang.TisException: maxRetry:1,url:http://tis-console-cluster-svc.default:8080/tjs/config/config.ajax?action=fullbuild_workflow_action&event_submit_do_initialize_trigger_task=true)
2024-11-19 15:00:15.046  INFO 8 --- [task-execute-33] t.p.w.core.tracker.task.TaskTracker      : [TaskTracker-739135350395371776] report finished status(detail=TaskTrackerReportInstanceStatusReq(appId=1, jobId=305, instanceId=739135350395371776, wfInstanceId=739135251741147392, appendedWfContext={}, instanceStatus=4, result=com.qlangtech.tis.lang.TisException: maxRetry:1,url:http://tis-console-cluster-svc.default:8080/tjs/config/config.ajax?action=fullbuild_workflow_action&event_submit_do_initialize_trigger_task=true, totalTaskNum=1, succeedTaskNum=0, failedTaskNum=1, startTime=1731999600022, endTime=1731999615027, reportTime=1731999615028, sourceAddress=10.244.1.9:27777, needAlert=false, alertContent=null)) success
2024-11-19 15:00:15.047  WARN 8 --- [task-execute-33] t.p.w.c.t.task.light.LightTaskTracker    : [TaskTracker-739135350395371776] remove TaskTracker,task status WORKER_PROCESS_FAILED,start time:1731999600023,end time:1731999615027,real cost:15004,total time:15025
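The timing in the worker log can be checked from the epoch-millisecond values it reports. The sketch below uses the `startTime` and `endTime` values copied from the "report finished status" line; the class and method names are hypothetical. An elapsed time of about 15 seconds, together with the `SocketTimeoutException: Read timed out` cause, is consistent with a client-side read timeout of roughly 15 seconds on the trigger call, although the log alone does not confirm the configured value.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for reading the worker log; not part of TIS or PowerJob.
final class TriggerTiming {
    // Values copied from startTime= and endTime= in the worker's report line.
    static final long START_TIME_MS = 1731999600022L;
    static final long END_TIME_MS = 1731999615027L;

    // Elapsed time between two epoch-millisecond timestamps.
    static Duration elapsed(long startMs, long endMs) {
        return Duration.between(Instant.ofEpochMilli(startMs), Instant.ofEpochMilli(endMs));
    }

    public static void main(String[] args) {
        Duration d = elapsed(START_TIME_MS, END_TIME_MS);
        System.out.println("elapsed ms: " + d.toMillis()); // prints "elapsed ms: 15005"
    }
}
```

The result differs by one millisecond from the `use time:15004` figure in the log, which appears to be measured from the slightly later start time (1731999600023) shown in the "remove TaskTracker" line.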

————————————————————————————————————————————

Here is the detailed exception log from the server:

2024-11-19 15:00:15.030  INFO 7 --- [akka.w-r-c-d-24] t.p.s.c.i.InstanceManager                : [InstanceManager-739135350395371776] instance execute failed and have no chance to retry.
2024-11-19 15:00:15.034  INFO 7 --- [akka.w-r-c-d-24] t.p.s.c.i.InstanceManager                : [Instance-739135350395371776] process finished, final status is FAILED.
2024-11-19 15:00:15.036  INFO 7 --- [akka.w-r-c-d-24] t.p.s.c.w.WorkflowInstanceManager        : [Workflow-10|739135251741147392] node(nodeId=409,jobId=305,instanceId=739135350395371776) finished in workflowInstance, status=FAILED,result=com.qlangtech.tis.lang.TisException: maxRetry:1,url:http://tis-console-cluster-svc.default:8080/tjs/config/config.ajax?action=fullbuild_workflow_action&event_submit_do_initialize_trigger_task=true
2024-11-19 15:00:15.036  WARN 7 --- [akka.w-r-c-d-24] t.p.s.c.w.WorkflowInstanceManager        : [Workflow-10|739135251741147392] workflow instance process failed because middle task(instanceId=739135350395371776) failed
2024-11-19 15:00:15.046  INFO 7 --- [akka.w-r-c-d-24] MONITOR_LOGGER_TT_REPORT_STATUS          : 1|305|739135350395371776|739135251741147392|FAILED|0|SUCCESS|18
2024-11-19 15:00:15.046  INFO 7 --- [akka.w-r-c-d-30] t.p.s.c.i.InstanceManager                : [InstanceManager-739135350395371777] instance execute failed and have no chance to retry.
2024-11-19 15:00:15.051  INFO 7 --- [akka.w-r-c-d-30] t.p.s.c.i.InstanceManager                : [Instance-739135350395371777] process finished, final status is FAILED.
2024-11-19 15:00:15.052  INFO 7 --- [akka.w-r-c-d-30] t.p.s.c.w.WorkflowInstanceManager        : [Workflow-3|739135251401408768] node(nodeId=399,jobId=97,instanceId=739135350395371777) finished in workflowInstance, status=FAILED,result=com.qlangtech.tis.lang.TisException: maxRetry:1,url:http://tis-console-cluster-svc.default:8080/tjs/config/config.ajax?action=fullbuild_workflow_action&event_submit_do_initialize_trigger_task=true
2024-11-19 15:00:15.053  WARN 7 --- [akka.w-r-c-d-30] t.p.s.c.w.WorkflowInstanceManager        : [Workflow-3|739135251401408768] workflow instance process failed because middle task(instanceId=739135350395371777) failed
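The error message carries `maxRetry:1`, and the server log reports that each instance has "no chance to retry". If the TIS console is only briefly slow when several workflows trigger at the same moment, retrying the call with backoff is one possible mitigation. The sketch below is a generic illustration under that assumption, not the actual TIS or PowerJob code; `RetrySketch`, `callWithRetry`, and its parameters are hypothetical names.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical retry helper; not part of TIS or PowerJob.
final class RetrySketch {
    /**
     * Runs the call up to maxAttempts times. Only SocketTimeoutException is
     * retried; the wait between attempts doubles each time.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (SocketTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // attempts exhausted, surface the last timeout
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated trigger call that times out twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "triggered";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A retry like this is only safe if the trigger request is idempotent. Otherwise a request that timed out on the client but still completed on the console could start the same build twice.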

baisui1981 commented 2 days ago

OK, received, thanks.

zzxVv commented 3 hours ago

OK, received, thanks.

Have you been able to identify the problem?