apache / dolphinscheduler-sdk-python

Apache DolphinScheduler Python API, aka PyDolphinscheduler.
https://dolphinscheduler.apache.org/python/main
Apache License 2.0

pydolphinscheduler is having problems using the official documentation code #98

Closed: Treasure-u closed this issue 11 months ago

Treasure-u commented 1 year ago

My pydolphinscheduler version is 4.0.3, DolphinScheduler is 3.1.5, and my OS is Ubuntu 22.10. I also use HDFS for the resource center. When I ran the code from the official documentation, something went wrong.

I was able to upload main.py and dependence.py without a problem, but my workflow didn't run. When I looked at the log, it said the host instance does not exist.

[Screenshots, 2023-08-01 09:48:39]
zhongjiajie commented 1 year ago

Hi @Treasure-u, do you mean that both the resource files and the workflow were created successfully, but when you run the workflow and open the workflow/task instance log, it shows that the host is empty?

If so, could you:

  1. check your task definition and see whether the resources in your task look correct?
  2. check your master and worker and see whether they exist and are working fine?
  3. check your master or worker log to see whether there is a detailed error log?
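For step 2, one quick way to confirm the master and worker are up is to test whether their RPC ports accept TCP connections. This is a minimal standard-library sketch; the host name and the port numbers (5678 for the master, 1234 for the worker, assumed here to be the DolphinScheduler 3.x defaults) are assumptions, so check the `application.yaml` of your own deployment.

```python
import socket
from typing import Dict, Mapping

# Assumed default RPC ports for DolphinScheduler 3.x; verify against the
# master/worker application.yaml in your deployment.
ASSUMED_PORTS = {"master": 5678, "worker": 1234}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def check_services(host: str, ports: Mapping[str, int] = ASSUMED_PORTS) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # "localhost" is a placeholder; use the host where each service runs.
    for name, ok in check_services("localhost").items():
        print(f"{name}: {'listening' if ok else 'NOT reachable'}")
```

An open port only shows that something is listening; the master and worker logs (step 3) are still the place to look for errors.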
zhongjiajie commented 1 year ago

duplicate #101

baratamavinash225 commented 12 months ago

In fact, I have been trying this example and ran into the same error.

  1. check your task definition and see whether the resources in your task look correct? - The resources are not added to the task definition.
  2. check your master and worker and see whether they exist and are working fine? - Yes, both are working fine.
  3. check your master or worker log to see whether there is a detailed error log? - Here is the error from the master log:

[ERROR] 2023-09-22 18:14:52.460 +0000 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[128] - [WorkflowInstance-109][TaskInstance-148] - Task use-resource is submitted to priority queue error
java.lang.NullPointerException: null
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl.queryTenantCodeByResName(ProcessServiceImpl.java:2151)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl$$FastClassBySpringCGLIB$$9d3e18f9.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
    at org.apache.dolphinscheduler.service.process.ProcessServiceImpl$$EnhancerBySpringCGLIB$$f5037904.queryTenantCodeByResName(<generated>)
    at org.apache.dolphinscheduler.server.master.runner.task.BaseTaskProcessor.lambda$getResourceFullNames$4(BaseTaskProcessor.java:629)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.dolphinscheduler.server.master.runner.task.BaseTaskProcessor.getResourceFullNames(BaseTaskProcessor.java:628)
    at org.apache.dolphinscheduler.server.master.runner.task.BaseTaskProcessor.getTaskExecutionContext(BaseTaskProcessor.java:316)
    at org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor.dispatchTask(CommonTaskProcessor.java:116)
    at org.apache.dolphinscheduler.server.master.runner.task.BaseTaskProcessor.dispatch(BaseTaskProcessor.java:241)
    at org.apache.dolphinscheduler.server.master.runner.task.BaseTaskProcessor.action(BaseTaskProcessor.java:212)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable.submitTaskExec(WorkflowExecuteRunnable.java:990)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable.submitStandByTask(WorkflowExecuteRunnable.java:1845)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable.submitPostNode(WorkflowExecuteRunnable.java:1367)
    at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable.call(WorkflowExecuteRunnable.java:703)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1771)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:830)
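When checking a long master log, the useful fields in an entry like the `[ERROR]` line above are the level, timestamp, workflow and task instance IDs, and the message. The sketch below extracts them with a regular expression. The pattern is inferred from this single log line and is an assumption, not a documented log format, so adjust it to your logging configuration.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern inferred from one DolphinScheduler 3.1.x master log line in this
# thread; it is an assumption, not a documented format.
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r".*?\[WorkflowInstance-(?P<workflow>\d+)\]"
    r"\[TaskInstance-(?P<task>\d+)\]\s+-\s+"
    r"(?P<message>.*)$"
)


class TaskLogEntry(NamedTuple):
    level: str
    timestamp: str
    workflow_instance: int
    task_instance: int
    message: str


def error_entries(lines: Iterable[str]) -> Iterator[TaskLogEntry]:
    """Yield parsed ERROR entries that carry workflow/task instance IDs.

    Stack-trace lines ("at ...") and lines in other formats are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m["level"] == "ERROR":
            yield TaskLogEntry(
                level=m["level"],
                timestamp=m["timestamp"],
                workflow_instance=int(m["workflow"]),
                task_instance=int(m["task"]),
                message=m["message"],
            )
```

Because it accepts any iterable of lines, an open log file can be passed directly, which keeps memory use flat even for large logs.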
Andy-xu-007 commented 11 months ago

Yes, I have the same problem when using the pydolphinscheduler CLI to create a workflow from a YAML file. I'm using pydolphinscheduler 4.0.3 and DolphinScheduler 3.1.7. Part of the output is as follows:

Auth token is default token, highly recommend add a token in production, especially you deploy in public network.
/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/java_gateway.py:324: UserWarning: Using unmatched version of pydolphinscheduler (version 4.0.3) and Java gateway (version 3.1.7) may cause errors. We strongly recommend you to find the matched version (check: https://pypi.org/project/apache-dolphinscheduler)
  gateway = GatewayEntryPoint()
Traceback (most recent call last):
  File "/home/hadoop/.local/bin/pydolphinscheduler", line 8, in <module>
    sys.exit(cli())
  File "/home/hadoop/.local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/hadoop/.local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/hadoop/.local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/hadoop/.local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/hadoop/.local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/cli/commands.py", line 106, in yaml
    create_workflow(yaml_file)
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/yaml_workflow.py", line 494, in create_workflow
    YamlWorkflow.parse(yaml_file)
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/yaml_workflow.py", line 235, in parse
    workflow_name = cls(yaml_file).create_workflow()
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/yaml_workflow.py", line 179, in create_workflow
    task = self.parse_task(task_data, name2task)
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/yaml_workflow.py", line 290, in parse_task
    task = task_cls(**task_params)
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/tasks/shell.py", line 58, in __init__
    super().__init__(name, TaskType.SHELL, *args, **kwargs)
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/task.py", line 222, in __init__
    self.get_content()
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/task.py", line 332, in get_content
    res = self.get_plugin()
  File "/home/hadoop/.local/lib/python3.8/site-packages/pydolphinscheduler/core/task.py", line 319, in get_plugin
    raise PyResPluginException(
pydolphinscheduler.exceptions.PyResPluginException: The execution command of this task is a file, but the resource plugin is empty

The error tells me to add a resource plugin to the YAML file, but my resource center is built on HDFS. After checking the source code and the official docs, I found that pydolphinscheduler supports Local, GitHub, GitLab, OSS, and S3 resource plugins, but not HDFS. Should I create a new HDFS resource plugin, or is there another way to create a workflow from a YAML file that uses files in the resource center?
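The exception indicates that the SDK treated the Shell task's command as a file and then needed a resource plugin to read it. A pre-flight check over the task definitions can flag such commands before running the CLI. This minimal sketch assumes the decision is made by file extension and that `.sh`/`.zsh` are the relevant extensions for Shell tasks; both are assumptions for illustration, and the `name`/`command` keys of the task dictionaries (standing in for already-parsed YAML) are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed extensions that make a Shell task's command be read as a file
# rather than as inline script text; adjust for other task types.
ASSUMED_FILE_EXTS = (".sh", ".zsh")


def file_like_commands(
    tasks: List[Dict[str, str]],
    exts: Sequence[str] = ASSUMED_FILE_EXTS,
) -> List[str]:
    """Return names of tasks whose command ends with one of `exts`.

    Under the assumption above, such tasks need a resource plugin
    (Local, GitHub, GitLab, OSS, or S3) to load the file content.
    """
    flagged = []
    for task in tasks:
        command = task.get("command", "").strip()
        if command.endswith(tuple(exts)):
            flagged.append(task.get("name", "<unnamed>"))
    return flagged


if __name__ == "__main__":
    # Hypothetical tasks, shaped like entries parsed from a workflow YAML file.
    tasks = [
        {"name": "inline", "command": "echo hello"},
        {"name": "from-file", "command": "run_job.sh"},
    ]
    print(file_like_commands(tasks))  # ['from-file']
```

Flagged tasks either need a supported resource plugin or an inline command instead of a file reference.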

Looking forward to your reply

Treasure-u commented 11 months ago

Maybe you can try my fixed fork of the SDK: https://github.com/Treasure-u/dolphinscheduler-sdk-python

zhongjiajie commented 11 months ago

Hi @Treasure-u @baratamavinash225 @Andy-xu-007, this issue has already been fixed in https://github.com/apache/dolphinscheduler-sdk-python/pull/116, and the fix will be released in version 4.0.4 within 3 days. Thanks for your bug reports!
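Since the fix is stated to ship in 4.0.4, one way to confirm an environment has it is to compare the installed SDK version with that release. This standard-library sketch assumes the distribution name `apache-dolphinscheduler` (taken from the PyPI link in the version-mismatch warning quoted above) and plain numeric `X.Y.Z` versions; it does not handle pre-release or post-release suffixes.

```python
from importlib import metadata
from typing import Optional, Tuple

# Release that, per the maintainer's comment, contains the fix from PR #116.
FIXED_IN = (4, 0, 4)
# Distribution name taken from the PyPI URL in the version-mismatch warning.
DIST_NAME = "apache-dolphinscheduler"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '4.0.3' into (4, 0, 3)."""
    return tuple(int(part) for part in text.split("."))


def has_fix(version: str, fixed_in: Tuple[int, ...] = FIXED_IN) -> bool:
    """True if `version` is at or above the release containing the fix."""
    return parse_version(version) >= fixed_in


def installed_version(dist: str = DIST_NAME) -> Optional[str]:
    """Return the installed distribution's version, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print(f"{DIST_NAME} is not installed")
    else:
        print(f"{DIST_NAME} {current}: fix present = {has_fix(current)}")
```

Comparing integer tuples rather than raw strings keeps versions such as 4.0.10 ordered correctly after 4.0.4.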