apache / dolphinscheduler-sdk-python

Apache DolphinScheduler Python API, aka PyDolphinscheduler.
https://dolphinscheduler.apache.org/python/main
Apache License 2.0

Which version of Python should be used with pydolphinscheduler #152

Open LiSirPython opened 2 weeks ago

LiSirPython commented 2 weeks ago

C:\Users\33124>pip list
Package                 Version
----------------------- -----------
aliyun-python-sdk-core  2.15.2
aliyun-python-sdk-kms   2.16.4
apache-dolphinscheduler 4.0.4
boto3                   1.35.7
botocore                1.35.7
certifi                 2024.7.4
cffi                    1.17.0
charset-normalizer      3.3.2
click                   8.1.7
colorama                0.4.6
crcmod                  1.7
cryptography            43.0.0
idna                    3.8
jmespath                0.10.0
libcst                  1.4.0
oss2                    2.18.6
packaging               24.1
pip                     24.2
py4j                    0.10.9.7
pycparser               2.22
pycryptodome            3.20.0
python-dateutil         2.9.0.post0
python-gitlab           4.10.0
PyYAML                  6.0.2
requests                2.32.3
requests-toolbelt       1.0.0
ruamel.yaml             0.18.6
ruamel.yaml.clib        0.2.8
s3transfer              0.10.2
setuptools              58.1.0
six                     1.16.0
stmdency                0.0.5
urllib3                 2.2.2

C:\Users\33124>pydolphinscheduler config --init
Auth token is default token, highly recommend add a token in production, especially you deploy in public network.
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\py4j\java_gateway.py", line 982, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python310\Scripts\pydolphinscheduler.exe\__main__.py", line 4, in <module>
  File "C:\Python310\lib\site-packages\pydolphinscheduler\cli\commands.py", line 29, in <module>
    from pydolphinscheduler.core.yaml_workflow import create_workflow
  File "C:\Python310\lib\site-packages\pydolphinscheduler\core\__init__.py", line 20, in <module>
    from pydolphinscheduler.core.engine import Engine
  File "C:\Python310\lib\site-packages\pydolphinscheduler\core\engine.py", line 24, in <module>
    from pydolphinscheduler.core.task import Task
  File "C:\Python310\lib\site-packages\pydolphinscheduler\core\task.py", line 36, in <module>
    from pydolphinscheduler.core.resource import Resource
  File "C:\Python310\lib\site-packages\pydolphinscheduler\core\resource.py", line 23, in <module>
    from pydolphinscheduler.java_gateway import gateway
  File "C:\Python310\lib\site-packages\pydolphinscheduler\java_gateway.py", line 324, in <module>
    gateway = GatewayEntryPoint()
  File "C:\Python310\lib\site-packages\pydolphinscheduler\java_gateway.py", line 74, in __init__
    gateway_version = self.get_gateway_version()
  File "C:\Python310\lib\site-packages\pydolphinscheduler\java_gateway.py", line 111, in get_gateway_version
    return self.gateway.entry_point.getGatewayVersion()
  File "C:\Python310\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    answer = self.gateway_client.send_command(command)
  File "C:\Python310\lib\site-packages\py4j\java_gateway.py", line 1036, in send_command
    connection = self._get_connection()
  File "C:\Python310\lib\site-packages\py4j\java_gateway.py", line 984, in _get_connection
    connection = self._create_connection()
  File "C:\Python310\lib\site-packages\py4j\java_gateway.py", line 988, in _create_connection
    connection = GatewayConnection(
  File "C:\Python310\lib\site-packages\py4j\java_gateway.py", line 1112, in __init__
    af_type = socket.getaddrinfo(self.address, self.port)[0][0]
  File "C:\Python310\lib\socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
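The last frame shows socket.getaddrinfo raising gaierror, which means the gateway host name could not be resolved, so no connection was ever attempted. A minimal sketch for checking name resolution in isolation follows. The helper name can_resolve is hypothetical, and the host and port used are py4j's defaults, assumed here rather than read from any pydolphinscheduler configuration.

```python
import socket


def can_resolve(host: str, port: int) -> bool:
    """Return True if the host/port pair resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Same exception class as the one at the bottom of the traceback.
        return False


if __name__ == "__main__":
    # 127.0.0.1 and 25333 are py4j's default address and port (an assumption
    # for this sketch; substitute the address your installation is using).
    print(can_resolve("127.0.0.1", 25333))
```

If this returns False for the address the client is configured with, the problem lies in name resolution on that machine rather than in the gateway service itself.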

LiSirPython commented 2 weeks ago

I tried Python 3.8, Python 3.12, and Python 3.10, and all of them reported this error.

zhongjiajie commented 2 weeks ago

Could you make sure the Python gateway for DolphinScheduler is started? @LiSirPython

LiSirPython commented 2 weeks ago

My DolphinScheduler cluster is deployed in a Kubernetes environment, and I have confirmed that the dolphinscheduler-api service is running. The Python gateway option in application.yaml is also set to true. According to the documentation, running the pydolphinscheduler config --init command should not involve the remote dolphinscheduler-api service’s Python gateway, right? This command is supposed to create a new YAML file in ~/pydolphinscheduler/config.yaml, isn’t it? @zhongjiajie

zhongjiajie commented 2 weeks ago

when I execute the pydolphinscheduler config --init command, it should not involve the remote dolphinscheduler-api service’s Python gateway, right?

Currently it does: it checks whether the remote Python gateway version matches the installed pydolphinscheduler version.
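The traceback in the original report is consistent with this: java_gateway.py creates gateway = GatewayEntryPoint() at module level, and its __init__ calls get_gateway_version(), so importing the package is enough to trigger a connection attempt. The sketch below illustrates that pattern with stand-in classes. FakeGateway, EagerEntryPoint, and LazyEntryPoint are hypothetical names for illustration, not the library's code, and the lazy variant is only one possible alternative design.

```python
class FakeGateway:
    """Stand-in for a remote gateway; counts how often it is contacted."""

    def __init__(self) -> None:
        self.calls = 0

    def get_version(self) -> str:
        self.calls += 1
        return "0.0.0"  # placeholder value, not a real version


class EagerEntryPoint:
    """Contacts the gateway as soon as the object is constructed."""

    def __init__(self, gateway: FakeGateway) -> None:
        self.version = gateway.get_version()


class LazyEntryPoint:
    """Defers the gateway call until the version is first requested."""

    def __init__(self, gateway: FakeGateway) -> None:
        self._gateway = gateway
        self._version = None

    @property
    def version(self) -> str:
        if self._version is None:
            self._version = self._gateway.get_version()
        return self._version


eager_backend = FakeGateway()
EagerEntryPoint(eager_backend)       # construction alone reaches the gateway

lazy_backend = FakeGateway()
lazy = LazyEntryPoint(lazy_backend)  # no gateway call has happened yet
```

With the eager form, any command that imports the module pays for the gateway round trip, including commands such as config --init that only write a local file.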

This command is supposed to create a new YAML file in ~/pydolphinscheduler/config.yaml, isn’t it?

Yes.

zhongjiajie commented 2 weeks ago

Which version of DolphinScheduler are you using?

LiSirPython commented 2 weeks ago

The issue is that after installing, the first step I performed was executing the ‘pydolphinscheduler config --init’ command. I haven’t set the remote API server address, so how does it check for compatibility? My Dolphinscheduler version is 3.2.2.

LiSirPython commented 2 weeks ago

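After installing with pip install, I had not set the remote api-server address, so it cannot check compatibility, can it? The documentation describes this command as only generating a YAML configuration file at the default path, with no other operations.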

zhongjiajie commented 2 weeks ago

I think the cause is a version mismatch. Please try installing the latest unreleased version of pydolphinscheduler: https://github.com/apache/dolphinscheduler-sdk-python/issues/119#issuecomment-1825194665

You can see which versions match at https://dolphinscheduler.apache.org/python/main/#version
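To compare against that page, it helps to confirm which client version is actually installed in the interpreter being used. A minimal sketch using the standard library follows. The helper name installed_version is hypothetical, and the distribution name is taken from the pip list output in the original report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "apache-dolphinscheduler" is the distribution name shown by pip list.
    print(installed_version("apache-dolphinscheduler"))
```

Unlike importing pydolphinscheduler, this only reads package metadata, so it does not try to contact the gateway.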

LiSirPython commented 2 weeks ago

I got it working once on a Linux virtual machine. I will retry tomorrow. Thanks for your reply.

zhongjiajie commented 2 weeks ago

I ran the commands below successfully on my local laptop, so maybe you should try again later.

pip install -U pip
pip install git+https://github.com/apache/dolphinscheduler-sdk-python.git#egg=apache-dolphinscheduler
pydolphinscheduler config --init

LiSirPython commented 2 weeks ago

Is your laptop running Windows or is it a MacBook? My original report was on Win11.

zhongjiajie commented 2 weeks ago

I tested on both macOS and Linux. I do not have a Windows environment.

zhongjiajie commented 2 weeks ago

But we do run Windows in our CI, so it should work fine: https://github.com/apache/dolphinscheduler-sdk-python/blob/5e319989babe4b458f6758b2ab15475d78359003/.github/workflows/ci.yaml#L81

LiSirPython commented 2 weeks ago

Good morning. I just tried again on another CentOS 7 machine with Python 3.12, and I was able to install pydolphinscheduler and run the pydolphinscheduler config --init command successfully. It seems that the command cannot be executed with Python installed on Windows or with Python used in WSL on Windows.

LiSirPython commented 2 weeks ago

I used the Python exe installer on Win11, but it still reports an error after installation. However, it works fine in Windows WSL. In WSL, I can use pipx to install pydolphinscheduler. After installation, I run the pipx ensurepath command to add the environment variable, and then it works properly.

zhongjiajie commented 2 weeks ago

So the command pydolphinscheduler config --init works in Windows WSL, but not in the CMD terminal?

LiSirPython commented 2 weeks ago

Yes, it can run in WSL, but it fails when executed in Windows CMD. Do you need me to close this issue?

LiSirPython commented 2 weeks ago

Do you support creating SeaTunnel tasks now?

zhongjiajie commented 2 weeks ago

Yes, it can run in WSL, but it fails when executed in Windows CMD. Do you need me to close this issue?

It seems like an environment issue. Can you make sure your Python environment works properly?
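One way to compare the two environments is to run the same small script in both WSL and CMD and look at the differences. The sketch below is only an illustration of that idea. describe_env is a hypothetical helper, and the fields it collects are an assumption about what is useful to compare.

```python
import platform
import sys


def describe_env() -> dict:
    """Collect basic facts about the running interpreter."""
    return {
        "system": platform.system(),          # e.g. "Windows" or "Linux"
        "python": platform.python_version(),  # interpreter version string
        "executable": sys.executable,         # which interpreter actually ran
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

Differences in the executable path can reveal that the pydolphinscheduler command on PATH belongs to a different Python installation than the one pip installed into.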

zhongjiajie commented 2 weeks ago

SeaTunnel is not supported yet. There is an existing issue for it at https://github.com/apache/dolphinscheduler-sdk-python/issues/96