apache / pulsar-client-python

Apache Pulsar Python client library
https://pulsar.apache.org/
Apache License 2.0

python client does not retry when get topic partition metadata to create producer fails #113

Open yebai1105 opened 1 year ago

yebai1105 commented 1 year ago

Describe the bug
If fetching the topic partition metadata through service_url fails, the client does not retry and does not try the other address in the URL. It exits abnormally as soon as the first connection fails. Version: pulsar-client=2.9.4

To Reproduce
1. Test conditions: the service at 127.0.0.1:6650 is stopped; the service at 127.0.0.2:6650 is running normally.
2. Test code:

import pulsar
client = pulsar.Client(
    authentication=pulsar.AuthenticationToken(
            "xxxxxxxxxxxx"),
    service_url="pulsar://127.0.0.1:6650,127.0.0.2:6650",
    #operation_timeout_seconds=120
)
producer = client.create_producer(
    topic='persistent://qlm-test/qlm-ns/python-test3',    
    send_timeout_millis=120000,
    block_if_queue_full=True,
    batching_enabled=True,
    batching_max_publish_delay_ms=10,
    batching_max_messages=100,
    batching_max_allowed_size_in_bytes=1024 * 1024,
    max_pending_messages=1000)

i = 0
while True:
    # Format the counter into the payload; without it the literal 'Hello-%d' is sent.
    producer.send(('Hello-%d' % i).encode('utf-8'))
    i += 1
producer.close()
client.close()

Error log:

2023-04-25 14:16:51.866 INFO  [139745917667072] ExecutorService:41 | Run io_service in a single thread
2023-04-25 14:16:51.866 INFO  [139746086545152] ClientConnection:189 | [<none> -> pulsar://127.0.0.1:6650,127.0.0.2:6650] Create ClientConnection, timeout=10000
2023-04-25 14:16:51.866 INFO  [139746086545152] ConnectionPool:96 | Created connection for pulsar://127.0.0.1:6650,127.0.0.2:6650
2023-04-25 14:16:51.867 WARN  [139745917667072] ClientConnection:436 | [<none> -> pulsar://127.0.0.1:6650,127.0.0.2:6650] Failed to establish connection: Connection refused
2023-04-25 14:16:51.867 INFO  [139745917667072] ClientConnection:1563 | [<none> -> pulsar://127.0.0.1:6650,127.0.0.2:6650] Connection closed
2023-04-25 14:16:51.867 ERROR [139745917667072] ClientImpl:190 | Error Checking/Getting Partition Metadata while creating producer on persistent://qlm-test/qlm-ns/python-test3 -- ConnectError
2023-04-25 14:16:51.867 INFO  [139745917667072] ClientConnection:263 | [<none> -> pulsar://127.0.0.1:6650,127.0.0.2:6650] Destroyed connection
Traceback (most recent call last):
  File "pulsar-test.py", line 8, in <module>
    producer = client.create_producer(
  File "/usr/local/python3/lib/python3.8/site-packages/pulsar/__init__.py", line 603, in create_producer
    p._producer = self._client.create_producer(topic, conf)
_pulsar.ConnectError: Pulsar error: ConnectError
2023-04-25 14:16:51.871 INFO  [139745917667072] ExecutorService:47 | Event loop of ExecutorService exits successfully
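
When reproducing this, it can help to confirm independently which of the addresses in the service URL is actually accepting connections. The following is a minimal diagnostic sketch, not part of the Pulsar client: `split_service_url` and `reachable` are hypothetical helper names, the parsing assumes plain `host:port` entries (no IPv6 literals), and 6650 is assumed as the default port when none is given.

```python
import socket


def split_service_url(service_url, default_port=6650):
    """Split 'scheme://host1:port1,host2:port2' into (scheme, [(host, port), ...]).

    Hypothetical helper for diagnostics only; it does not mirror how the
    Pulsar client parses the URL internally.
    """
    scheme, sep, rest = service_url.partition("://")
    if not sep:
        raise ValueError("service URL must contain '://': %r" % service_url)
    addresses = []
    for item in rest.split(","):
        item = item.strip()
        if not item:
            continue
        host, colon, port = item.rpartition(":")
        if colon:
            addresses.append((host, int(port)))
        else:
            # No explicit port: fall back to the assumed default.
            addresses.append((item, default_port))
    return scheme, addresses


def reachable(host, port, timeout_s=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    _, addrs = split_service_url("pulsar://127.0.0.1:6650,127.0.0.2:6650")
    for host, port in addrs:
        print(host, port, "reachable" if reachable(host, port) else "unreachable")
```

A TCP connect only shows that something is listening on the port; it does not verify authentication or that the broker can serve the topic.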

Expected behavior
If the connection cannot be established, the client should keep retrying until the timeout expires, as the Java client does:

2023-04-25 14:18:45.078[pulsar-external-listener-3-1] WARN  org.apache.pulsar.client.impl.PulsarClientImpl - [topic: persistent://qlm-test/qlm-ns/python-test3] Could not get connection while getPartitionedTopicMetadata -- Will try again in 6345 ms
2023-04-25 14:18:45.093[pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xedb4eab7, L:/10.13.209.102:58831 - R:/10.101.129.65:6650]] Connected to server
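
Until the client retries on its own, one possible workaround is to retry at the application level. The sketch below is an assumption-laden illustration, not the Pulsar client's retry logic: `call_with_retry` and all of its parameters are hypothetical names, and the exponential backoff values are arbitrary.

```python
import time


def call_with_retry(fn, retry_on=(Exception,), timeout_s=30.0,
                    initial_delay_s=0.1, max_delay_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fn() until it succeeds or timeout_s has elapsed.

    Hypothetical helper: retries only on the exception types in retry_on,
    doubling the delay between attempts up to max_delay_s. When the deadline
    has passed, the last exception is re-raised.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            return fn()
        except retry_on:
            remaining = deadline - clock()
            if remaining <= 0:
                raise
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

With the reproduction script above, this could be used as `call_with_retry(lambda: client.create_producer(topic), retry_on=(_pulsar.ConnectError,))`, where `_pulsar.ConnectError` is the exception type shown in the traceback. Note that this only repeats the same call; it does not by itself make the client pick a different address from the service URL.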

BewareMyPower commented 1 year ago

Could you try the latest Python client? The retry logic was introduced in 3.0.0.

yebai1105 commented 1 year ago

I also tried pulsar-client=3.1.0, and the problem still occurs.

tisonkun commented 1 year ago

> I tried this version too, but there is also a problem

Is it the same issue? If so, could you provide the logs from 3.1.0?

iamxinxin commented 10 months ago

I got the same error on pulsar-client 3.1.0.

code:

from pulsar import Client, AuthenticationToken, BatchingType

client = Client(
    service_url='pulsar://node01.public.pulsar.test:6650',
    authentication=AuthenticationToken(
        "xxxxxxxxxxxxxxxxxxxxxxxxx"))
producer = client.create_producer(
    'my-topic',
    block_if_queue_full=True,
    batching_enabled=True,
    batching_max_publish_delay_ms=10,
    properties={
        "producer-name": "test-producer-name",
        "producer-id": "test-producer-id"
    },
    batching_type=BatchingType.KeyBased
)

for i in range(10):
    producer.send(('Hello-%d' % i).encode('utf-8'))

client.close()

log:

Connected to pydev debugger (build 213.7172.26)
2023-12-21 11:29:56.580 INFO [140635486381888] ClientConnection:190 | [<none> -> pulsar://node01.public.pulsar.test:6650] Create ClientConnection, timeout=10000
2023-12-21 11:29:56.580 INFO [140635486381888] ConnectionPool:97 | Created connection for pulsar://node01.public.pulsar.test:6650
2023-12-21 11:29:56.635 INFO [140633529005824] ClientConnection:388 | [192.168.1.158:37378 -> 172.16.20.97:6650] Connected to broker
2023-12-21 11:29:56.688 INFO [140633529005824] ClientConnection:1600 | [192.168.1.158:37378 -> 172.16.20.97:6650] Connection closed with ConnectError
2023-12-21 11:29:56.688 ERROR [140633529005824] ClientImpl:183 | Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/my-topic -- ConnectError
2023-12-21 11:29:56.688 INFO [140633529005824] ClientConnection:269 | [192.168.1.158:37378 -> 172.16.20.97:6650] Destroyed connection
Traceback (most recent call last):
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/work/wangxx/projects/llm_qa/src/utils/pilsar_host.py", line 19, in <module>
    producer = client.create_producer(
  File "/root/anaconda3/envs/MOSS/lib/python3.8/site-packages/pulsar/__init__.py", line 639, in create_producer
    p._producer = self._client.create_producer(topic, conf)
_pulsar.ConnectError: Pulsar error: ConnectError

Process finished with exit code 1

hero6-coder commented 2 months ago

I have the same issue with the Java Pulsar client and Pulsar 3.2 when authentication is enabled (authenticationEnabled=true).