Failed to lookup topic (request timeout) after pulsar broker restart

ernado commented 4 years ago

Expected behavior

Consumers/Producers are created normally after broker restart.

Actual behavior

level=info msg="Connecting to broker" remote_addr="pulsar://10.0.0.4:6650"
level=info msg="TCP connection established" local_addr="10.0.0.2:55310" remote_addr="pulsar://10.0.0.4:6650"
level=info msg="Connection is ready" local_addr="10.0.0.2:55310" remote_addr="pulsar://10.0.0.4:6650"
level=warning msg="Failed to lookup topic" error="request timed out" name=zwvsg subscription=my-sub topic="persistent://public/default/telegram"
level=error msg="Failed to create consumer" error="request timed out"

On broker side:

INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /10.0.0.2:55354
INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /10.0.0.2:55354
INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /10.0.0.2:55358
INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /10.0.0.2:55358
INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /10.0.0.2:55362

Steps to reproduce

1. Start Pulsar (3-node deployment)
2. Connect to the first node with a producer/consumer
3. Restart the first node
4. Observe "request timed out" errors

System configuration

Pulsar version: 2.6.0

Hi, while testing the durability of a 3-node deployment I encountered weird behaviour: the Go client was able to reconnect, but every request timed out. This state is quite flaky, and I was able to reproduce it once in a standalone deployment.

One or more broker restarts fix the problem, but it looks scary.

Could this be specific to pulsar-client-go, or should I create an issue in apache/pulsar? I'm not familiar with Java, so I can't provide any useful debug info right now, sorry.

merlimat commented 4 years ago

It looks like an issue with broker lookups in the 2.6 release.

ernado commented 4 years ago

Should I try 2.6.1 or 2.5?

dalianzhu commented 4 years ago

I may have the same issue. I am not sure whether the server restarted, but my program is stuck. There are no errors.

The last few log lines are:

[WARNING] Failed to lookup topic map[error:request timed out name:syfjp subscription:nmpc_subscribe_v2_snmpPulsar_system topic:snmp_oceanus_filled_pulsar-partition-3]
[INFO] Reconnecting to broker in 3.2s map[name:syfjp subscription:nmpc_subscribe_v2_snmpPulsar_system topic:snmp_oceanus_filled_pulsar-partition-3]
[WARNING] Failed to lookup topic map[error:request timed out name:syfjp subscription:nmpc_subscribe_v2_snmpPulsar_system topic:snmp_oceanus_filled_pulsar-partition-5]
[WARNING] Detected stale connection to broker map[local_addr:172.19.2.126:34620 remote_addr:pulsar://100.66.21.146:6650]

The program was stuck for several hours, with the Receive and Ack calls blocked. It returned to normal after I restarted my program.

devinbost commented 4 years ago

We're seeing this issue on 2.6.1

devinbost commented 3 years ago

We just ran into this issue with a Java client. Restarting the broker that owned the topic and restarting the app did not seem to resolve it.

devinbost commented 3 years ago

Failed to instantiate [org.apache.pulsar.client.api.Consumer]: Factory method 'consumer' threw exception; nested exception is org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 2 lookup request timedout after ms 30000
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:656) ~[spring-beans-5.2.3.RELEASE.jar:5.2.3.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:636) ~[spring-beans-5.2.3.RELEASE.jar:5.2.3.RELEASE]
    . . .

kenbaev commented 3 years ago

Hi! Is there any workaround for this bug?
