apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Start Standalone Error] In the case of wifi connection, /bin/pulsar standalone is not working properly #4593

Closed wolfstudy closed 4 years ago

wolfstudy commented 5 years ago

Describe the bug

When connected to Wi-Fi, starting Pulsar with ./bin/pulsar standalone produces the following error:

10:57:29.260 [DLM-/stream/storage-OrderedScheduler-4-0] INFO  org.apache.bookkeeper.stream.storage.impl.store.MVCCStoreFactoryImpl - Successfully initialize stream(1)/range(0) at storage container (1)
10:57:29.260 [DLM-/stream/storage-OrderedScheduler-4-0] INFO  org.apache.bookkeeper.stream.storage.impl.store.MVCCStoreFactoryImpl - Add store (scId = 1, streamId = 1, rangeId = 0) at storage container (1)
10:57:29.260 [DLM-/stream/storage-OrderedScheduler-4-0] INFO  org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerImpl - Successfully started storage container (1).
10:57:29.260 [DLM-/stream/storage-OrderedScheduler-4-0] INFO  org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerRegistryImpl - Successfully started registered StorageContainer ('1').
10:57:29.260 [DLM-/stream/storage-OrderedScheduler-4-0] INFO  org.apache.bookkeeper.stream.storage.impl.sc.ZkStorageContainerManager - Successfully started storage container (1)
10:57:29.260 [DLM-/stream/storage-OrderedScheduler-4-0] INFO  org.apache.bookkeeper.stream.storage.impl.sc.ZkStorageContainerManager - Storage container (org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerImpl@4f52c8b) is added to live set.
^C10:57:51.115 [Curator-LeaderSelector-0] WARN  org.apache.bookkeeper.stream.storage.impl.cluster.ClusterControllerLeaderImpl - Controller leader is interrupted, giving up leadership
10:57:51.118 [ProcessThread(sid:0 cport:2181):] INFO  org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x1001157fce30004 type:delete cxid:0xd zxid:0x2dc txntype:-1 reqpath:n/a Error Path:/stream/controller/_c_a0c46f44-3df2-4456-a1cf-2925c284121c-lock-0000000006 Error:KeeperErrorCode = NoNode for /stream/controller/_c_a0c46f44-3df2-4456-a1cf-2925c284121c-lock-0000000006
10:57:51.120 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.sc.ZkStorageContainerManager - Stopping storage container (0)
10:57:51.120 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerRegistryImpl - Unregistered StorageContainer ('0').
10:57:51.120 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerImpl - Stopping storage container (0) ...
10:57:51.120 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.store.MVCCStoreFactoryImpl - Closing 000000000000000000/000000000000000001/000000000000000000 of sc 0
10:57:51.121 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.store.MVCCStoreFactoryImpl - Closing 000000000000000000/000000000000000000/000000000000000000 of sc 0
10:57:51.121 [io-write-scheduler-OrderedScheduler-1-0] INFO  org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal - closing async state store 000000000000000000/000000000000000001/000000000000000000
10:57:51.121 [io-write-scheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal - closing async state store 000000000000000000/000000000000000000/000000000000000000
10:57:51.121 [io-read-scheduler-OrderedScheduler-1-0] INFO  org.apache.distributedlog.BKLogSegmentWriter - Flushing before closing log segment streams_000000000000000000_000000000000000001_000000000000000000:<default>:inprogress_000000000000000007
10:57:51.121 [io-read-scheduler-OrderedScheduler-0-0] INFO  org.apache.distributedlog.BKLogSegmentWriter - Flushing before closing log segment streams_000000000000000000_000000000000000000_000000000000000000:<default>:inprogress_000000000000000007
10:57:51.122 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.sc.ZkStorageContainerManager - Stopping storage container (1)
10:57:51.122 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerRegistryImpl - Unregistered StorageContainer ('1').
10:57:51.123 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerImpl - Stopping storage container (1) ...
10:57:51.123 [Thread-1] INFO  org.apache.bookkeeper.stream.storage.impl.store.MVCCStoreFactoryImpl - Closing 000000000000000001/000000000000000001/000000000000000000 of sc 1
10:57:51.123 [io-write-scheduler-OrderedScheduler-1-0] INFO  org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal - closing async state store 000000000000000001/000000000000000001/000000000000000000
10:57:51.118 [Curator-LeaderSelector-0] ERROR org.apache.curator.framework.recipes.leader.LeaderSelector - The leader threw an exception
java.lang.InterruptedException: null
    at java.lang.Object.wait(Native Method) ~[?:1.8.0_201]
    at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_201]
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1411) ~[org.apache.pulsar-pulsar-zookeeper-2.4.0.jar:2.4.0]
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:880) ~[org.apache.pulsar-pulsar-zookeeper-2.4.0.jar:2.4.0]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[org.apache.curator-curator-client-4.0.1.jar:?]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[org.apache.curator-curator-client-4.0.1.jar:?]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_201]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]

I then tried to work around the problem with ./bin/pulsar standalone -a 127.0.0.1. Unfortunately, I still hit the same error.

However, the problem does not occur when all network connections (e.g. Wi-Fi) are disconnected.

To Reproduce

This problem cannot be reliably reproduced: some people encounter it and others do not.

valiantljk commented 5 years ago

I encountered the same issue and can reproduce it. Help needed. Thanks.

0x6e6562 commented 5 years ago

I can reproduce these symptoms on macOS by starting Prometheus before starting Pulsar.

kevenYLi commented 4 years ago

I encountered the same issue when running Pulsar in Docker.

Environment: macOS Mojave 10.14.6, Pulsar 2.4.1

Steps followed: https://pulsar.apache.org/docs/en/standalone-docker/ --> Get Started --> Run Pulsar in Docker

aahmed-se commented 4 years ago

I am not able to reproduce this issue. In the interim, people can try using the Docker image with the data folder mounted as a volume. Also, please check your /etc/hosts file for any strange entries.

kevenYLi commented 4 years ago

> I am not able to reproduce this issue. In the interim, people can try using the Docker image with the data folder mounted as a volume. Also, please check your /etc/hosts file for any strange entries.

I reproduced the issue. It happened after I pressed Ctrl+C to stop the program. I tried three times, and the exception occurred all three times.

To Reproduce

Environment: macOS Mojave 10.14.6, Pulsar 2.4.1, Wi-Fi connected

aleichter commented 4 years ago

I found this thread because I was seeing similar exceptions. Failures were consistent while I was on Wi-Fi; a wired connection seemed somewhat more stable, but Pulsar would still crash. Looking through the logs, I noticed that my Mac's hostname is used as the connection address for some service in standalone mode (and, I assume, in non-standalone mode too). It turned out that my hostname was not in DNS. After I added my Mac's hostname to /etc/hosts, everything started up fine and has been stable.
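
For anyone who wants to check the same thing on their own machine, here is a minimal sketch in Python (not part of Pulsar) that tests whether the local hostname resolves and, if it does not, prints a candidate /etc/hosts line. The function names `check_hostname` and `hosts_entry` are made up for illustration, and mapping the hostname to 127.0.0.1 is an assumption; the comments above do not say which address was used.

```python
import socket


def check_hostname(hostname=None, resolver=socket.gethostbyname):
    """Return (hostname, ip) if the name resolves, or (hostname, None) if not.

    `resolver` is injectable so the check can be exercised without real DNS.
    """
    name = hostname if hostname is not None else socket.gethostname()
    try:
        return name, resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return name, None


def hosts_entry(hostname, ip="127.0.0.1"):
    """Format an /etc/hosts line. The loopback address is an assumption."""
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    name, ip = check_hostname()
    if ip is None:
        print(f"{name} does not resolve; a line like this in /etc/hosts may help:")
        print(hosts_entry(name))
    else:
        print(f"{name} resolves to {ip}")
```

If the script reports that the hostname does not resolve, that matches the situation described in this comment.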

sijie commented 4 years ago

Release 2.5.0 includes change #5856, which uses "localhost" if the hostname cannot be resolved.
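
The fallback described here can be illustrated with a short Python sketch. This is not the code from #5856 (Pulsar is written in Java); it only shows the general idea of preferring the machine's hostname and falling back to "localhost" when it cannot be resolved. The name `advertised_address` and its parameters are hypothetical.

```python
import socket


def advertised_address(hostname=None, resolver=socket.gethostbyname,
                       fallback="localhost"):
    """Return the hostname if it resolves, otherwise the fallback name.

    Illustrative only: the function and its parameters are assumptions,
    not Pulsar's actual implementation.
    """
    name = hostname if hostname is not None else socket.gethostname()
    try:
        resolver(name)  # raises OSError (socket.gaierror) if resolution fails
        return name
    except OSError:
        return fallback


if __name__ == "__main__":
    print(advertised_address())
```

With a fallback like this, an unresolvable hostname no longer leaves services trying to connect to a name that DNS does not know about.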

Eywek commented 3 years ago

Hi, I still have this issue on macOS Mojave 10.14.6 with apachepulsar/pulsar:2.7.0 (Docker 20.10.0, build 7287ab3). I fixed it by adding my hostname to /etc/hosts.