Open aschiazza opened 4 years ago
Are you able to take a stack trace dump of the JVM, for instance with jstack?
Thanks for the reply.
I have attached the JVM stacktrace and ps output.
It looks like a problem with some leader election. I can't tell more right now (I don't have my laptop). Is the bookie able to talk to ZooKeeper?
I think the bookie is able to talk to ZooKeeper. I tested it this way:
1 - I attached a shell to the bookkeeper container.
2 - I started a ZooKeeper shell (bin/pulsar zookeeper-shell -server zookeeper:2181).
3 - In the zk shell I typed ls /stream/servers/available
4 - The result is [172.25.0.6:4181] (that is the IP of the bookie container).
Can I test in some other way?
Any suggestions?
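One additional check, offered as a minimal sketch rather than an official procedure, is to confirm that the address registered under /stream/servers/available accepts TCP connections from the machine or container that needs to reach it. The helper name can_connect is hypothetical, and a successful TCP connect only shows that the port is reachable, not that the stream storage service behind it is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not speak the
    BookKeeper or ZooKeeper protocols, it only opens a TCP socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call, using the address listed in the zk shell above:
#   can_connect("172.25.0.6", 4181)
```

Running the call from the client side (for example, the functions worker container) is the interesting case, since that is where a loopback or otherwise unreachable address would show up as False.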
@aschiazza do you have the log of the bookies?
Hi, thanks for the reply. I attached the log of the bookie. bookkeeper_log.txt
@aschiazza Hi, I started StreamStorageLifecycleComponent according to your configuration, but the address registered in zk is 127.0.0.1:4181. I tried many methods without success. Is there any other configuration?
My Pulsar version is 2.5.0.
Hi @zyllt, I've attached my bookkeeper conf file.
bookkeeper.conf.txt
I'm still using version 2.4.1 for my project.
In the functions_worker.yml conf file I've added this line: stateStorageServiceUrl: bk://bookkeeper:4181
I'm using docker-compose for my environment, and the command for the bookie container is /pulsar/bin/bookkeeper bookie
I hope it will help you.
@aschiazza Thanks for your reply. I started the bookie with bin/bookkeeper bookie, using your bookkeeper conf file without Docker. When I type ls /stream/servers/available in zk, the result is still 127.0.0.1:4181.
But when I use docker-compose for my environment and the command is bin/pulsar standalone, the output of bin/pulsar zookeeper-shell && ls /stream/servers/available is [172.17.0.2:4181], which is the IP of the bookie container.
I am very confused, and I can't find the source code that determines the IP used when registering in zk.
@sijie Can you give me some suggestions?
If you start Pulsar in standalone mode, all components (broker, bookie, zk) start in a single container and are registered on the loopback interface.
You should start each component (broker, bookie, zk) with the proper command, for example:
/pulsar/bin/pulsar zookeeper for zk
/pulsar/bin/bookkeeper bookie for bookie
/pulsar/bin/pulsar broker for pulsar broker
/pulsar/bin/pulsar proxy for pulsar proxy
/pulsar/bin/pulsar functions-worker for functions worker
@aschiazza Thanks for your reply. Maybe my previous description was inaccurate. What I really meant is that the case where I start the bookie with bin/bookkeeper bookie, using your bookkeeper conf file without Docker, is my production environment. Because my production environment does not support Docker, I only start the bookie with Docker in my local environment.
First, I tested the bin/pulsar standalone command with Docker; the output of bin/pulsar zookeeper-shell && ls /stream/servers/available is [172.17.0.2:4181].
Second, I started each component (broker, bookie, zk) with the bin/pulsar zookeeper and bin/bookkeeper bookie commands with Docker; the output is [172.17.0.4:4181].
Last, I tested each component (broker, bookie, zk) in my local environment without Docker, but the output is [127.0.0.1:4181].
172.17.0.2 and 172.17.0.4 are the IPs of the bookie container.
@zyllt In your local environment, how do you start the components? With which command?
If you start different components on the same machine (without Docker), the IP address for each component is the same (more specifically, one of the IP addresses assigned to the machine's interfaces, or 127.0.0.1).
@aschiazza Hi, thanks in advance for your reply. Below I describe my problem and test steps in detail.
First, I followed the steps below when I started Pulsar in my local environment (without Docker):
1. bin/pulsar-daemon start zookeeper for zk
2. bin/pulsar-daemon start bookie for bookkeeper (I use your bookie conf, except for zkServers)
3. bin/pulsar-daemon start broker for the broker (I start the functions worker with the broker via functionsWorkerEnabled=true)
I typed bin/pulsar zookeeper-shell && ls /stream/servers/available, and the result is [127.0.0.1:4181].
I started the demo function WordCountFunction and typed bin/pulsar-admin functions trigger --fqfn test/test-namespace/WordCountFunction --trigger-value "hello pulsar hello wolrd", then I got a successful result when I used bin/pulsar-admin functions querystate --fqfn test/test-namespace/WordCountFunction --key hello.
Second, I tested with docker-compose in my local environment.
I started each component with Docker. I got [172.17.0.3:4181] from ls /stream/servers/available.
Then I started the demo function WordCountFunction and triggered it; the result was a success.
Third, I tested in my production environment without Docker.
I followed the first set of steps exactly to start the production environment. The difference is that the broker and the bookie are on different machines.
I typed bin/pulsar zookeeper-shell && ls /stream/servers/available, and the result is still [127.0.0.1:4181].
Then I started the demo function WordCountFunction and found that it did not start successfully; the log shows that the startup process is parked at the line org.apache.bookkeeper.clients.impl.channel.StorageServerChannelManager - Added range server (hostname: "127.0.0.1" port: 4181 ) into the channel manager.
I suspect that the function and the StreamStorageServer cannot establish a connection using (hostname: "127.0.0.1" port: 4181 ), because the function and the StreamStorageServer are not on the same machine.
I must point out that I had set stateStorageServiceUrl: bk://10.1.0.112:4181 in the config file functions_worker.yml, where 10.1.0.112 is the IP of my bookie machine. And I can see through the netstat -ant|grep 4181 command that the function machine and the bookie machine have established a connection.
I think the hostname 127.0.0.1 is obtained from zk, but I can't be sure.
What I can confirm so far is that the 127.0.0.1:4181 entry in zk is definitely incorrect, because when I started another bookie machine, StreamStorageServer reported that it was already registered.
So I wonder how to register the real server IP, not 127.0.0.1, in ZooKeeper?
Any ideas?
Here is the detailed function startup log
19:27:11.455 [test/test-namespace/WordCountFunction-0] INFO org.apache.pulsar.functions.instance.JavaInstanceRunnable - Starting Java Instance WordCountFunction :
Details = tenant: "test"
namespace: "test-namespace"
name: "WordCountFunction"
className: "org.apache.pulsar.functions.api.examples.WordCountFunction"
userConfig: "{\"PublishTopic\":\"test_result\"}"
autoAck: true
parallelism: 1
source {
typeClassName: "java.lang.String"
inputSpecs {
key: "test/test-namespace/test_src"
value {
}
}
cleanupSubscription: true
}
sink {
topic: "test/test-namespace/test_result"
typeClassName: "java.lang.Void"
}
resources {
cpu: 1.0
ram: 1073741824
disk: 10737418240
}
componentType: FUNCTION
19:27:11.455 [test/test-namespace/WordCountFunction-0] INFO org.apache.pulsar.functions.instance.JavaInstanceRunnable - Load JAR: /usr/local/pulsar-2.5.0/download/pulsar_functions/test/test-namespace/WordCountFunction/0/pulsar-functions-api-examples.jar
19:27:11.467 [test/test-namespace/WordCountFunction-0] INFO org.apache.pulsar.functions.instance.JavaInstanceRunnable - Initialize function class loader for function WordCountFunction at function cache manager
19:27:11.920 [client-scheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.clients.impl.channel.StorageServerChannelManager - Added range server (hostname: "127.0.0.1"
port: 4181
) into the channel manager.
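The symptom described above, a loopback address registered under /stream/servers/available while clients run on other machines, can be spotted mechanically. The following is a minimal sketch, assuming the list of "ip:port" entries has been copied by hand from the zk shell output; the function name loopback_entries is hypothetical, and the code does not contact ZooKeeper itself.

```python
import ipaddress


def loopback_entries(registered):
    """Return the 'ip:port' entries whose IP part is a loopback address.

    `registered` is assumed to be a list of strings such as
    "127.0.0.1:4181", pasted from `ls /stream/servers/available`.
    """
    flagged = []
    for entry in registered:
        # Split on the last colon so the port is separated from the host.
        host, _, _port = entry.rpartition(":")
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(entry)
        except ValueError:
            # Not an IP literal (e.g. a hostname); it cannot be judged here.
            continue
    return flagged
```

For the three environments reported in this thread, only the entry [127.0.0.1:4181] would be flagged; the container addresses 172.17.0.2 through 172.17.0.4 would not.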
Hi @zyllt can you post or attach your bookkeeper configuration file?
hi @aschiazza I've attached my bookkeeper conf file.
@zyllt How many bookie nodes do you have? From what I read, I think only one. With these parameters:
dlog.bkcEnsembleSize=3
dlog.bkcWriteQuorumSize=2
dlog.bkcAckQuorumSize=2
you require a cluster ensemble composed of 3 bookie nodes (across which segments are spread), and you require an ack quorum of 2. If you have only one bookie node, change these values to 1.
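The constraint described here can be written down as a small sanity check. This is only an illustrative sketch: the function name check_quorum is hypothetical, the values have to be read from the conf file by hand, and it encodes just the general BookKeeper rule that ensemble size >= write quorum >= ack quorum, plus the requirement that the ensemble cannot exceed the number of available bookies.

```python
def check_quorum(ensemble_size, write_quorum, ack_quorum, available_bookies):
    """Return a list of human-readable problems; empty if the values fit.

    Hypothetical helper: it does not read bookkeeper.conf, the caller
    passes in the values of dlog.bkcEnsembleSize, dlog.bkcWriteQuorumSize
    and dlog.bkcAckQuorumSize together with the number of running bookies.
    """
    problems = []
    if ensemble_size > available_bookies:
        problems.append(
            f"ensemble size {ensemble_size} exceeds "
            f"{available_bookies} available bookie(s)"
        )
    if write_quorum > ensemble_size:
        problems.append(
            f"write quorum {write_quorum} exceeds ensemble size {ensemble_size}"
        )
    if ack_quorum > write_quorum:
        problems.append(
            f"ack quorum {ack_quorum} exceeds write quorum {write_quorum}"
        )
    return problems
```

With the values quoted above and a single bookie, check_quorum(3, 2, 2, 1) reports the ensemble size problem, while check_quorum(1, 1, 1, 1) reports none.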
Another suggestion: check the bookkeeper logs. During startup, the logs should show all configuration parameters read from the conf file.
I've attached a log example: bookkeeper.logs.txt
Hi, I'm having trouble with BookKeeper v4.9.2 (I'm using the Pulsar Docker image v2.4.1). When I activate the StreamStorageLifecycleComponent by adding this line
in bookkeeper.conf, the bookkeeper container uses 100% CPU.
In my test environment I have 1 ZooKeeper server and 1 BookKeeper bookie.
I am including the table service section of my bookkeeper.conf configuration.
Thanks for your help