rob-kvietkauskas-ign closed this issue 1 year ago.
I have deployed a new cluster of 6 nodes (3 broker + BookKeeper machines and 3 ZooKeeper machines) running Pulsar 2.11.1. Using pretty much the same configuration, I encounter the same error: Error on allocating ledger under /pulsar/functions/ ...
Yesterday I spent some time analyzing the Pulsar 2.11.1 source code, trying to figure out what may cause the ZooKeeper connection to localhost (even though none of the broker, bookie, or functions worker configurations contain any entries related to localhost). After digging into the code mentioned in the stack traces I provided earlier and in the stack traces I get with Pulsar 2.11.1, I have a strong feeling that the problem lies in the `uploadToBookKeeper(Namespace, InputStream, String)` method call in the `WorkerUtils` class. After this method is called, the BookKeeper client tries to access BookKeeper via ZooKeeper, but somehow (I am not sure exactly how) `ZooKeeperClient` gets initialized with a localhost value.
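To double-check that no configuration file points at localhost, a small shell helper like the one below can scan the broker, bookie and functions worker configuration for `localhost` or `127.0.0.1` while skipping commented-out lines. It is only a sketch: the function name `find_local_endpoints` is hypothetical, and the `conf/` file names in the usage comment assume the stock Pulsar directory layout.

```shell
#!/bin/sh
# find_local_endpoints FILE...
# Prints "file:line:content" for every line in the given config files that
# mentions localhost or 127.0.0.1, ignoring lines whose first non-blank
# character is '#'. Exits non-zero when nothing is found.
find_local_endpoints() {
  # -H forces the file name prefix even when only one file is given.
  grep -nH -E 'localhost|127\.0\.0\.1' "$@" |
    grep -v -E '^[^:]*:[0-9]+:[[:space:]]*#'
}

# Example (assumed stock layout, run from the Pulsar installation directory):
#   find_local_endpoints conf/broker.conf conf/bookkeeper.conf conf/functions_worker.yml
```

An empty result only covers the files you pass in; a localhost value could still come from a built-in default for a key that is absent from those files.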
I have managed to register both source (file and Debezium PostgreSQL Source) and (custom) sink connectors in the 2.11.1 cluster. The stream storage service was disabled (as it is by default). I set `functionsWorkerEnablePackageManagement=true` and `enablePackagesManagement=true`, restarted the brokers, and was finally able to run the connectors! I tried to do the same on 2.9.1, but the result did not change at all: I still got ledger allocation errors.
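The two settings above can be flipped on each broker with a small helper like the following. This is a sketch under assumptions: it assumes `enablePackagesManagement` lives in `broker.conf` (`key=value` syntax) and `functionsWorkerEnablePackageManagement` in `functions_worker.yml` (top-level `key: value` syntax), which is a reading of the stock config files rather than something stated in this thread; `set_kv` is a hypothetical helper name.

```shell
#!/bin/sh
# set_kv FILE KEY VALUE SEP
# Sets KEY to VALUE in FILE. SEP is "=" for .conf files or ": " for
# top-level YAML keys. Any existing line for KEY, including a commented-out
# "#KEY..." line, is replaced by a single "KEY<SEP>VALUE" line; if no such
# line exists, it is appended. File ownership and permissions are kept
# because the original file is overwritten in place rather than replaced.
set_kv() {
  _file=$1 _key=$2 _value=$3 _sep=$4
  _tmp=$(mktemp) || return 1
  awk -v prefix="$_key$_sep" -v line="$_key$_sep$_value" '
    {
      s = $0
      sub(/^#[ \t]*/, "", s)          # look through a leading comment marker
      if (index(s, prefix) == 1) {     # this line defines (or comments out) KEY
        if (!done) print line          # emit the new definition once
        done = 1
        next                           # drop the old line and any duplicates
      }
      print
    }
    END { if (!done) print line }
  ' "$_file" > "$_tmp" && cat "$_tmp" > "$_file"
  _rc=$?
  rm -f "$_tmp"
  return $_rc
}

# Example (paths assume the stock layout; restart the brokers afterwards):
#   set_kv conf/broker.conf enablePackagesManagement true "="
#   set_kv conf/functions_worker.yml functionsWorkerEnablePackageManagement true ": "
```

Matching is deliberately strict (no spaces around `=`), which fits the stock files; hand-edited files with a different spacing would get a second, appended definition instead of an in-place change.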
I have finally managed to register both source (file and Debezium PostgreSQL Source) and (custom) sink connectors in the 2.9.1 bare-metal cluster. Once I added the `file://` scheme prefix to the path of the archive (until this discovery, I had provided absolute or relative file system paths to the archives, the same way as described in the documentation), all the connectors I was testing were created (registered) successfully and operated properly after a restart of the broker node. It was not the case in the 2.11.1 cluster with `functionsWorkerEnablePackageManagement` enabled: after a restart of the broker, the connector did not start automatically and ended up with an underlying internal error.
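For an absolute path, the resulting URL has three slashes: `file://` followed by the path's own leading `/`. The hypothetical helper below builds such a URL for `--archive`. As far as I know, the `pulsar-admin` reference describes `--archive` as also accepting an http/https/file URL, where the file protocol assumes the file already exists on the worker host. If that is the code path taken here, the worker would use the NAR in place instead of uploading it to BookKeeper, which would be consistent with the ledger allocation error disappearing; that is an inference, not something confirmed in this thread.

```shell
#!/bin/sh
# to_file_url PATH
# Prints a file:// URL for PATH. A relative PATH is resolved against the
# current working directory first. No percent-encoding is done, so PATH
# should not contain spaces or other characters that need escaping.
to_file_url() {
  case $1 in
    /*) _abs=$1 ;;
    *)  _abs=$(pwd)/$1 ;;
  esac
  printf 'file://%s\n' "$_abs"
}

# Example, reusing the command from the reproduce steps:
#   ./pulsar-admin sources create \
#     --archive "$(to_file_url /opt/pulsar/apache-pulsar-2.9.1/connectors/pulsar-io-file-2.9.1.nar)" \
#     --name file-test --destination-topic-name pulsar-file-test \
#     --source-config-file /opt/pulsar/apache-pulsar-2.9.1/connectors/configuration/file-connector.yml
```

Because the path is resolved on the worker side, the NAR has to be present at the same location on every host that may run the connector.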
> Once I added the `file://` scheme prefix to the path of the archive
Could you elaborate a bit on which archive you configured? IIRC, we use the `Path` type for the archive path, and it should not take `file://` input.
> It was not the case in 2.11.1
This seems to be a new issue. cc @jiangpengcheng @cbornet, as you have worked on functions-related logic recently. Do you have any thoughts on this kind of exception?
This does not seem to be a bug, but rather a usability issue to be improved.
There is no specific action here, and this duplicates the open-ended discussion https://github.com/apache/pulsar/discussions/20796.
Let's move the discussion there and spawn concrete issues for any improvements.
Search before asking
Version
- OS: Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
- Pulsar: 2.9.1 / 2.11.1
- Pulsar Built-in File Source Connector: 2.9.1
- Pulsar Built-in Debezium PostgreSQL Source Connector: 2.9.1
- Java: openjdk 11.0.18 2023-01-17
Minimal reproduce step
The steps below were performed with Ansible so that machines of the same type received identical configuration, reducing the chance of human error.
`<pulsar directory>/connectors` and `<pulsar directory>/connectors/configuration`, respectively);

After setup is done, navigate to the `<pulsar directory>/bin` directory and run:

`./pulsar-admin sources create --archive /opt/pulsar/apache-pulsar-2.9.1/connectors/pulsar-io-file-2.9.1.nar --name file-test --destination-topic-name pulsar-file-test --source-config-file /opt/pulsar/apache-pulsar-2.9.1/connectors/configuration/file-connector.yml`

In the case of the built-in File Source connector, its configuration is:
The broker machine has `a.txt` in the `/opt/pulsar` directory (readable; permissions and ownership verified).

To clarify:
- `bookkeeper.conf` has `zkServers=<list_of_actual_addresses_of_zookeeper_instances>`
- `broker.conf` has `zookeeperServers=<list_of_actual_addresses_of_zookeeper_instances>`
- `broker.conf` has `configurationStoreServers=<list_of_actual_addresses_of_zookeeper_instances>`
- `broker.conf` has `globalZookeeperServers=<list_of_actual_addresses_of_zookeeper_instances>`
- `functions_worker.yml` has `configurationStoreServers` commented out

The cluster works fine, without any errors, while `StateStorageLifecycleComponent` is not enabled in `bookkeeper.conf` on the servers running BookKeeper. After the functions worker setup is done, connectors do work in `localrun` mode (this proves that the connector configuration is valid, for both File Source and Debezium PostgreSQL Source). After `StateStorageLifecycleComponent` gets enabled, the following fragment recurs in the logs (with `initializedDlogMetadata: false` in the broker logs and `initializedDlogMetadata: true` in the BookKeeper logs):

What is more, this fragment recurs even without any connector-related `pulsar-admin` calls.
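Since the errors only appear once the state storage component is enabled, it may help to confirm on each bookie whether `bookkeeper.conf` actually enables it. In the stock `bookkeeper.conf` the component is switched on through the `extraServerComponents` key; to my knowledge the class shipped with BookKeeper is `org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent`, so the sketch below matches any class name ending in `StorageLifecycleComponent` rather than one exact spelling. The helper name `state_storage_enabled` is hypothetical.

```shell
#!/bin/sh
# state_storage_enabled FILE
# Succeeds (exit 0) if FILE has an uncommented extraServerComponents entry
# that lists a *StorageLifecycleComponent class; fails otherwise.
state_storage_enabled() {
  grep -q -E '^[[:space:]]*extraServerComponents[[:space:]]*=.*StorageLifecycleComponent' "$1"
}

# Example, run on each bookie host from the Pulsar installation directory:
#   if state_storage_enabled conf/bookkeeper.conf; then
#     echo "state storage component enabled"
#   else
#     echo "state storage component not enabled"
#   fi
```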
What did you expect to see?
Connector running (successfully created and started).
What did you see instead?
Error on allocating ledger under /pulsar/functions/public/default//78b1ef8d-ab3c-4deb-8b60-0579ff0ba395-.nar//allocation
Reason: HTTP 500 Internal Server Error
Anything else?
Also, after `StateStorageLifecycleComponent` gets enabled (and `initializedDlogMetadata` is set to `false`), the following fragment (with the stream paths changing between occurrences) recurs periodically in the broker logs:

If `StateStorageLifecycleComponent` gets enabled (and `initializedDlogMetadata` is set to `false`), the following fragment (with the stream paths changing between occurrences) appears in the logs whenever `pulsar-admin sources create` is called:

Are you willing to submit a PR?