apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.23k stars 3.58k forks source link

[Bug] Brokers yield "Error Starting up in Worker" when functions are enabled #21748

Open theoribeiro opened 10 months ago

theoribeiro commented 10 months ago

Search before asking

Version

Helm: 3.0 GKE: 1.27 Pulsar: 2.10.2

Minimal reproduce step

Installing a pulsar cluster on GKE autopilot using the helm chart with mostly default settings.

What did you expect to see?

A working cluster

What did you see instead?

I'm getting an exception on my brokers initialization:

2023-12-19T00:47:53,609+0000 [main] INFO  org.apache.pulsar.functions.utils.functions.FunctionUtils - Searching for functions in /pulsar/./functions
2023-12-19T00:47:53,610+0000 [main] WARN  org.apache.pulsar.functions.utils.functions.FunctionUtils - Functions archive directory not found
2023-12-19T00:47:53,612+0000 [main] ERROR org.apache.pulsar.functions.worker.PulsarWorkerService - Error Starting up in worker
org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
    at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:251) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:301) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptions(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:457) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1608) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:795) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:350) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
Caused by: java.lang.IllegalStateException: Client instance has been closed.
    at org.glassfish.jersey.internal.guava.Preconditions.checkState(Preconditions.java:169) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyClient.checkNotClosed(JerseyClient.java:257) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.checkNotClosed(JerseyWebTarget.java:118) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.path(JerseyWebTarget.java:129) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.path(JerseyWebTarget.java:38) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.topicPath(TopicsImpl.java:1319) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptionsAsync(TopicsImpl.java:619) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.lambda$getSubscriptions$20(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:292) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    ... 6 more
2023-12-19T00:47:53,619+0000 [main] ERROR org.apache.pulsar.broker.PulsarService - Failed to start Pulsar service: org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
java.lang.RuntimeException: org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:585) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1608) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:795) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:350) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
Caused by: org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
    at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:251) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:301) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptions(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:457) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    ... 4 more
Caused by: java.lang.IllegalStateException: Client instance has been closed.
    at org.glassfish.jersey.internal.guava.Preconditions.checkState(Preconditions.java:169) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyClient.checkNotClosed(JerseyClient.java:257) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.checkNotClosed(JerseyWebTarget.java:118) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.path(JerseyWebTarget.java:129) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.path(JerseyWebTarget.java:38) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.topicPath(TopicsImpl.java:1319) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptionsAsync(TopicsImpl.java:619) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.lambda$getSubscriptions$20(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:292) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptions(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:457) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    ... 4 more
2023-12-19T00:47:53,620+0000 [pulsar-service-shutdown] INFO  org.apache.pulsar.broker.PulsarService - Closing PulsarService
2023-12-19T00:47:53,621+0000 [main] ERROR org.apache.pulsar.PulsarBrokerStarter - Failed to start pulsar service.
org.apache.pulsar.broker.PulsarServerException: java.lang.RuntimeException: org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:827) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:350) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
Caused by: java.lang.RuntimeException: org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:585) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1608) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:795) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    ... 2 more
Caused by: org.apache.pulsar.client.admin.PulsarAdminException: java.lang.IllegalStateException: Client instance has been closed.
    at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:251) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:301) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptions(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:457) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1608) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:795) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    ... 2 more
Caused by: java.lang.IllegalStateException: Client instance has been closed.
    at org.glassfish.jersey.internal.guava.Preconditions.checkState(Preconditions.java:169) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyClient.checkNotClosed(JerseyClient.java:257) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.checkNotClosed(JerseyWebTarget.java:118) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.path(JerseyWebTarget.java:129) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.glassfish.jersey.client.JerseyWebTarget.path(JerseyWebTarget.java:38) ~[org.glassfish.jersey.core-jersey-client-2.34.jar:?]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.topicPath(TopicsImpl.java:1319) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptionsAsync(TopicsImpl.java:619) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.lambda$getSubscriptions$20(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:292) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getSubscriptions(TopicsImpl.java:613) ~[org.apache.pulsar-pulsar-client-admin-original-2.10.2.jar:2.10.2]
    at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:457) ~[org.apache.pulsar-pulsar-functions-worker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1608) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:795) ~[org.apache.pulsar-pulsar-broker-2.10.2.jar:2.10.2]
    ... 2 more

Anything else?

If I disable functions using the Helm chart values, the brokers initializer correctly.

Are you willing to submit a PR?

primmayora commented 8 months ago

I face the same issue on the production environment on GKE without autopilot the autoscaling did a scale-down but it did not come back online and had same logs errors you shared, i tried to reproduce it on a controlled environment but no luck.

kerimmdjiho commented 2 months ago

I managed to resolve this by changing the limits and requests from default values in Helm chart. I increased the CPU limit from 500m to 1. Either way should be handled better.

theoribeiro commented 2 months ago

Sorry @Technoboy- I just noticed this was assigned to me but I'm afraid I won't be able to work on it. Unassigning it :(