Closed lhotari closed 6 days ago
Is it possible for us to start WebService/WebSocketService/BrokerService/ProtocolHandler after the other Pulsar services start?
Yes, it makes sense to cover all cases in a way so that requests aren't handled before PulsarService has been started.
@dao-jun
I renamed the concept to mean that the broker is ready to serve requests instead of being fully started. I believe that this covers what is really needed and the original intention of this PR.
I checked the current solution and this is sufficiently covered already for the Pulsar broker with various configurations. There's no need to separately handle websockets since websocket servlets get added to the same Jetty server which already contains the filter to wait until PulsarService is ready for incoming requests.
ProtocolHandlers don't need any special handling since they get started as the last step in PulsarService.
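For readers unfamiliar with the idea, here is a minimal sketch of a readiness gate of the kind described above. It is not the actual filter code from this PR: the class `ReadinessGate`, its methods, and the status strings are hypothetical names chosen for illustration, and the real filter would sit inside the Jetty server rather than in a plain method.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical readiness gate; an illustration only, not PulsarService code. */
final class ReadinessGate {
    // Completed exactly once, as the last step of the startup sequence.
    private final CompletableFuture<Void> ready = new CompletableFuture<>();

    /** Signals that the broker is ready to serve requests. */
    void markReady() {
        ready.complete(null);
    }

    /** Waits up to the given timeout; returns true only if the gate was opened. */
    boolean awaitReady(long timeout, TimeUnit unit) {
        try {
            ready.get(timeout, unit);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Stand-in for a request filter: handle the request only once ready. */
    String handle(String request, long timeoutMillis) {
        if (!awaitReady(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return "503 Service Unavailable";
        }
        return "200 OK: " + request;
    }
}
```

Because every servlet on the same server passes through one such gate, nothing servlet-specific is needed for websockets in this sketch either.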
@lhotari Are we able to move brokerService.start/webService.start/protocolHandler.start to the end of the method? It should be more elegant.
@dao-jun Elegant? I guess you mean simpler? I don't think that it is possible. For example, brokerId contains the actual port that the server socket gets bound to. For tests we use port 0 to bind to a dynamic port, and the actual port is only available after the server socket has been bound. It's not only the brokerId, but also the service addresses of the broker that are resolved after the server socket has been bound. The alternative would be to restructure the code to just do the binding and delay the remaining parts of the Netty server initialization to the last step so that incoming requests aren't served before the broker is ready to accept requests. That wouldn't be a simple change.
The approach in this PR is fairly simple and I think that it's a minimal change to address the problem so that requests aren't served before the broker is ready to serve them. What's not elegant? :)
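To illustrate the port 0 point, here is a small sketch using plain `java.net.ServerSocket` rather than the Netty server that the broker actually uses. The class name `DynamicPortExample` and the `host:port` id format are assumptions made for the example; the only point is that the real port is unknown until after the bind.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Illustration only: shows why an id containing the port needs a bound socket. */
final class DynamicPortExample {

    /** Returns the local port of a socket that has not been bound yet. */
    static int unboundPort() {
        try (ServerSocket socket = new ServerSocket()) {
            return socket.getLocalPort(); // -1 while the socket is unbound
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Binds to port 0 and builds a hypothetical "host:port" id from the result. */
    static String bindAndBuildId(String host) {
        try (ServerSocket socket = new ServerSocket()) {
            // Port 0 asks the OS to pick any free port.
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int actualPort = socket.getLocalPort(); // known only after bind
            return host + ":" + actualPort;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Anything derived from the port, such as an id or an advertised service address, therefore has to be computed after the bind, which is why the start calls cannot simply be moved to the end of the method.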
The alternative would be to restructure the code to just do the binding and delay the remaining parts of the Netty server initialization to the last step so that incoming requests aren't served before the broker is ready to accept requests. That wouldn't be a simple change.
Yes, I do prefer this way, but since it's not easy to do, this PR is the simplest way.
@heesung-sn It looks like ExtensibleLoadManagerImpl accesses the broker itself before it is ready to serve requests. One possibility would be to refactor the logic to be asynchronous. I had to add a workaround for ExtensibleLoadManager in this commit: 18c97693488980ba8217fc78ef2910ad4800983b.
There's a high chance of bugs unless this is addressed in ExtensibleLoadManagerImpl. I don't think that the broker should serve any incoming requests before the broker is in a certain state where it's "ready".
@heesung-sn Would it be possible to make ExtensibleLoadManager initialization asynchronous so that it doesn't block the startup sequence with the changes in this PR (when the workaround is removed)? The relevant method is org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl#start
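A rough sketch of what "asynchronous" could mean here, using only `CompletableFuture`. The class `AsyncStartupExample` and its methods are hypothetical and do not reflect the real ExtensibleLoadManagerImpl or ServiceUnitStateChannelImpl code; the assumption is simply that the part of initialization that calls back into the broker is chained onto a readiness future instead of blocking the startup thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustration only: defers self-access until the broker reports ready. */
final class AsyncStartupExample {
    // Completed when the broker is ready to serve requests.
    final CompletableFuture<Void> brokerReady = new CompletableFuture<>();
    // Records the order of events so the sequence can be inspected.
    final List<String> events = new CopyOnWriteArrayList<>();

    /** Returns immediately; the broker-facing step runs after readiness. */
    CompletableFuture<Void> startLoadManagerAsync() {
        events.add("load-manager: start requested");
        return brokerReady.thenRun(() -> events.add("load-manager: contacted broker"));
    }

    /** Startup sequence: no step waits on a request to the broker itself. */
    CompletableFuture<Void> startBroker() {
        CompletableFuture<Void> loadManagerStarted = startLoadManagerAsync();
        events.add("broker: ready");
        brokerReady.complete(null); // releases the deferred step
        return loadManagerStarted;
    }
}
```

If the load manager instead blocked inside its start method on a request to the same broker, a gate that holds requests until startup finishes would never open, which is the situation the workaround avoids.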
What is the problem?
Since this PR might be blocked by other problems, there's #22981 to address #22975 alone without fixing the root cause.
@heesung-sn I pushed an attempt to make this solution work with ExtensibleLoadManagerImpl.
Attention: Patch coverage is 77.38095% with 19 lines in your changes missing coverage. Please review.
Project coverage is 73.39%. Comparing base (bbc6224) to head (fc593e7). Report is 424 commits behind head on master.
Fixes #22975
Motivation
Pulsar broker will start serving requests while the broker is starting. This can cause issues and bugs which are hard to reproduce. Serving requests should be delayed until the broker is ready. The brokerId bug #22975 is just one example of problems that this PR will address.
Modifications