zbentley opened this issue 3 years ago
The following issues were all observed in response to similar testing:
https://github.com/apache/pulsar-client-cpp/issues/86
https://github.com/apache/pulsar/issues/12556
https://github.com/apache/pulsar/issues/12555
https://github.com/apache/pulsar/issues/12554
https://github.com/apache/pulsar/issues/12553
https://github.com/apache/pulsar/issues/12552
https://github.com/apache/pulsar/issues/12551
The condition that triggered these issues appears to be interacting with various Pulsar entities (e.g. creating/deleting things via the management API, or attempting to create consumers) immediately after those entities were created, or immediately after entities with the same name were deleted.
I think the number of issues observed points to a defect in the management API functionality in general. Considering the severity of these issues (in many cases it is possible to force a topic/namespace into a permanently corrupted state), I hope a resolution can be found for the common root cause rather than fixing the individual bug-inducing conditions one by one.
I suspect that the common root cause is that many management API operations are asynchronous when they should not be.
Ideally, the resolution of all of these issues would be the same: a management API operation (any operation) should not return successfully until all observable side effects of that operation across the Pulsar cluster (including brokers, proxies, bookies, and ZooKeeper) have completed. All metadata caches related to the operation (e.g. on all brokers/proxies in the cluster) should be cleared, and all persistent state (including ledger deletion, bookie cleanup, ZooKeeper metadata, etc.) should be updated during the management API operation, not afterwards.
If that means that management API operations take many seconds or minutes, that's still vastly preferable to not knowing when it is safe to interact with a cluster again after performing "DDL"-type changes.
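(To illustrate what "waiting for observable side effects" means in practice: today, the closest a client can get is polling the admin API itself until the change becomes visible, along the lines of the sketch below. The method name and timeout are my own placeholders; this is not an existing Pulsar feature.)

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class AwaitConvergence {
    // Client-side polling until a deleted topic stops being listed in its
    // namespace, i.e. until the deletion's side effects become observable.
    // "topic" must be the full name, e.g. "persistent://tenant/ns/t1".
    static void awaitTopicGone(PulsarAdmin admin, String namespace, String topic)
            throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MINUTES.toNanos(2); // arbitrary timeout
        while (admin.namespaces().getTopics(namespace).contains(topic)) {
            if (System.nanoTime() > deadline) {
                throw new IllegalStateException("Topic still visible after deletion: " + topic);
            }
            TimeUnit.MILLISECONDS.sleep(500); // arbitrary poll interval
        }
    }
}
```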
> The condition that triggered these issues appears to be interacting with various Pulsar entities (e.g. creating/deleting things via the management API, or attempting to create consumers) immediately after those entities were created, or immediately after entities with the same name were deleted.
> I think the number of issues observed points to a defect in the management API functionality in general. Considering the severity of these issues (in many cases it is possible to force a topic/namespace into a permanently corrupted state), I hope a resolution can be found for the common root cause rather than fixing the individual bug-inducing conditions one by one.
> I suspect that the common root cause is that many management API operations are asynchronous when they should not be.
All operations return only after a successful write to the metadata service.
If that can induce a permanently inconsistent state, it's a bug that absolutely needs to be fixed.
> Ideally, the resolution of all of these issues would be the same: a management API operation (any operation) should not return successfully until all observable side effects of that operation across the Pulsar cluster (including brokers, proxies, bookies, and ZooKeeper) have completed. All metadata caches related to the operation (e.g. on all brokers/proxies in the cluster) should be cleared, and all persistent state (including ledger deletion, bookie cleanup, ZooKeeper metadata, etc.) should be updated during the management API operation, not afterwards.
> If that means that management API operations take many seconds or minutes, that's still vastly preferable to not knowing when it is safe to interact with a cluster again after performing "DDL"-type changes.
In reality, this is not an easy thing to do in any distributed system, short of completely eliminating the caches.
In a distributed system, there is no way to guarantee that all caches have been invalidated.
> If that can induce a permanently inconsistent state, it's a bug that absolutely needs to be fixed.
This specific issue is a permanent inconsistency. The others usually go away after waiting a few minutes.
> In reality, this is not an easy thing to do in any distributed system, short of completely eliminating the caches.
Is there some way instead to bypass cached metadata on a per-operation basis? Could that be exposed as a parameter for management API operations?
Alternatively, would it be possible to add behavior like "when the backend encounters an error that could be due to an out-of-date cache, dump the cache and try once more against authoritative state"?
I ask because the current workaround (adding what amounts to large numbers of sleeps and retries around common management API and create-producer/subscribe interactions) is pretty costly in both runtime and complexity.
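(For concreteness, that workaround amounts to something like the following Java admin-client sketch; the retry count and backoff are arbitrary placeholders, and the same pattern ends up wrapped around most admin and subscribe calls.)

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.admin.PulsarAdminException;

public class RetryingDelete {
    // Retries a namespace deletion with a fixed backoff, on the theory that
    // some failures are caused by stale metadata caches that eventually expire.
    static void deleteNamespaceWithRetry(PulsarAdmin admin, String namespace)
            throws PulsarAdminException, InterruptedException {
        final int maxAttempts = 10;    // arbitrary
        final long backoffSeconds = 5; // arbitrary
        for (int attempt = 1; ; attempt++) {
            try {
                admin.namespaces().deleteNamespace(namespace);
                return;
            } catch (PulsarAdminException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                TimeUnit.SECONDS.sleep(backoffSeconds);
            }
        }
    }
}
```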
In a system whose metadata store is not under heavy load, the local caches should be updated in a timely manner. If the system does not converge to a consistent view of the metadata, then we have bugs. We recently found a problem with the usage of Caffeine in the local metadata cache.
The alternative is to switch to a distributed cache that guarantees causal consistency for the near caches: that is, when I write to the cache and the write returns, I am sure that every local copy in each near cache has been invalidated (or updated).
Some years ago I worked on a project that solves this problem, https://github.com/diennea/blazingcache. It implements a very lightweight cache with near-caching support and strong guarantees about causal consistency, because we really needed such guarantees for distributed cache invalidation.
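(To make the causal-consistency guarantee concrete, here is a minimal single-process sketch, with hypothetical class names rather than the blazingcache API: a write does not return until every registered near cache has processed the invalidation, so after the write completes no reader can observe the stale value.)

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

public class CausalCacheSketch {
    // A "near cache": a local copy of some entries held by one node.
    static final class NearCache {
        private final Map<String, String> local = new ConcurrentHashMap<>();
        void put(String k, String v) { local.put(k, v); }
        String get(String k) { return local.get(k); }
        // Invalidation is synchronous here; over a network this would be an
        // RPC whose acknowledgement the coordinator waits for.
        void invalidate(String k) { local.remove(k); }
    }

    // The coordinator: a write returns only after every near cache has
    // processed the invalidation, which is what "causal consistency" buys us.
    static final class Coordinator {
        private final Map<String, String> authoritative = new ConcurrentHashMap<>();
        private final Set<NearCache> nearCaches = new CopyOnWriteArraySet<>();
        void register(NearCache c) { nearCaches.add(c); }
        void write(String k, String v) {
            authoritative.put(k, v);
            for (NearCache c : nearCaches) {
                c.invalidate(k); // blocks until acknowledged
            }
            // Only now does the write "return".
        }
    }
}
```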
@lhotari @michaeljmarshall @merlimat @codelipenghui
I can confirm that this issue seems to persist in Pulsar 3.0.0.
Describe the bug
If I create a namespace, put a persistent, partitioned topic in it, and then immediately thereafter delete the topic and attempt to delete the namespace, namespace deletion sometimes fails with a 500 error that corresponds to "MetadataNotFoundException: Managed ledger not found" in the broker's log.
Note that this error condition is persistent: once it occurs, it keeps occurring, and the namespace can never be deleted.
Note that I was connecting to this broker through a Pulsar proxy; my client is programmed to raise an exception on 307 redirect codes, so the 307 response served by the broker in the log snippet below must have been handled by the proxy, not by my client.
To Reproduce
Run the reproduction plan for https://github.com/apache/pulsar/issues/12551; sometimes no error will occur, sometimes the error described in that issue or others will occur, and sometimes this error will occur.
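(For reference, the sequence described above amounts to roughly this Java admin-client sketch; the service URL and the namespace/topic names are placeholders.)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ReproduceNamespaceDeleteFailure {
    public static void main(String[] args) throws Exception {
        // Assumes a broker/proxy HTTP endpoint at this placeholder URL.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();
        String ns = "public/repro-ns";
        String topic = "persistent://" + ns + "/repro-topic";

        admin.namespaces().createNamespace(ns);
        admin.topics().createPartitionedTopic(topic, 4);

        // Delete the topic and immediately attempt to delete the namespace;
        // this is the step that intermittently fails with a 500
        // ("Managed ledger not found") and then keeps failing.
        admin.topics().deletePartitionedTopic(topic);
        admin.namespaces().deleteNamespace(ns);

        admin.close();
    }
}
```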
Expected behavior
Both deletions succeed: the topic and then the namespace can be deleted without error.
Environment
Same environment as https://github.com/apache/pulsar/issues/12551
What my client sees
Broker Stacktrace