datastax / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://www.datastax.com/products/luna-streaming
Apache License 2.0
14 stars 5 forks source link

[Bug] Inconsistent partitioned topic state after deletion (with force) #112

Open eolivelli opened 2 years ago

eolivelli commented 2 years ago

I have two problems, using 2.10_1.4: 1) I delete a partitioned topic, then the topics seems to be still there 2) a Netty o.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1

I am using 2.10_1.4, with TLS, 3 brokers. Running Pulsar shell on the bastion pod

Logs:

default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics delete-partitioned-topic -f test

default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics create-partitioned-topic -p 6 test

2022-08-19T08:23:56,032+0000 [AsyncHttpClient-25-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [https://pulsar-proxy.mypulsar.svc.cluster.local:8443/admin/v2/persistent/public/default/test/partitions?createLocalTopicOnly=false] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 Conflict
This topic already exists

Reason: This topic already exists
default(pulsar-proxy.mypulsar.svc.cluster.local)> 2022-08-19T08:23:57,113+0000 [globalEventExecutor-3-1] WARN  io.netty.util.concurrent.DefaultPromise - An exception was thrown by org.asynchttpclient.netty.channel.ChannelManager$$Lambda$519/0x00000008404fac40.operationComplete()
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
    at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:83) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:147) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.handler.ssl.ReferenceCountedOpenSslContext.release(ReferenceCountedOpenSslContext.java:749) ~[io.netty-netty-handler-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.ReferenceCountUtil.release(ReferenceCountUtil.java:90) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.destroy(DefaultSslEngineFactory.java:79) ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
    at org.asynchttpclient.netty.channel.ChannelManager.lambda$doClose$0(ChannelManager.java:301) ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.group.DefaultChannelGroupFuture.addListener(DefaultChannelGroupFuture.java:147) ~[io.netty-netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.channel.group.DefaultChannelGroupFuture.addListener(DefaultChannelGroupFuture.java:41) ~[io.netty-netty-transport-4.1.77.Final.jar:4.1.77.Final]
    at org.asynchttpclient.netty.channel.ChannelManager.doClose(ChannelManager.java:301) ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
    at org.asynchttpclient.netty.channel.ChannelManager.lambda$close$1(ChannelManager.java:308) ~[org.asynchttpclient-async-http-client-2.12.1.jar:?]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.setSuccess(DefaultPromise.java:96) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.MultithreadEventExecutorGroup$1.operationComplete(MultithreadEventExecutorGroup.java:117) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise.access$200(DefaultPromise.java:35) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:502) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:246) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
eolivelli commented 2 years ago

Follow up: the topic is in bad state, it exists and it does not exist

default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics create-partitioned-topic -p 6 test 2022-08-19T08:26:49,174+0000 [AsyncHttpClient-31-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [https://pulsar-proxy.mypulsar.svc.cluster.local:8443/admin/v2/persistent/public/default/test/partitions?createLocalTopicOnly=false] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 Conflict This topic already exists

Reason: This topic already exists default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics delete-partitioned-topic -f test 2022-08-19T08:26:53,697+0000 [AsyncHttpClient-37-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [https://pulsar-proxy.mypulsar.svc.cluster.local:8443/admin/v2/persistent/public/default/test/partitions?force=true&deleteSchema=false] Failed to perform http delete request: javax.ws.rs.NotFoundException: HTTP 404 Not Found Partitioned topic does not exist

eolivelli commented 2 years ago

I have fixed manually recently a problem in production, an I had to manually clean up the list of partitions on ZK

dlg99 commented 2 years ago

Preliminarily, looks like an http server exception, might be something similar to https://github.com/netty/netty/issues/9882

another Illegal ref count exception in http call:


01:24:38.403 [pulsar-timer-70-1] WARN  org.apache.pulsar.client.impl.ProducerImpl - [persistent://pulsar/pulsar-gcp-uscentral1/IP_ADDR_0506d85076188c9d85a45ed41abcae99:8080/healthcheck] [pulsar-gcp-uscentral1-210-33761] Got exception while completing the callback for msg -1:

io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1

at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:83) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:147) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]

at org.apache.pulsar.client.impl.ProducerImpl$1.sendComplete(ProducerImpl.java:343) ~[com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:

at org.apache.pulsar.client.impl.ProducerImpl$OpSendMsg.sendComplete(ProducerImpl.java:1235) ~[com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:

at org.apache.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$18(ProducerImpl.java:1753) ~[com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:

at java.util.ArrayDeque.forEach(ArrayDeque.java:889) [?:?]

at org.apache.pulsar.client.impl.ProducerImpl$OpSendMsgQueue.forEach(ProducerImpl.java:1313) [com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:

at org.apache.pulsar.client.impl.ProducerImpl.failPendingMessages(ProducerImpl.java:1743) [com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:

at org.apache.pulsar.client.impl.ProducerImpl.run(ProducerImpl.java:1721) [com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:

at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]

at java.lang.Thread.run(Thread.java:829) [?:?]
github-actions[bot] commented 2 years ago

The issue had no activity for 30 days, mark with Stale label.