Open eolivelli opened 2 years ago
Follow-up: the topic is in a bad state; it both exists and does not exist
default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics create-partitioned-topic -p 6 test
2022-08-19T08:26:49,174+0000 [AsyncHttpClient-31-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [https://pulsar-proxy.mypulsar.svc.cluster.local:8443/admin/v2/persistent/public/default/test/partitions?createLocalTopicOnly=false] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 Conflict
This topic already exists
Reason: This topic already exists
default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics delete-partitioned-topic -f test
2022-08-19T08:26:53,697+0000 [AsyncHttpClient-37-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [https://pulsar-proxy.mypulsar.svc.cluster.local:8443/admin/v2/persistent/public/default/test/partitions?force=true&deleteSchema=false] Failed to perform http delete request: javax.ws.rs.NotFoundException: HTTP 404 Not Found
Partitioned topic does not exist
I recently fixed a problem in production by hand, and I had to manually clean up the list of partitions on ZK
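To make the contradiction in the two responses above concrete, here is a minimal sketch in plain Java with no Pulsar dependency. The class `PartitionedTopicState`, its `State` enum, and both helper methods are hypothetical names made up for illustration; only the status codes (409 on create, 404 on delete) and the `-partition-N` suffix Pulsar uses for the per-partition topics come from the logs and from Pulsar's naming convention. It is an assumption, not something confirmed in this thread, that leftover per-partition entries are what trigger the 409.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Pulsar admin API.
class PartitionedTopicState {
    enum State { EXISTS, ABSENT, INCONSISTENT, UNKNOWN }

    // Per-partition topic names for a partitioned topic, e.g. test-partition-0.
    // These are the names one might look for when checking for leftovers.
    static List<String> partitionNames(String topic, int partitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            names.add(topic + "-partition-" + i);
        }
        return names;
    }

    // Interprets the HTTP status of a create attempt followed by a delete attempt.
    static State classify(int createStatus, int deleteStatus) {
        boolean createSaysExists = createStatus == 409;   // "This topic already exists"
        boolean deleteSaysMissing = deleteStatus == 404;  // "Partitioned topic does not exist"
        boolean deleteOk = deleteStatus >= 200 && deleteStatus < 300;
        boolean createOk = createStatus >= 200 && createStatus < 300;

        if (createSaysExists && deleteSaysMissing) {
            return State.INCONSISTENT; // the situation reported in this issue
        }
        if (createSaysExists && deleteOk) {
            return State.EXISTS;       // topic was there and could be deleted
        }
        if (createOk) {
            return State.ABSENT;       // topic was not there before the create
        }
        return State.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify(409, 404));
        System.out.println(partitionNames("persistent://public/default/test", 6));
    }
}
```

With the responses from the log, `classify(409, 404)` yields `INCONSISTENT`, and `partitionNames` lists the six names from `test-partition-0` to `test-partition-5` that could be checked for leftovers.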
Preliminarily, this looks like an HTTP server exception; it might be something similar to https://github.com/netty/netty/issues/9882
Another illegal reference count exception in an HTTP call:
01:24:38.403 [pulsar-timer-70-1] WARN org.apache.pulsar.client.impl.ProducerImpl - [persistent://pulsar/pulsar-gcp-uscentral1/IP_ADDR_0506d85076188c9d85a45ed41abcae99:8080/healthcheck] [pulsar-gcp-uscentral1-210-33761] Got exception while completing the callback for msg -1:
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:83) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:147) ~[io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101) ~[io.netty-netty-buffer-4.1.77.Final.jar:4.1.77.Final]
at org.apache.pulsar.client.impl.ProducerImpl$1.sendComplete(ProducerImpl.java:343) ~[com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:
at org.apache.pulsar.client.impl.ProducerImpl$OpSendMsg.sendComplete(ProducerImpl.java:1235) ~[com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:
at org.apache.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$18(ProducerImpl.java:1753) ~[com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:
at java.util.ArrayDeque.forEach(ArrayDeque.java:889) [?:?]
at org.apache.pulsar.client.impl.ProducerImpl$OpSendMsgQueue.forEach(ProducerImpl.java:1313) [com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:
at org.apache.pulsar.client.impl.ProducerImpl.failPendingMessages(ProducerImpl.java:1743) [com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:
at org.apache.pulsar.client.impl.ProducerImpl.run(ProducerImpl.java:1721) [com.datastax.oss-pulsar-client-original-IP_ADDR_879267a7a959ae992b7ab08a0eb69b8a.0.12.jar:
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
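The message `refCnt: 0, decrement: 1` in the trace above means `release()` was called on a buffer whose reference count had already reached zero. The following is a minimal model of that failure in plain Java. `RefCountSketch` and `CountedBuffer` are hypothetical names, and the class is a simplified stand-in, not Netty's `ReferenceCountUpdater`; it throws `IllegalStateException` where Netty throws `IllegalReferenceCountException`. The trace does not show where the earlier release happened, so this only shows the mechanism.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a reference-counted buffer; not Netty code.
class RefCountSketch {
    static final class CountedBuffer {
        // A new buffer starts with one owner.
        private final AtomicInteger refCnt = new AtomicInteger(1);

        int refCnt() {
            return refCnt.get();
        }

        // Drops one reference. Returns true when this call released the last one.
        // Throws if the count is already zero, mirroring the error in the trace.
        boolean release() {
            while (true) {
                int current = refCnt.get();
                if (current <= 0) {
                    throw new IllegalStateException("refCnt: " + current + ", decrement: 1");
                }
                if (refCnt.compareAndSet(current, current - 1)) {
                    return current == 1;
                }
            }
        }
    }

    // Releases the same buffer twice and reports whether the second call failed.
    static boolean doubleReleaseFails() {
        CountedBuffer buf = new CountedBuffer();
        buf.release();          // first owner lets go, count reaches 0
        try {
            buf.release();      // second release of the same buffer
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second release failed: " + doubleReleaseFails());
    }
}
```

In the trace the failing `release()` is reached from `failPendingMessages` on the timer thread, so one thing worth checking is whether another code path had already released the same buffer before that callback ran.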
The issue had no activity for 30 days, mark with Stale label.
I have two problems, using 2.10_1.4: 1) I delete a partitioned topic, but the topic still seems to be there; 2) a Netty io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
I am using 2.10_1.4 with TLS and 3 brokers, running Pulsar shell on the bastion pod
Logs:
default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics delete-partitioned-topic -f test
default(pulsar-proxy.mypulsar.svc.cluster.local)> admin topics create-partitioned-topic -p 6 test