apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Bug] The transaction message can not be used normally #20851

Open g0715158 opened 1 year ago

g0715158 commented 1 year ago

Search before asking

Version

2.11.2

Minimal reproduce step

transactionCoordinatorEnabled=true
systemTopicEnabled=true
acknowledgmentAtBatchIndexLevelEnabled=true
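
Before restarting the broker, it can help to confirm that the settings listed above are actually present and uncommented in the config file the broker reads. The following is a minimal sketch, not an official tool: the function name `check_txn_conf` and the default path `conf/broker.conf` are assumptions for illustration (a standalone deployment reads `conf/standalone.conf` instead).

```shell
# Hypothetical helper (not part of Pulsar): report which of the three
# settings from this issue are not set to "true" in a config file.
check_txn_conf() {
  local conf="${1:-conf/broker.conf}"   # assumed default path
  local key missing=0
  for key in transactionCoordinatorEnabled systemTopicEnabled acknowledgmentAtBatchIndexLevelEnabled; do
    # Match an uncommented "key=true" line, tolerating spaces around "=".
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*true[[:space:]]*$" "$conf"; then
      echo "not enabled: ${key}"
      missing=1
    fi
  done
  # Exit status 0 only when all three settings are enabled.
  return "$missing"
}
```

For example, `check_txn_conf conf/broker.conf && echo "transaction settings look enabled"` prints the confirmation only when all three keys are set to `true`.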

image

What did you expect to see?

Transactional messaging should work normally.

2023-07-21 17:40:57,545 - ERROR - [pulsar-client-io-1-1:TransactionBuilderImpl@66] - New transaction error.
org.apache.pulsar.client.api.transaction.TransactionCoordinatorClientException$CoordinatorNotFoundException: Transaction coordinator with id 1 not found!
    at org.apache.pulsar.client.impl.TransactionMetaStoreHandler.getExceptionByServerError(TransactionMetaStoreHandler.java:368)
    at org.apache.pulsar.client.impl.TransactionMetaStoreHandler.handleNewTxnResponse(TransactionMetaStoreHandler.java:161)
    at org.apache.pulsar.client.impl.ClientCnx.handleNewTxnResponse(ClientCnx.java:924)
    at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:371)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
    at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at org.apache.pulsar.shade.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.transaction.TransactionCoordinatorClientException$CoordinatorNotFoundException: Transaction coordinator with id 1 not found!
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
    at com.newland.pulsar.producer.transaction.TransactionProducer.main(TransactionProducer.java:16)
Caused by: org.apache.pulsar.client.api.transaction.TransactionCoordinatorClientException$CoordinatorNotFoundException: Transaction coordinator with id 1 not found!
    at org.apache.pulsar.client.impl.TransactionMetaStoreHandler.getExceptionByServerError(TransactionMetaStoreHandler.java:368)
    at org.apache.pulsar.client.impl.TransactionMetaStoreHandler.handleNewTxnResponse(TransactionMetaStoreHandler.java:161)
    at org.apache.pulsar.client.impl.ClientCnx.handleNewTxnResponse(ClientCnx.java:924)
    at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:371)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
    at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at org.apache.pulsar.shade.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)

What did you see instead?

After transactions are enabled in broker.conf, the transaction coordinator cannot be initialized and the system topic cannot be created.

Anything else?

No response

Are you willing to submit a PR?

liangyepianzhou commented 1 year ago

Are you running standalone or a cluster? For standalone, run the following command to initialize the transaction coordinator metadata. See details at https://pulsar.apache.org/docs/3.0.x/txn-use/

bin/pulsar initialize-transaction-coordinator-metadata -cs 127.0.0.1:2181 -c standalone

For a cluster, use the following command to initialize the cluster metadata.

        bin/pulsar initialize-cluster-metadata \
          --cluster cluster-a \
          --zookeeper 127.0.0.1:2181 \
          --configuration-store 127.0.0.1:2181 \
          --web-service-url http://127.0.0.1:8080 \
          --broker-service-url pulsar://127.0.0.1:6650 \
          --web-service-url-tls https://127.0.0.1:8443 \
          --broker-service-url-tls pulsar+ssl://127.0.0.1:6651 \
          --initial-num-transaction-coordinators 20
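
One way to check whether the initialization took effect is to look at the partition count of the coordinator assignment topic. The sketch below makes several assumptions: that the topic is `persistent://pulsar/system/transaction_coordinator_assign`, that `pulsar-admin` prints JSON containing a `"partitions"` field, and that the admin CLI lives at `bin/pulsar-admin`. The function name `tc_partitions` and the `PULSAR_ADMIN` override are hypothetical.

```shell
# Hypothetical helper: print the partition count of the transaction
# coordinator assignment topic (assumed topic name, see above).
tc_partitions() {
  local admin="${PULSAR_ADMIN:-bin/pulsar-admin}"   # assumed CLI location
  "$admin" topics get-partitioned-topic-metadata \
      persistent://pulsar/system/transaction_coordinator_assign \
    | grep -Eo '"partitions"[[:space:]]*:[[:space:]]*[0-9]+' \
    | grep -Eo '[0-9]+$'
}
```

If the command fails or reports 0 partitions, the metadata was probably not initialized for the cluster the broker is actually using. A non-zero count only shows that the metadata exists; it does not prove that the brokers have loaded their coordinators.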
g0715158 commented 1 year ago

@liangyepianzhou I ran the cluster command you provided, but transactions still don't work. Thanks.

g0715158 commented 1 year ago

image @liangyepianzhou

liangyepianzhou commented 1 year ago

Did you rerun the test?

g0715158 commented 1 year ago

Yeah, I built a new cluster for testing

@liangyepianzhou

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.

Genuineh commented 1 year ago

I have the same bug in cluster mode.

Genuineh commented 1 year ago

@liangyepianzhou @Technoboy- Is this issue fixed in 3.1.1? If not, how can transaction features be used in 3.1.1?