apache / incubator-seata

:fire: Seata is an easy-to-use, high-performance, open source distributed transaction solution.
https://seata.apache.org/
Apache License 2.0

NullPointerException when RM begins a transaction in Raft mode #7021

Closed EdaZhang closed 4 days ago

EdaZhang commented 5 days ago

Ⅰ. Issue Description

In Raft mode, the RM hits a NullPointerException when beginning a transaction. Is there something wrong with my configuration?

Ⅱ. Describe what happened

SeataClusterContext.getGroup() evaluates to null, so the RM fails to begin the transaction.

image

17:23:59.808  INFO --- [     batchLoggerPrint_1_1] [ocessor.server.BatchLogHandler] [                 run]  [] : receive msg[single]: GlobalBeginRequest{transactionName='create(com.z.order.domain.entity.Order)', timeout=60000}, clientIp: 10.x.x.x, vgroup: default_tx_group
17:24:01.362 ERROR --- [rverHandlerThread_1_2_500] [ption.AbstractExceptionHandler] [eptionHandleTemplate]  [10.199.101.21:8091:9214953266510188545] : Catch RuntimeException while do RPC, request: GlobalBeginRequest{transactionName='create(com.jlpay.order.domain.entity.Order)', timeout=60000}
==>
java.lang.NullPointerException: null
    at org.apache.seata.server.cluster.raft.util.RaftTaskUtil.createTask(RaftTaskUtil.java:51)
    at org.apache.seata.server.storage.raft.session.RaftSessionManager.onBegin(RaftSessionManager.java:93)
    at org.apache.seata.server.session.GlobalSession.begin(GlobalSession.java:222)
    at org.apache.seata.server.coordinator.DefaultCore.begin(DefaultCore.java:139)
    at org.apache.seata.server.coordinator.DefaultCoordinator.doGlobalBegin(DefaultCoordinator.java:276)
    at org.apache.seata.server.AbstractTCInboundHandler$1.execute(AbstractTCInboundHandler.java:64)
    at org.apache.seata.server.AbstractTCInboundHandler$1.execute(AbstractTCInboundHandler.java:60)
    at org.apache.seata.core.exception.AbstractExceptionHandler.exceptionHandleTemplate(AbstractExceptionHandler.java:127)
    at org.apache.seata.server.AbstractTCInboundHandler.handle(AbstractTCInboundHandler.java:60)
    at org.apache.seata.core.protocol.transaction.GlobalBeginRequest.handle(GlobalBeginRequest.java:76)
    at org.apache.seata.server.coordinator.DefaultCoordinator.onRequest(DefaultCoordinator.java:642)
    at org.apache.seata.core.rpc.processor.server.ServerOnRequestProcessor.onRequestMessage(ServerOnRequestProcessor.java:206)
    at org.apache.seata.core.rpc.processor.server.ServerOnRequestProcessor.process(ServerOnRequestProcessor.java:122)
    at org.apache.seata.core.rpc.netty.AbstractNettyRemoting.lambda$processMessage$2(AbstractNettyRemoting.java:280)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
<==

Ⅲ. Describe what you expected to happen

The RM can begin transactions normally.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

The server has only one node (testing shows the same problem also occurs with multiple server nodes).

The server-side configuration is as follows:

seata:
  config:
    type: file
  registry:
    type: file
  server:
    raft:
      group: jlpay
      server-addr: 10.x.x.x:9091
      snapshot-interval: 600
      apply-batch: 32
      max-append-bufferSize: 262144
      max-replicator-inflight-msgs: 256
      disruptor-buffer-size: 16384
      election-timeout-ms: 1000
      reporter-enabled: false
      reporter-initial-delay: 60
      serialization: jackson
      compressor: none
      sync: true
    service-port: 8091
    max-commit-retry-timeout: -1
    max-rollback-retry-timeout: -1
    rollback-retry-timeout-unlock-enable: false
    enable-check-auth: true
    enable-parallel-request-handle: true
    enable-parallel-handle-branch: false
    retry-dead-threshold: 130000
    xaer-nota-retry-timeout: 60000
    enableParallelRequestHandle: true
    applicationDataLimitCheck: true
    applicationDataLimit: 64000
    recovery:
      committing-retry-period: 1000
      async-committing-retry-period: 1000
      rollbacking-retry-period: 1000
      timeout-retry-period: 1000
    undo:
      log-save-days: 7
      log-delete-period: 86400000
    session:
      branch-async-queue-size: 5000 #branch async remove queue size
      enable-branch-async-remove: false #enable to asynchronous remove branchSession
  store:
    mode: file
    file:
      dir: sessionStore
      max-branch-session-size: 16384
      max-global-session-size: 512
      file-write-buffer-cache-size: 16384
      session-reload-read-size: 100
      flush-disk-mode: async
  metrics:
    enabled: false
    registry-type: compact
    exporter-list: prometheus
    exporter-prometheus-port: 9898
  transport:
    rpc-tc-request-timeout: 15000
    enable-tc-server-batch-send-response: false
    shutdown:
      wait: 3
    thread-factory:
      boss-thread-prefix: NettyBoss
      worker-thread-prefix: NettyServerNIOWorker
      boss-thread-size: 1
  security:
    secretKey: SeataSecretKey0c382ef121d778043159209298fd40bf3850a017
    tokenValidityInMilliseconds: 1800000
    csrf-ignore-urls: /metadata/v1/**
    ignore:
      urls: /,/**/*.css,/**/*.js,/**/*.html,/**/*.map,/**/*.svg,/**/*.png,/**/*.jpeg,/**/*.ico,/api/v1/auth/login,/version.json,/health,/error, /metadata/v1/**

The task information passed from the client is as follows:

image

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

funky-eyes commented 5 days ago

group: jlpay -> default. Group-based load balancing has not been implemented on the client side, so all requests arrive under the default group, which leads to the NPE.

funky-eyes commented 4 days ago

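It is not the problem I described above. Your stack trace does not look right: in theory it should go through RaftCoordinator.exceptionHandleTemplate, not AbstractExceptionHandler.exceptionHandleTemplate.

Why didn't you follow the configuration in the raft example? Raft mode cannot work properly with a single node, so do not test with only one node.

store: mode: file -> raft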
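
For readers following along, the changes suggested in the two comments above could be applied to the reporter's server config roughly as sketched below. This is only an illustration, not a verified fix: the three peer addresses are placeholders (an assumption, since the thread only says a single node is not enough), and the first comment's group change is included even though the maintainer later said it was not the root cause. All other keys from the original config are left out for brevity.

```yaml
seata:
  server:
    raft:
      # Suggested in the first comment: use the default group instead of "jlpay".
      group: default
      # Placeholder addresses (assumption): list every Raft peer, not a single node.
      server-addr: 10.0.0.1:9091,10.0.0.2:9091,10.0.0.3:9091
  store:
    # Suggested in the second comment: file -> raft
    mode: raft
```

The raft example configuration the maintainer refers to remains the authoritative reference for the remaining keys.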