apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [seatunnel-engine-storage] IMap and checkpoint writes to HDFS fail after the Kerberos ticket expires (24 hours) #7102

Open weipengfei-sj opened 3 months ago

weipengfei-sj commented 3 months ago

Search before asking

What happened

Using SeaTunnel 2.3.5 in a 3-node cluster. hazelcast.yaml is configured as shown in the SeaTunnel Config section below: the IMap is persisted to HDFS via FileMapStoreFactory, with Kerberos authentication against an HA nameservice. After the cluster has been running for more than 24 hours, the service logs show that writes to HDFS fail because the Kerberos ticket has expired.

Source-code analysis: with this approach, the HDFS client authenticates once from the keytab and there is no logic to renew the ticket automatically, so ticket expiration is inevitable. (screenshot of the authentication code)

I tried modifying the code to start a scheduled task that automatically renews the ticket after the initial authentication: (screenshots of the modified code)

However, even with this automatic renewal mechanism in place, the service still reports that the ticket is unavailable when writing to HDFS after 24 hours. I also tried adding the renewal logic in several other places, for example in the HdfsWriter class, but none of it took effect. Could someone from the community please point out what I am missing? Many thanks.
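For reference, the renewal pattern described above can be sketched roughly as follows. This is a minimal illustration of keytab login plus periodic re-login using Hadoop's UserGroupInformation API, not the exact patch from the screenshots; the class and method names here are illustrative, and running it requires the Hadoop client libraries and a reachable KDC.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosRelogin {

    public static UserGroupInformation loginAndScheduleRenewal(
            Configuration conf, String principal, String keytabPath) throws Exception {
        // One-time login from the keytab; this obtains the initial TGT.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();

        // Periodically re-login well before the (typically 24h) ticket lifetime elapses.
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "kerberos-relogin");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // No-op if the TGT is still fresh; re-logs in from the keytab otherwise.
                ugi.checkTGTAndReloginFromKeytab();
            } catch (Exception e) {
                // Log and retry on the next tick rather than killing the scheduler.
                e.printStackTrace();
            }
        }, 1, 1, TimeUnit.HOURS);
        return ugi;
    }
}
```

Note that a re-login refreshes the credentials attached to the login Subject, but any client objects that already bound themselves to stale credentials may still need to be recreated, which appears to be the failure mode discussed below.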

SeaTunnel Version

2.3.5

SeaTunnel Config

hazelcast.yaml is configured as follows:
  map:
    engine*:
       map-store:
         enabled: true
         initial-mode: EAGER
         factory-class-name: org.apache.seatunnel.engine.server.persistence.FileMapStoreFactory
         properties:
           type: hdfs
           namespace: /tmp/seatunnel/imap
           clusterName: seatunnel-cluster
           storage.type: hdfs
           fs.defaultFS: hdfs://fss:8020
           kerberosPrincipal: hdfs
           kerberosKeytabFilePath: /applinkis/ceph/share/hadoopcluster/fss/keytab/hdfs.keytab 
           krb5Path: /app/linkis/seatunnel/config/krb5.conf
           seatunnel.hadoop.dfs.nameservices: fss
           seatunnel.hadoop.dfs.ha.namenodes.fss: nn1,nn2
           seatunnel.hadoop.dfs.namenode.rpc-address.fss.nn1: nn1:8020
           seatunnel.hadoop.dfs.namenode.rpc-address.fss.nn2: nn2:8020
           seatunnel.hadoop.dfs.client.failover.proxy.provider.usdp-bing: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
           seatunnel.hadoop.dfs.namenode.kerberos.principal: nn/_HOST@T1.COM
           seatunnel.hadoop.dfs.datanode.kerberos.principal: dn/_HOST@T1.COM
           seatunnel.hadoop.rpc.protection: authentication
           seatunnel.hadoop.security.authentication: kerberos
           hdfs_site_path: /applinkis/ceph/share/hadoopcluster/fss/hadoop/hdfs-site.xml

Running Command

./bin/seatunnel.sh  -c config/test-source-kerberos-kafka.yaml

Error Exception

2024-07-03 15:12:50,607 WARN  [o.a.h.i.Client                ] [LeaseRenewer:hdfs@fsst1] - Exception encountered while connecting to the server
javax.security.sasl.SaslException: GSS initiate failed
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_181]
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
/seatunnel-starter.jar
        at com.sun.proxy.$Proxy34.fsync(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.fsync(ClientNamenodeProtocolTranslatorPB.java:984) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at sun.reflect.GeneratedMethodAccessor107.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at com.sun.proxy.$Proxy35.fsync(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:706) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.hdfs.DFSOutputStream.hsync(DFSOutputStream.java:604) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.hdfs.client.HdfsDataOutputStream.hsync(HdfsDataOutputStream.java:96) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.flush(HdfsWriter.java:87) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.write(HdfsWriter.java:101) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.write(HdfsWriter.java:80) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.write(HdfsWriter.java:44) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.common.WALWriter.write(WALWriter.java:50) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.disruptor.WALWorkHandler.walEvent(WALWorkHandler.java:87) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.disruptor.WALWorkHandler.onEvent(WALWorkHandler.java:78) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.disruptor.WALWorkHandler.onEvent(WALWorkHandler.java:44) ~[seatunnel-starter.jar:2.3.5]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) ~[seatunnel-starter.jar:2.3.5]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147) ~[?:1.8.0_181]
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122) ~[?:1.8.0_181]
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187) ~[?:1.8.0_181]
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224) ~[?:1.8.0_181]
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212) ~[?:1.8.0_181]
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179) ~[?:1.8.0_181]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192) ~[?:1.8.0_181]
        ... 39 more

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

Code of Conduct

liunaijie commented 3 months ago

Hi, based on your issue title and description, the exception happens when writing the IMap or checkpoint state, not during the data sync process, right?

I guess the FileSystem client is not refreshed, so even if you refresh the credentials, the client still uses the old ones and hits this issue. You can try refreshing (recreating) the client to solve it. (screenshot)
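One way to "refresh the client" after a re-login might look like the sketch below: recreate the FileSystem handle inside the refreshed UGI context so its RPC connections pick up the new Kerberos credentials. This assumes the storage layer keeps a long-lived FileSystem reference that can be swapped out; the URI and variable names are illustrative, not taken from the SeaTunnel code.

```java
import java.net.URI;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsClientRefresh {

    // Rebuild the FileSystem under the (re-logged-in) login user so the new
    // credentials are attached to the client's connections to the NameNode.
    public static FileSystem refresh(Configuration conf) throws Exception {
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();
        ugi.checkTGTAndReloginFromKeytab();
        return ugi.doAs((PrivilegedExceptionAction<FileSystem>) () ->
                // newInstance (unlike FileSystem.get) bypasses the cached,
                // possibly stale client instance.
                FileSystem.newInstance(URI.create("hdfs://fss:8020"), conf));
    }
}
```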

weipengfei-sj commented 2 months ago

(screenshot) I tried adding a scheduled refresh on the client side, but after 24 hours it still reports the Kerberos authentication error.

weipengfei-sj commented 2 months ago

@liunaijie Could you please take a look?

liunaijie commented 2 months ago

@liunaijie Could you please take a look?

Hi, please attach all the code you changed, or share a link to your repo.