apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[Bug report] HMS HA Feature unsupported #5137

Open xxzhky opened 2 weeks ago

xxzhky commented 2 weeks ago

Version

0.6.0

Describe what's wrong

Gravitino catalog config

Properties

| Key | Value |
|-----|-------|
| `gravitino.bypass.hive.metastore.client.capability.check` | FALSE |
| `metastore.uris` | thrift://ecs-dev-66-133-flink.msxf.host:9089,thrift://ecs-dev-66-100-flink.msxf.host:9089 |
| `kerberos.principal` | hive/ecs-dev-64-179-flink.msxf.host@DPOPSTEST.HADOOP |
| `gravitino.bypass.hive.metastore.kerberos.principal` | hive/ecs-dev-66-133-flink.msxf.host@DPOPSTEST.HADOOP |
| `kerberos.keytab-uri` | file:///home/xdt/gravikey/hms4/hive.keytab |
| `gravitino.bypass.hive.metastore.sasl.enabled` | TRUE |
| `gravitino.bypass.hadoop.security.authentication` | kerberos |

Reason

When configuring the Iceberg catalog, attempting to use multiple Hive Metastore instances to meet the production environment's high-availability (HA) requirement results in the following error:

```
transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed
```

Note: for the relevant catalog configuration, see the properties table above.

Error message and/or stacktrace

```
2024-10-12T10:52:10,393 ERROR [Metastore-Handler-Pool: Thread-63] transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed
	at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199) ~[?:1.8.0_232]
	at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:507) ~[hive-exec-4.0.0.jar:4.0.0]
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:250) ~[hive-exec-4.0.0.jar:4.0.0]
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:44) ~[hive-exec-4.0.0.jar:4.0.0]
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:199) ~[hive-exec-4.0.0.jar:4.0.0]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:711) ~[hive-exec-4.0.0.jar:4.0.0]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:707) ~[hive-exec-4.0.0.jar:4.0.0]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
	at javax.security.auth.Subject.doAs(Subject.java:360) ~[?:1.8.0_232]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1855) ~[hadoop-common-3.3.4.jar:?]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:707) ~[hive-exec-4.0.0.jar:4.0.0]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:227) ~[hive-exec-4.0.0.jar:4.0.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_232]
Caused by: org.ietf.jgss.GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:858) ~[?:1.8.0_232]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) ~[?:1.8.0_232]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) ~[?:1.8.0_232]
	at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167) ~[?:1.8.0_232]
	... 14 more
Caused by: sun.security.krb5.KrbCryptoException: Checksum failed
	at sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType.decrypt(Aes128CtsHmacSha1EType.java:102) ~[?:1.8.0_232]
	at sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType.decrypt(Aes128CtsHmacSha1EType.java:94) ~[?:1.8.0_232]
	at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175) ~[?:1.8.0_232]
	at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281) ~[?:1.8.0_232]
	at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149) ~[?:1.8.0_232]
	at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108) ~[?:1.8.0_232]
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:831) ~[?:1.8.0_232]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) ~[?:1.8.0_232]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) ~[?:1.8.0_232]
	at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167) ~[?:1.8.0_232]
	... 14 more
Caused by: java.security.GeneralSecurityException: Checksum failed
	at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451) ~[?:1.8.0_232]
	at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272) ~[?:1.8.0_232]
	at sun.security.krb5.internal.crypto.Aes128.decrypt(Aes128.java:76) ~[?:1.8.0_232]
	at sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType.decrypt(Aes128CtsHmacSha1EType.java:100) ~[?:1.8.0_232]
	at sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType.decrypt(Aes128CtsHmacSha1EType.java:94) ~[?:1.8.0_232]
	at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175) ~[?:1.8.0_232]
	at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281) ~[?:1.8.0_232]
	at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149) ~[?:1.8.0_232]
	at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108) ~[?:1.8.0_232]
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:831) ~[?:1.8.0_232]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) ~[?:1.8.0_232]
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) ~[?:1.8.0_232]
	at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167) ~[?:1.8.0_232]
	... 14 more
```

How to reproduce

Additional context

No response

tyoushinya commented 1 week ago

@xxzhky I tested this case and everything works well on my side. You can try changing `gravitino.bypass.hive.metastore.kerberos.principal` to `hive/_HOST@DPOPSTEST.HADOOP`. BTW, Hadoop common can replace the `_HOST` placeholder with the actual hostname.
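For illustration, the `_HOST` suggestion applied to the catalog properties from the report might look like the sketch below. Values are copied from the original table; whether each property here accepts the placeholder in a given Gravitino version is an assumption worth verifying.

```properties
# Multiple metastore URIs for HA, unchanged from the report.
metastore.uris=thrift://ecs-dev-66-133-flink.msxf.host:9089,thrift://ecs-dev-66-100-flink.msxf.host:9089

# _HOST is substituted at connection time with the hostname of the
# metastore instance actually being contacted, so one principal
# pattern covers both HMS hosts instead of hard-coding one of them.
gravitino.bypass.hive.metastore.kerberos.principal=hive/_HOST@DPOPSTEST.HADOOP

# Remaining Kerberos settings unchanged from the report.
kerberos.principal=hive/ecs-dev-64-179-flink.msxf.host@DPOPSTEST.HADOOP
kerberos.keytab-uri=file:///home/xdt/gravikey/hms4/hive.keytab
gravitino.bypass.hive.metastore.sasl.enabled=TRUE
gravitino.bypass.hadoop.security.authentication=kerberos
gravitino.bypass.hive.metastore.client.capability.check=FALSE
```

If substitution works as expected, the SASL negotiation should target the correct service principal for whichever HMS instance answers, which would avoid the "Checksum failed" mismatch seen in the trace.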