Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Kerberos auth to HDFS when using journal for High Availability #16105

Open junyanguo-uber opened 2 years ago

junyanguo-uber commented 2 years ago

Alluxio Version:

2.9.0-SNAPSHOT, built from master up to this commit (8/4/2022): https://github.com/Alluxio/alluxio/commit/fdcba75c7e65bf812c2ba07b7c968ef3c9550213

Describe the bug

When using HDFS as the journal to support HA mode, I pass in a principal and keytab via the alluxio.master.principal and alluxio.master.keytab.file properties. However, the principal and keytab are not used when authenticating against HDFS. As a result, Alluxio fails to format the journal on HDFS, and the Alluxio master cannot start in HA mode.

To Reproduce

Configure HDFS to require Kerberos authentication. Configure Alluxio to use HDFS as the journal location, then format the journal and start the master in HA mode.

Expected behavior

When HDFS is used as the journal, Alluxio should use the provided principal and keytab to perform Kerberos authentication against HDFS.
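
For illustration only: on the JVM, a keytab-based Kerberos login can be described as a JAAS configuration for the JDK's Krb5LoginModule. The sketch below is not Alluxio's code. The class name KeytabJaasConfig is hypothetical, and it only builds the configuration from a principal and keytab path without contacting a KDC.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/**
 * Hypothetical helper (an assumption for illustration, not part of Alluxio):
 * a JAAS configuration that logs in from a keytab instead of a ticket cache.
 */
final class KeytabJaasConfig extends Configuration {
  private final String principal;
  private final String keytabPath;

  KeytabJaasConfig(String principal, String keytabPath) {
    this.principal = principal;
    this.keytabPath = keytabPath;
  }

  /** Options read by the JDK's Krb5LoginModule for a non-interactive keytab login. */
  Map<String, String> options() {
    Map<String, String> opts = new HashMap<>();
    opts.put("useKeyTab", "true");       // take the key from a keytab file
    opts.put("keyTab", keytabPath);      // e.g. the value of alluxio.master.keytab.file
    opts.put("principal", principal);    // e.g. the value of alluxio.master.principal
    opts.put("storeKey", "true");        // keep the key in the Subject's credentials
    opts.put("doNotPrompt", "true");     // never ask for a password
    opts.put("useTicketCache", "false"); // do not depend on a prior kinit
    return opts;
  }

  @Override
  public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
    // The same single entry is returned for every application name.
    return new AppConfigurationEntry[] {
      new AppConfigurationEntry(
          "com.sun.security.auth.module.Krb5LoginModule",
          LoginModuleControlFlag.REQUIRED,
          options())
    };
  }
}
```

A configuration like this could be handed to javax.security.auth.login.LoginContext to obtain a TGT. Hadoop clients more commonly call UserGroupInformation.loginUserFromKeytab(principal, keytabPath) before the first HDFS call; whether the journal format path performs such a login is exactly what this issue questions.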

Urgency

When HDFS is used as the journal and requires Kerberos authentication, the Alluxio master cannot start in HA mode.

Are you planning to fix it

No

Additional context

Error messages

2022-08-25 02:43:53,852 ERROR Format - Failed to format
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "phx2-qh5.prod.uber.internal/10.80.78.36"; destination host is: "hadoopplatinumnamenode01-phx2":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:805)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
    at org.apache.hadoop.ipc.Client.call(Client.java:1453)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1649)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1529)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1526)
    at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1641)
    at alluxio.underfs.hdfs.HdfsUnderFileSystem.isDirectory(HdfsUnderFileSystem.java:497)
    at alluxio.underfs.UnderFileSystemWithLogging$30.call(UnderFileSystemWithLogging.java:687)
    at alluxio.underfs.UnderFileSystemWithLogging$30.call(UnderFileSystemWithLogging.java:684)
    at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:1237)
    at alluxio.underfs.UnderFileSystemWithLogging.isDirectory(UnderFileSystemWithLogging.java:684)
    at alluxio.master.journal.ufs.UfsJournal.format(UfsJournal.java:449)
    at alluxio.master.journal.ufs.UfsJournalSystem.format(UfsJournalSystem.java:245)
    at alluxio.cli.Format.format(Format.java:121)
    at alluxio.cli.Format.main(Format.java:95)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:760)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:723)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:817)
    at org.apache.hadoop.ipc.Client$Connection.access$3600(Client.java:412)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1568)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    ... 30 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:407)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:618)
    at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:412)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:804)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:799)
    ... 33 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:162)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 42 more

HelloHorizon commented 2 years ago

@yuyang733 can you take a look?