Cascading / cascading.hbase

HBase adapters for Cascading
http://www.cascading.org/

HBase Security #2

Closed branky closed 10 years ago

branky commented 11 years ago

Within a secured Hadoop cluster (Kerberos authentication enabled), a token is needed to interact with HBase, just as delegation tokens are required to access HDFS data and run MapReduce jobs.

The current implementation of HBaseTap doesn't handle this token, so reading data from or writing data to HBase fails with exceptions such as:

Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)     
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:130) 
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195) 
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
 ... 55 more
2013-11-06 22:28:12,807 WARN cascading.tap.hadoop.io.MultiInputFormat: unable to get record reader, but not retrying
java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
at cascading.hbase.helper.TableInputFormatBase.getRecordReader(TableInputFormatBase.java:93)
at cascading.tap.hadoop.io.MultiInputFormat$1.operate(MultiInputFormat.java:252)
at cascading.tap.hadoop.io.MultiInputFormat$1.operate(MultiInputFormat.java:247)
at cascading.util.Util.retry(Util.java:730)
at cascading.tap.hadoop.io.MultiInputFormat.getRecordReader(MultiInputFormat.java:246)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:190)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
org.apache.hadoop.hbase.security.AccessDeniedException: 
at cascading.hbase.HBaseTap.sinkConfInit(HBaseTap.java:53)        
at cascading.hbase.HBaseTapCollector.initialize(HBaseTapCollector.java:83)        
at cascading.hbase.HBaseTapCollector.prepare(HBaseTapCollector.java:74)        
at cascading.hbase.HBaseTap.openForWrite(HBaseTap.java:147)        
at cascading.hbase.HBaseTap.openForWrite(HBaseTap.java:53)        
at cascading.flow.stream.SinkStage.prepare(SinkStage.java:60)        
at cascading.flow.stream.StreamGraph.prepare(StreamGraph.java:167)        
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:110)        
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)        
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)        
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)        
at java.security.AccessController.doPrivileged(Native Method)        
at javax.security.auth.Subject.doAs(Subject.java:396)        
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)        
at org.apache.hadoop.mapred.Child.main(Child.java:249)

branky commented 11 years ago

Token (HBASE_AUTH_TOKEN) has to be obtained during job submission:

// These two lines need to be added to the Cascading client application
UserGroupInformation user = UserGroupInformation.getCurrentUser();
TokenUtil.obtainAndCacheToken(conf, user);

For an unknown reason, this token does not appear in the token list of the job configuration's credentials. Luckily, it can be retrieved from the current user's UserGroupInformation and added to the credentials manually:

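Credentials credentials = conf.getCredentials();
for (Token<? extends TokenIdentifier> t : UserGroupInformation.getCurrentUser().getTokens()) {
    // getKind() returns a Text, so compare by value rather than with ==
    if ("HBASE_AUTH_TOKEN".equals(t.getKind().toString())) {
        credentials.addToken(new Text("HBASE_AUTH_TOKEN"), t);
    }
}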
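
The filtering step in that loop can be tried without a cluster. The following is only a minimal sketch under stated assumptions: FakeToken and the plain Map are hypothetical stand-ins for Hadoop's Token and Credentials classes, so nothing here is the actual HBaseTap code. It shows why the kind should be compared by value: a kind string that is a distinct object would be missed by a reference comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TokenCopySketch {
    static final String HBASE_KIND = "HBASE_AUTH_TOKEN";

    /** Hypothetical stand-in for Hadoop's Token class; not the real API. */
    static final class FakeToken {
        final String kind;
        final String payload;

        FakeToken(String kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    /** Copies only the tokens whose kind equals HBASE_AUTH_TOKEN, comparing by value. */
    static Map<String, FakeToken> copyHBaseTokens(List<FakeToken> userTokens) {
        // The map is a hypothetical stand-in for the job's Credentials object.
        Map<String, FakeToken> credentials = new LinkedHashMap<>();
        for (FakeToken t : userTokens) {
            if (HBASE_KIND.equals(t.kind)) {
                credentials.put(HBASE_KIND, t);
            }
        }
        return credentials;
    }

    public static void main(String[] args) {
        List<FakeToken> tokens = new ArrayList<>();
        tokens.add(new FakeToken("HDFS_DELEGATION_TOKEN", "hdfs"));
        // new String(...) forces a distinct object, which a == comparison would not match.
        tokens.add(new FakeToken(new String("HBASE_AUTH_TOKEN"), "hbase"));
        System.out.println(copyHBaseTokens(tokens).keySet());
    }
}
```

In the real code the kind is a Hadoop Text object rather than a String, but the same reasoning applies: equality has to be checked by value.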

fs111 commented 11 years ago

Thanks for the patch! Could you send me the signed copy of the CCA, which I sent you yesterday? Thanks!

fs111 commented 11 years ago

I am not sure I understand your comment above. If I merge this in, does the HBaseTap handle everything by itself, or does a user have to do some extra work beforehand? If the latter is the case, then 1) we have to document that and 2) I have to make sure it also works with the lingual provider. Can you elaborate on that, please?

branky commented 11 years ago

Yes. As I mentioned, "Token (HBASE_AUTH_TOKEN) has to be obtained during job submission", so the following has to be added to the user's code:

if (User.isHBaseSecurityEnabled(conf)) {
     TokenUtil.obtainAndCacheToken(conf, UserGroupInformation.getCurrentUser());
}
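
The guard makes the same client code safe to run against clusters without Kerberos, since the token is requested only when security is switched on. Below is a minimal sketch of that behaviour under stated assumptions: java.util.Properties stands in for the Hadoop Configuration, the Runnable stands in for the TokenUtil call, and the property key and "kerberos" value reflect my understanding of what User.isHBaseSecurityEnabled checks, which is worth verifying against the HBase version in use.

```java
import java.util.Properties;

class SecurityGuardSketch {
    // Assumed key; believed to be what User.isHBaseSecurityEnabled reads.
    static final String AUTH_KEY = "hbase.security.authentication";

    /** Mirrors the assumed check: security is on when the value is "kerberos". */
    static boolean isSecurityEnabled(Properties conf) {
        return "kerberos".equalsIgnoreCase(conf.getProperty(AUTH_KEY));
    }

    /** Runs obtainToken only when security is enabled; returns whether it ran. */
    static boolean obtainTokenIfSecure(Properties conf, Runnable obtainToken) {
        if (!isSecurityEnabled(conf)) {
            return false;
        }
        obtainToken.run();
        return true;
    }

    public static void main(String[] args) {
        Properties secure = new Properties();
        secure.setProperty(AUTH_KEY, "kerberos");
        Properties plain = new Properties();
        plain.setProperty(AUTH_KEY, "simple");

        // Hypothetical placeholder for TokenUtil.obtainAndCacheToken(conf, user).
        Runnable fakeObtain = () -> System.out.println("token requested");

        System.out.println(obtainTokenIfSecure(secure, fakeObtain));
        System.out.println(obtainTokenIfSecure(plain, fakeObtain));
    }
}
```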

fs111 commented 11 years ago

I see. We really need to add a test with a secure cluster. Do you know if the test cluster can be made to act like one?

branky commented 11 years ago

I have been using this patch in our internal cluster. Kerberos authentication really introduces complexity. HBase's own security tests may be helpful for reference. https://github.com/apache/hbase/tree/0.94/security/src/test/java/org/apache/hadoop/hbase

fs111 commented 10 years ago

Could you add a short paragraph to the README explaining this usage, so that it does not get buried in this pull request? Thanks!