azkaban / azkaban

Azkaban workflow manager.
https://azkaban.github.io
Apache License 2.0

Kerberos Question #587

Open · todd-fritz opened this issue 8 years ago

todd-fritz commented 8 years ago

Does each flow need to run kinit and build the ticket cache itself, or does Azkaban handle that? I see the code where a job builds the name of a job-specific ticket cache, but I don't see where the cache is created. If a shell script run through the command job type calls a utility such as hdfs, how does the Hadoop client know about the ticket?

ProcessJob.java

  private String getKrb5ccname(Props jobProps) {
    String effectiveUser = getEffectiveUser(jobProps);
    String projectName =
        jobProps.getString(CommonJobProperties.PROJECT_NAME).replace(" ", "_");
    String flowId =
        jobProps.getString(CommonJobProperties.FLOW_ID).replace(" ", "_");
    String jobId =
        jobProps.getString(CommonJobProperties.JOB_ID).replace(" ", "_");
    // execId should be an int and should not have space in it, ever
    String execId = jobProps.getString(CommonJobProperties.EXEC_ID);
    String krb5ccname =
        String.format("/tmp/krb5cc__%s__%s__%s__%s__%s", projectName, flowId,
            jobId, execId, effectiveUser);

    return krb5ccname;
  }

And here is where it is put into the environment:

    // change krb5ccname env var so that each job execution gets its own cache
    Map<String, String> envVars = getEnvironmentVariables();
    envVars.put(KRB5CCNAME, getKrb5ccname(jobProps));

The environment variable that points to the ticket cache is named KRB5CCNAME.
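As the replies in this thread point out, Azkaban only chooses the cache *name*; something in the job still has to put a ticket there. A minimal sketch, assuming the job has access to a keytab: since the job's child process inherits KRB5CCNAME, running kinit in that same environment writes the ticket into the per-job cache, and the Hadoop client, which as far as we know also locates its cache through KRB5CCNAME, can then read it. The class `KinitThenRun`, the keytab path, the principal, and the `hdfs dfs -ls /` command are hypothetical placeholders. The example only builds the `ProcessBuilder`; it does not start it.

```java
import java.util.List;

/**
 * Hypothetical helper, not part of Azkaban: prepares a child process that first
 * obtains a Kerberos ticket into the per-job cache and then calls the Hadoop CLI.
 */
public class KinitThenRun {

  public static ProcessBuilder buildJobProcess(String krb5ccname, String keytab, String principal) {
    // Keytab and principal are passed as positional parameters ($1, $2) instead of
    // being spliced into the script text, so paths with spaces or quotes stay intact.
    String script = "kinit -kt \"$1\" \"$2\" && hdfs dfs -ls /";
    ProcessBuilder pb =
        new ProcessBuilder(List.of("/bin/sh", "-c", script, "sh", keytab, principal));
    // Same variable Azkaban sets for a job: kinit writes the ticket to this cache,
    // and the Hadoop client is expected to read it from the same place.
    pb.environment().put("KRB5CCNAME", krb5ccname);
    return pb;
  }

  public static void main(String[] args) {
    ProcessBuilder pb = buildJobProcess(
        "/tmp/krb5cc__proj__flow__job__1__azkaban", // hypothetical per-job cache
        "/etc/security/keytabs/azkaban.keytab",     // hypothetical keytab path
        "azkaban@EXAMPLE.COM");                      // hypothetical principal
    System.out.println(pb.command());
    System.out.println(pb.environment().get("KRB5CCNAME"));
    // pb.inheritIO().start() would actually run kinit and then hdfs.
  }
}
```

Inside a real command job the variable is already set by Azkaban, so the shell script itself would only need the `kinit -kt ...` line before its first Hadoop call.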

erwa commented 8 years ago

Hey Todd,

The job type needs to handle the Kerberos stuff. Take a look at HadoopJavaJob.java as an example. Under the hood, HadoopSecurityManager_H_2_0.java takes care of most of the Hadoop token stuff.

mikebaldinoiii commented 8 years ago

I'm running into this as well. The answer is that you do need to run kinit within each job, because the cache name is built from the project/flow/job information instead of from the UID of the user:

    $ id
    uid=1665152075(azkaban)

This is the error:

    klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc__test_project__test_job__test_job__31__azkaban)

This is where the ticket actually is (the default per-UID cache, /tmp/krb5cc_<uid>):

    Ticket cache: FILE:/tmp/krb5cc_1665152075
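The mismatch above follows from how the Kerberos tools pick a cache: an explicit KRB5CCNAME wins, and only when it is unset do they fall back to the per-UID default. The following is a simplified sketch of that lookup, assuming MIT Kerberos behavior and ignoring any `default_ccache_name` setting in krb5.conf. The class `CcacheResolver` is hypothetical, and the UID is the one from the `id` output in this thread.

```java
import java.util.Map;

/** Hypothetical, simplified model of how MIT Kerberos tools choose a credential cache. */
public class CcacheResolver {

  /**
   * Returns the cache a tool such as klist would consult:
   * KRB5CCNAME if it is set, otherwise FILE:/tmp/krb5cc_<uid>.
   * Simplification: default_ccache_name from krb5.conf is not modeled.
   */
  public static String resolveCache(Map<String, String> env, long uid) {
    String fromEnv = env.get("KRB5CCNAME");
    if (fromEnv != null && !fromEnv.isEmpty()) {
      // A value with a "TYPE:" prefix (FILE:, KEYRING:, ...) is used as-is;
      // a bare path is treated as a FILE cache.
      return fromEnv.matches("[A-Z]+:.*") ? fromEnv : "FILE:" + fromEnv;
    }
    return "FILE:/tmp/krb5cc_" + uid;
  }

  public static void main(String[] args) {
    long uid = 1665152075L; // uid of the azkaban user from the `id` output
    // Outside Azkaban (no KRB5CCNAME): prints FILE:/tmp/krb5cc_1665152075
    System.out.println(resolveCache(Map.of(), uid));
    // Inside a job, Azkaban sets KRB5CCNAME, so the per-UID cache is never consulted.
    System.out.println(resolveCache(
        Map.of("KRB5CCNAME", "/tmp/krb5cc__proj__flow__job__1__azkaban"), uid));
  }
}
```

This is why a ticket obtained with a plain `kinit` as the azkaban user is invisible to the job: the job's tools are looking in a different file.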