Change hadoop workflow key to bypass mapreduce workflow key

hungj commented 7 years ago

Frameworks such as Pig and Hive override the MRJobConfig.WORKFLOW_ID property, so the Azkaban project/flow information is lost from the Hadoop configuration.

This was tested locally: the Azkaban metadata is retrieved correctly.

hungj commented 7 years ago

Thanks for the review, @kunkun-tang! I also tested the hadoopJava job type and it works fine. This feature is not supported in the spark job type, so there is no need to test that.

hungj commented 7 years ago

@HappyRay this is related to this PR: https://github.com/azkaban/azkaban-plugins/pull/248. In that PR, Azkaban injects the project/flow name into the "mapreduce.workflow.id" configuration so that we can retrieve it on the YARN side.

We noticed a problem, though: the Pig and Hive frameworks override this property with their own Pig/Hive workflow ID. This PR therefore changes the configuration key under which we inject the Azkaban project/flow name, so that the name is no longer overwritten.
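
To illustrate the overwrite and why a separate key avoids it, here is a minimal sketch. It is not the code in this PR: a plain `Map` stands in for the Hadoop configuration, `frameworkOverride` stands in for whatever Pig or Hive do at submission time, and both the key name `azkaban.workflow.id` and the `project/flow` value format are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class WorkflowKeySketch {
    // Key named in this thread; the value of MRJobConfig.WORKFLOW_ID.
    public static final String MAPREDUCE_WORKFLOW_ID = "mapreduce.workflow.id";
    // Hypothetical Azkaban-specific key, for illustration only.
    public static final String AZKABAN_WORKFLOW_ID = "azkaban.workflow.id";

    // Inject the Azkaban project/flow name under the given key.
    // The "project/flow" format is an assumption, not the real encoding.
    public static void injectAzkabanWorkflow(
            Map<String, String> conf, String key, String project, String flow) {
        conf.put(key, project + "/" + flow);
    }

    // Stand-in for Pig or Hive writing their own workflow ID.
    public static void frameworkOverride(Map<String, String> conf, String frameworkId) {
        conf.put(MAPREDUCE_WORKFLOW_ID, frameworkId);
    }

    public static void main(String[] args) {
        // Before: Azkaban shares the key with the framework, so its value is replaced.
        Map<String, String> shared = new HashMap<>();
        injectAzkabanWorkflow(shared, MAPREDUCE_WORKFLOW_ID, "myProject", "myFlow");
        frameworkOverride(shared, "pig_1234");
        System.out.println(shared.get(MAPREDUCE_WORKFLOW_ID)); // prints "pig_1234"

        // After: Azkaban uses its own key, which the framework never touches.
        Map<String, String> separate = new HashMap<>();
        injectAzkabanWorkflow(separate, AZKABAN_WORKFLOW_ID, "myProject", "myFlow");
        frameworkOverride(separate, "pig_1234");
        System.out.println(separate.get(AZKABAN_WORKFLOW_ID)); // prints "myProject/myFlow"
    }
}
```

With a dedicated key, both values coexist in the configuration: the framework keeps its own workflow ID under "mapreduce.workflow.id", and the Azkaban project/flow name can still be read on the YARN side.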

HappyRay commented 7 years ago

Thanks @hungj. Could you add the details to the commit message when you merge?

hungj commented 7 years ago

Sure @HappyRay, will do.

azkaban / azkaban-plugins

Change hadoop workflow key to bypass mapreduce workflow key #271