YahooArchive / oozie

Oozie - workflow engine for Hadoop
http://yahoo.github.com/oozie/
Apache License 2.0

add support for multiple workflow XMLs in a single HDFS directory #26

Closed tucu00 closed 13 years ago

tucu00 commented 14 years ago

Currently a workflow XML is the 'workflow.xml' file under the HDFS directory specified in the job property 'oozie.wf.application.path'.

This means that a given HDFS directory can have only one workflow app (the workflow.xml file).

In many cases it is desirable to share configurations and binaries among multiple workflow apps.

Today this is not possible.

Proposal:

1* If 'oozie.wf.application.path' points to an HDFS directory, the workflow app is 'workflow.xml' (today's behavior).

2* If 'oozie.wf.application.path' points to an XML file in HDFS, the workflow app is the specified file path and the workflow app directory (for all resources and binaries) is the parent directory.

This proposal preserves backwards compatibility.

brookwc commented 13 years ago

One question: how do you tell whether the path points to a file or a directory?

For example, I can have the following:

path = /a/b/workflow.xml

This path is ambiguous: it could refer to a workflow.xml under /a/b/ or to one under /a/b/workflow.xml/.

I checked, and /a/b/workflow.xml/ is a valid HDFS path.

tucu00 commented 13 years ago

My idea is that the disambiguation rule is:

1* If the path is a directory, look for the workflow in workflow.xml inside it; that directory is the app root.

2* If the path is a file, that file is the workflow and its parent directory is the app root.
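The rule above can be sketched in a few lines. This is only an illustration, not the code from the patch: it uses Python's `pathlib` on the local filesystem as a stand-in for HDFS, and the function name `resolve_app` and its `default_name` parameter are hypothetical. The point is that the decision depends on what the path actually is on the filesystem, not on how it is spelled, so a directory that happens to be named workflow.xml is handled by rule 1.

```python
import tempfile
from pathlib import Path


def resolve_app(app_path: Path, default_name: str = "workflow.xml"):
    """Return (definition_file, app_root) for an application path.

    Hypothetical helper; the local filesystem stands in for HDFS.
    """
    if app_path.is_dir():
        # Rule 1: a directory means the definition is <dir>/workflow.xml
        # and the directory itself is the app root (today's behavior).
        # Whether that file exists is not checked here.
        return app_path / default_name, app_path
    if app_path.is_file():
        # Rule 2: a file is the definition; its parent is the app root.
        return app_path, app_path.parent
    raise FileNotFoundError(f"application path does not exist: {app_path}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app_dir = Path(tmp) / "a" / "b"
        app_dir.mkdir(parents=True)
        (app_dir / "workflow.xml").write_text("<workflow-app/>")
        (app_dir / "second-wf.xml").write_text("<workflow-app/>")

        # Both workflow files resolve to the same app root, so they can
        # share the configurations and binaries stored next to them.
        for path in (app_dir, app_dir / "second-wf.xml"):
            definition, root = resolve_app(path)
            print(definition.name, "->", root.relative_to(tmp))
```

With this approach /a/b/workflow.xml is not ambiguous at resolution time: the filesystem reports it as either a file or a directory, and exactly one rule applies.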

brookwc commented 13 years ago

This looks good to me.

I can work on this small task.

Thanks.

tucu00 commented 13 years ago

brookwc,

Great, a couple of things:

1* Unless you are a Yahoo employee, you'll need to have a CLA submitted to Yahoo in order for the patch to be accepted.

2* Do you have an estimate for the patch?

brookwc commented 13 years ago

1. Yes, I am a Yahoo employee.
2. The patch will be ready for review in a day.

brookwc commented 13 years ago

Patch is here. Please review.

http://github.com/brookwc/oozie/commit/df33a61f610105b83564fb5cbb046695c882278c

brookwc commented 13 years ago

This is a new patch that addresses tucu00's review feedback. Thanks, tucu00!

http://github.com/brookwc/oozie/commit/32536b4a873ea27440fd8f97d9254fad85b610a9

Some test cases (around 30-40) still need to be cleaned up accordingly; this should be mostly mechanical.

tucu00 commented 13 years ago

This should also work for coordinator apps.

brookwc commented 13 years ago

Here is the latest patch, which works for both workflow and coordinator apps. Test cases were cleaned up along the way.

http://github.com/brookwc/oozie/commit/9d162fc4aa44190bb9f5554aa10dfb3a219c1a06

It has some significant changes from last time; please review. I will create a pull request once the review passes here.

bansalmayank commented 13 years ago

Closed by fbab0ab06f64467d6f709590ece77adea3c76844 add support for multiple workflow XMLs in a single HDFS

mikelikespie commented 13 years ago

The documentation should be updated to reflect this change.