coursera / dataduct

DataPipeline for humans.

How to use multiple scripts or directories in a transform step? #262

Open EAbis opened 7 years ago

EAbis commented 7 years ago

First of all, thank you for making this project; it has made AWS DataPipeline usable.

My question is: how do you go about passing multiple scripts, or multiple directories, into a pipeline's YAML file? I'm asking because I want to consolidate common functionality without having to ship the entire project directory with every job's pipeline.

For example, we currently have a project structure that looks something like this (a simplified YAML sketch of one job follows the layout):

Jobs
- Job1
  - job1.py
  - job1.yaml
  - duplicated_utility.py
- Job2
  - job2.py
  - job2.yaml
  - duplicated_utility.py
- Job3
  - job3.py
  - job3.yaml
  - duplicated_utility.py
...
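For context, each job's YAML currently points a transform step at that job's own script, something like this. This is a simplified sketch: the only dataduct fields I'm relying on are step_type and script, everything else is trimmed or approximate:

```yaml
# Jobs/Job1/job1.yaml -- simplified sketch
name: job1
frequency: one-time
load_time: 01:00

steps:
-   step_type: transform
    script: Jobs/Job1/job1.py   # job1.py imports duplicated_utility.py from the same directory
```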

What I want to do is consolidate the duplicated_utility.py files into a single file (or a collection of files) in a lib directory, so the structure would look like this:

Jobs
- Job1
  - job1.py
  - job1.yaml
- Job2
  - job2.py
  - job2.yaml
- Job3
  - job3.py
  - job3.yaml
lib
- utility.py
...
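What I'm really after is a way to attach the shared lib directory (or extra scripts) to a step in addition to the job's own script. The sketch below is purely hypothetical; as far as I can tell nothing like additional_directories exists in dataduct today, the field name is made up just to illustrate the ask:

```yaml
# Hypothetical -- NOT a real dataduct option, only illustrating what I'm asking for
steps:
-   step_type: transform
    script: Jobs/Job1/job1.py
    additional_directories:   # made-up field name
    -   lib/                  # ship the shared utility.py alongside job1.py
```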

The problem with this is that, as far as I can tell, you would have to pass in a whole directory and then point at a specific script name for each job (roughly the sketch below), which means shipping a lot of extra files around for no reason. You could also create symlinks from each job directory to the lib directory, but that adds overhead and just isn't ideal.
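For reference, the directory-based approach I'm describing would look roughly like this. I'm guessing at the exact option names (script_directory / script_name), so treat them as placeholders for whatever the directory-style transform options are called in your dataduct version:

```yaml
# Workaround sketch -- option names below are assumptions, not verified against the docs
steps:
-   step_type: transform
    script_directory: .               # uploads the whole project: every Job* dir plus lib/
    script_name: Jobs/Job1/job1.py    # the single script this pipeline actually runs
```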

Is there some functionality I'm not aware of, or a best practice to be used?

Thanks!