cliu587 closed this 8 years ago
LGTM
Yeah, I agree that we can wait on mixed workflows until later, once we have ironed out more parts of this.
My worry with people using worker groups in EMR is that intermediate results written to HDFS must be cleaned up correctly on failure. This already works well for the staging directories, but for intermediate steps it is something people need to be more watchful of.
Add support for worker groups, specified via configs.
Currently we do not allow a mixed workflow where some steps are run via worker groups while others are run via Datapipeline instance/cluster management. This can be added if required.
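A minimal sketch of the no-mixed-workflow restriction, assuming a hypothetical per-step `worker_group` config key (the key name and function are illustrative, not taken from this PR). In AWS Data Pipeline terms this roughly corresponds to an activity that sets `workerGroup` versus one that sets `runsOn`. A workflow is accepted only if every step names a worker group or none does.

```python
from typing import Any, Mapping, Sequence


def uses_worker_groups(steps: Sequence[Mapping[str, Any]]) -> bool:
    """Decide whether a workflow runs on worker groups.

    Assumes each step config may carry a hypothetical "worker_group" key.
    Returns True if every step sets it, False if no step does, and raises
    ValueError for a mixed workflow.
    """
    flags = [step.get("worker_group") is not None for step in steps]
    if not any(flags):
        # No step (or no steps at all): use pipeline-managed resources.
        return False
    if all(flags):
        return True
    raise ValueError(
        "Mixed workflows are not supported: either every step runs on a "
        "worker group or none does."
    )


steps = [
    {"name": "extract", "worker_group": "emr-workers"},
    {"name": "load", "worker_group": "emr-workers"},
]
print(uses_worker_groups(steps))
```

Failing fast at config-validation time keeps the mixed case an explicit error, so it can be relaxed later if mixed workflows are added.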