cloudera-labs / envelope

Build configuration-driven ETL pipelines on Apache Spark
Apache License 2.0

Can Envelope run nested loops? #35

Open sgulati89 opened 5 years ago

sgulati89 commented 5 years ago

Hi, can Envelope run nested loops?

For example, how would I implement the following logic in Envelope?

for (String str : values) {
    for (String str1 : values2) {
        somestep(str1, str);
    }
}

jeremybeard commented 5 years ago

Hi @sgulati89,

Yes, in general Envelope can run nested loops. In the current version (v0.6.1) this is known to work when the values are provided by a range or by a step. In the next minor version (v0.7.0) this will also work when the values are provided by a list.
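Conceptually, a nested serial loop runs the entire inner loop once for every value of the outer loop, so the steps that depend on both loops execute once per (outer, inner) pair. The Java sketch below models only that ordering; it is not Envelope's implementation. The class name `NestedLoopSketch`, the helpers `rangeValues` and `expand`, and the assumption that a range source includes both `start` and `end` (which the range example's output suggests) are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedLoopSketch {

    // Hypothetical helper: values for a range source, assumed to include both start and end.
    static List<Integer> rangeValues(int start, int end) {
        List<Integer> values = new ArrayList<>();
        for (int i = start; i <= end; i++) {
            values.add(i);
        }
        return values;
    }

    // Serial nesting: the inner loop runs to completion for each outer value,
    // producing one (outer, inner) pair per execution of the dependent steps.
    static List<int[]> expand(List<Integer> outerValues, List<Integer> innerValues) {
        List<int[]> pairs = new ArrayList<>();
        for (int outer : outerValues) {
            for (int inner : innerValues) {
                pairs.add(new int[] {outer, inner});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Outer values come from a range; the inner list stands in for values a step would provide.
        List<int[]> pairs = expand(rangeValues(10, 12), List.of(1, 2));
        for (int[] p : pairs) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

With three outer values and two inner values this yields six pairs, in outer-major order, which is the same order as the six result tables printed by the example run.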

Here's an example of using a range for the outer loop and a step for the inner loop:

> cat loop.conf
steps {
  step_source {
    deriver {
      type = sql
      query.literal = "SELECT 1 UNION ALL SELECT 2"
    }
  }

  outer_loop {
    type = loop
    source = range
    range {
      start = 10
      end = 12
    }
    mode = serial
    parameter = outer
  }

  inner_loop {
    dependencies = [outer_loop, step_source]
    type = loop
    source = step
    step = step_source
    mode = serial
    parameter = inner
  }

  print_parameters {
    dependencies = [outer_loop, inner_loop]
    deriver {
      type = sql
      query.literal = "SELECT ${outer} AS outer, ${inner} AS inner"
    }
    print.data.enabled = true
  }
}
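In this configuration each loop step publishes its current value under the name given by `parameter`, and the `${outer}` and `${inner}` placeholders in `print_parameters` are filled in before that step runs. The Java sketch below imitates that substitution for the `print_parameters` query. `ParameterSubstitution` and `substitute` are hypothetical names, and plain string replacement is a simplification, not a description of how Envelope resolves parameters internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParameterSubstitution {

    // Hypothetical helper: replace each ${name} placeholder in the template with its value.
    static String substitute(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "SELECT ${outer} AS outer, ${inner} AS inner";
        for (int outer = 10; outer <= 12; outer++) {       // outer_loop: range start = 10, end = 12
            for (String inner : List.of("1", "2")) {       // inner_loop: stands in for step_source's rows
                Map<String, String> params = new LinkedHashMap<>();
                params.put("outer", Integer.toString(outer));
                params.put("inner", inner);
                System.out.println(substitute(query, params));
            }
        }
    }
}
```

Each printed line is the query that one execution of `print_parameters` would run, for example `SELECT 10 AS outer, 1 AS inner` first and `SELECT 12 AS outer, 2 AS inner` last.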

> spark-submit envelope-0.6.1.jar loop.conf
...
+-----+-----+
|outer|inner|
+-----+-----+
|   10|    1|
+-----+-----+
...
+-----+-----+
|outer|inner|
+-----+-----+
|   10|    2|
+-----+-----+
...
+-----+-----+
|outer|inner|
+-----+-----+
|   11|    1|
+-----+-----+
...
+-----+-----+
|outer|inner|
+-----+-----+
|   11|    2|
+-----+-----+
...
+-----+-----+
|outer|inner|
+-----+-----+
|   12|    1|
+-----+-----+
...
+-----+-----+
|outer|inner|
+-----+-----+
|   12|    2|
+-----+-----+

Hope that helps.