Samsung / qaboard

Experiment tracker: organize, visualize, compare and share runs. Removes toil from algorithm/performance R&D and tuning.
https://samsung.github.io/qaboard
Apache License 2.0

Pipelines / DAG #10

Open arthur-flam opened 4 years ago

arthur-flam commented 4 years ago

Currently, QA-Board is not expressive enough for a common use case of ours:

  1. Run on some images
  2. Calibration
  3. Validation

Likewise, we can't easily express pipelines like training-evaluation.

We need a way to express a series of steps / pipelines / tasks organized as a directed acyclic graph (DAG).

We're looking for feedback or alternative ideas, especially if you have experience with flow engines such as DVC. Thanks!

Workarounds

Users have done this:

Status

Possible API

batch1:
  inputs:
  - A.jpg
  - B.jpg
  configurations:
  - base

batch2:
  needs: batch1
  type: script
  configurations:
  - python my_script.py {o.output_dir for o in needs["batch1"]}
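To make the `needs` placeholder above concrete, here is a minimal sketch of how the command for `batch2` might be assembled once `batch1` has finished. The `Output` class and `build_command` helper are hypothetical names used only for illustration; they are not part of QA-Board, and the directory paths are made up.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Output:
    """Hypothetical stand-in for one finished run of a batch."""
    output_dir: str


def build_command(script: str, needs: Dict[str, List[Output]], batch: str) -> str:
    """Append the output directory of every run in `batch` to the command."""
    dirs = [o.output_dir for o in needs[batch]]
    # Quote each directory so paths with spaces survive the shell.
    return " ".join([script] + [shlex.quote(d) for d in dirs])


# Assumed layout: one output directory per input of batch1.
needs = {"batch1": [Output("output/A"), Output("output/B")]}
print(build_command("python my_script.py", needs, "batch1"))
# → python my_script.py output/A output/B
```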

More complex:

my-calibration-images:
    configurations:
    - base
    inputs:
    - DL50.raw
    - DL55.raw
    - DL65.raw
    - DL75.raw

my-calibration:
    needs:
      calibration_images: my-calibration-images
    type: script
    configurations:
    - python calibration.py ${o.output_directory for o in depends[calibration_images]}

my-evaluation-batch:
    needs:
      calibration: my-calibration
    inputs:
    - test_image_1.raw
    - test_image_2.raw
    - test_image_3.raw
    configurations:
    - base
    - ${depends[calibration].output_directory}/calibration.cde

$ qa batch my-evaluation-batch
#=> qa batch my-calibration-images
#=> qa batch my-calibration
#=> qa batch my-evaluation-batch
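The expansion shown above (asking for one batch runs its dependencies first) amounts to a depth-first walk over the `needs` graph. The sketch below is one possible way to compute that order, not QA-Board's implementation. The `needed_batches` and `execution_order` helpers are hypothetical, and the batch definitions are reduced to their `needs` key.

```python
from typing import Dict, List, Union

# `needs` appears in the proposal as a single name or as an alias -> name
# mapping; a plain list of names is accepted here as an extra assumption.
Needs = Union[str, List[str], Dict[str, str]]


def needed_batches(needs: Needs) -> List[str]:
    """Normalize the spellings of `needs` into a list of batch names."""
    if isinstance(needs, str):
        return [needs]
    if isinstance(needs, dict):
        return list(needs.values())
    return list(needs)


def execution_order(batches: Dict[str, dict], target: str) -> List[str]:
    """Return the target batch and its dependencies, dependencies first."""
    order: List[str] = []
    visiting: set = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"cycle detected at batch {name!r}")
        if name not in batches:
            raise KeyError(f"unknown batch {name!r}")
        visiting.add(name)
        for dep in needed_batches(batches[name].get("needs", [])):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


batches = {
    "my-calibration-images": {},
    "my-calibration": {"needs": {"calibration_images": "my-calibration-images"}},
    "my-evaluation-batch": {"needs": {"calibration": "my-calibration"}},
}
for name in execution_order(batches, "my-evaluation-batch"):
    print(f"qa batch {name}")
# → qa batch my-calibration-images
# → qa batch my-calibration
# → qa batch my-evaluation-batch
```

The standard library's `graphlib.TopologicalSorter` (Python 3.9+) could replace the hand-written traversal. The version here only visits the batches the target actually depends on.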

Thoughts

Expected

arthur-flam commented 4 years ago

Update: thanks to Itamar Persi and Ela Shahar, there is a pipeline implementation in "user-land":

my-pipeline:
  configs:
  - run: echo "Step 1"
  - batch: first-batch
  - batch:
    - second-batch
    - third-batch
    - label: batches running in parallel
  - run: some-postprocessing-script.py
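As a rough illustration of the semantics suggested by `my-pipeline` above, the sketch below runs the steps in order and launches the batches listed under a single `batch` key concurrently. It is an assumption about how such a runner could work, not the actual user-land implementation. `run_pipeline` and the injected `execute` callback are hypothetical, and treating a `label` entry as an annotation rather than a batch is a guess.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_pipeline(steps: List[dict], execute: Callable[[str], None]) -> None:
    """Run steps in order; a list under `batch` runs its batches concurrently."""
    for step in steps:
        if "run" in step:
            execute(step["run"])
        elif "batch" in step:
            batch = step["batch"]
            items = batch if isinstance(batch, list) else [batch]
            # Assumption: dict entries such as {"label": ...} only annotate the step.
            names = [b for b in items if isinstance(b, str)]
            with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
                # list() waits for every batch and re-raises the first failure.
                list(pool.map(lambda n: execute(f"qa batch {n}"), names))
        else:
            raise ValueError(f"unknown step: {step!r}")


pipeline = [
    {"run": 'echo "Step 1"'},
    {"batch": "first-batch"},
    {"batch": ["second-batch", "third-batch",
               {"label": "batches running in parallel"}]},
    {"run": "some-postprocessing-script.py"},
]

# Record the commands instead of running them; a real runner might pass
# something like `lambda cmd: subprocess.run(cmd, shell=True, check=True)`.
log: List[str] = []
run_pipeline(pipeline, log.append)
print(log)
```

Because each step waits for the previous one, the post-processing script only starts after both parallel batches have finished. The relative order of the two parallel batches in the log is not guaranteed.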

Features include

It's much simpler than a full DAG, and good enough in most cases.

Next steps