Update mapreduce-pipeline example

ca-scribner commented 4 years ago

Continuation of the original PR #17. I started again because a lot had changed, some rebasing was needed underneath, and I made a royal mess of the commit history in the last one.

Summary: This change refactors the mapreduce-pipeline Kubeflow example for clarity and to follow current best practices.

Changes include:

Expanding the descriptions of what is being done and why
Converting map.sh/reduce.sh to Python scripts (more users are likely fluent in Python?) and renaming them to sample.py/average.py, to distinguish the components used in this pipeline from the general map-reduce pattern
Removing older references on how to use minio (they pass credentials via environment variables, then stored in a now-derelict minio repo).
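For anyone skimming, here is a rough sketch of how the two renamed steps relate. It is illustrative only and not the code in this PR: the function bodies, the `n_points` parameter, and the Monte Carlo computation are assumptions chosen to show a seeded "sample" step (the map) feeding an "average" step (the reduce).

```python
import random
from statistics import mean
from typing import Sequence


def sample(seed: int, n_points: int = 1000) -> float:
    """Hypothetical map step: one seeded Monte Carlo estimate.

    Stand-in computation (an estimate of pi), not taken from sample.py.
    """
    rng = random.Random(seed)  # separate RNG per seed, so runs are reproducible
    inside = sum(
        1
        for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_points


def average(results: Sequence[float]) -> float:
    """Hypothetical reduce step: combine the per-seed results into one value."""
    return mean(results)


if __name__ == "__main__":
    per_seed = [sample(seed) for seed in range(10)]
    print(average(per_seed))
```

In the pipeline each `sample` call would run as its own component, with `average` consuming their outputs; the list comprehension above only stands in for that fan-out.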

Will squash once everything looks good.

ca-scribner commented 4 years ago

@blairdrummond I think it reads more clearly now, both in the notebook and in the containers. Let me know what you think.

The only thing I think I resisted on was removing utilities.py. It seemed cleaner in the notebook to have a self-documenting function call instead of a lambda and comment. The burden of a dependency feels minor to me, but if you feel strongly the other way let me know.

ca-scribner commented 4 years ago

Oh, and one other thing: I took the seeds generator out in favor of seeds defined by a range in the pipeline. The only reason was that I wondered whether everyone would understand what the generator was doing. If you prefer it in there, though, I can restore it.
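To make the trade-off concrete, here is a minimal sketch of the two styles. The generator is a hypothetical stand-in, since the removed one is not reproduced in this thread, and the names `seed_generator` and `n_seeds` are assumptions.

```python
def seed_generator(n_seeds):
    """Hypothetical generator-based version: yields seeds one at a time."""
    seed = 0
    while seed < n_seeds:
        yield seed
        seed += 1


# Generator style: the reader has to follow the yield loop to see the values.
seeds_from_generator = list(seed_generator(5))

# Range style: the seed values are visible at the call site.
seeds_from_range = list(range(5))

print(seeds_from_generator == seeds_from_range)
```

Both produce the same seeds here; the range form simply states them more directly in the pipeline definition.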

StatCan / aaw-contrib-jupyter-notebooks

Update mapreduce-pipeline example #21