StatCan / aaw-contrib-jupyter-notebooks

Jupyter Notebooks to be used with Advanced Analytics Workspace platform

Update mapreduce-pipeline example #21

Closed ca-scribner closed 4 years ago

ca-scribner commented 4 years ago

Continuation of original PR #17. Started again because a lot changed, some rebasing was needed underneath, and I made a royal mess of the commit history in the last one.

Summary: This change refactors the mapreduce-pipeline Kubeflow example for clarity and to follow current best practices.

Changes include:

Will squash once everything looks good.

ca-scribner commented 4 years ago

@blairdrummond I think it reads more clearly now, both in the notebook and in the containers. Let me know what you think.

The only suggestion I resisted was removing utilities.py. It seemed cleaner in the notebook to have a self-documenting function call instead of a lambda plus a comment. The burden of the extra dependency feels minor to me, but if you feel strongly the other way, let me know.
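
To make the trade-off concrete, here is a rough sketch of the two styles. The helper below is hypothetical and is not the actual content of utilities.py; it only illustrates a lambda that needs a comment versus a named function that documents itself at the call site.

```python
# Hypothetical illustration only; not the real utilities.py.

# Option A: an inline lambda that needs a comment to explain itself.
# Strip the final extension from a filename, e.g. "result.json" -> "result".
strip_extension_lambda = lambda name: name.rsplit(".", 1)[0]


# Option B: a named helper (the kind of thing that could live in a
# utilities module) whose name and docstring carry the explanation.
def strip_extension(name: str) -> str:
    """Return `name` without its final extension, if it has one."""
    return name.rsplit(".", 1)[0]


# Both behave the same; only the readability at the call site differs.
stem_a = strip_extension_lambda("result.json")
stem_b = strip_extension("result.json")
```

The cost of Option B is one more file for the notebook to import; the benefit is that the notebook cell reads as a plain function call.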

ca-scribner commented 4 years ago

Oh, and one other thing: I took the seeds generator out in favor of seeds defined by a range in the pipeline. The only reason was that I wondered whether everyone would understand what the generator was doing. If you prefer it in there, though, I can restore it.
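
For comparison, a minimal sketch of the two approaches. The function name, the seed count, and the starting value are assumptions for illustration, not the code that was in the example.

```python
from typing import Iterator, List


def seed_generator(n: int, start: int = 0) -> Iterator[int]:
    """Hypothetical generator: yield n consecutive integer seeds from start."""
    current = start
    for _ in range(n):
        yield current
        current += 1


# Generator style: the reader has to know how generators and `yield` work.
seeds_from_generator: List[int] = list(seed_generator(5))

# Range style: the same seeds, visible at a glance in the pipeline definition.
seeds_from_range: List[int] = list(range(5))
```

Both produce the same list of seeds here; the range version just avoids asking readers to follow generator semantics.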