TheDataRideAlongs / ProjectDomino

Scaling COVID public behavior change and anti-misinformation
Apache License 2.0

GKE for Prefect's Dask Executor and general Dask #75

Open bmorphism opened 3 years ago

bmorphism commented 3 years ago

See the notebook at https://sandbox.projectdomino.org/notebook/notebooks/notebooks/twitter/twint/Cody-twint-prefect-defcon-Copy1.ipynb#dask for an example of a workload to be executed with Dask.

Goal

Run large jobs using GCP resources optimally in a platform-agnostic way.

Current use-cases

Dask Executor for Prefect configured with secure access.

Data scientists can use Dask for datasets larger than is feasible to analyze on a single machine. It presents a dask.dataframe.DataFrame object that mirrors much of the pandas API and can act as a near drop-in replacement for the familiar pandas.DataFrame (https://docs.dask.org/en/latest/dataframe.html).

Acceptance

Helm chart that deploys Dask as described in https://docs.dask.org/en/latest/setup/kubernetes-helm.html

Chart can be deployed using GitOps with a derivative of https://github.com/WyriHaximus/github-action-helm3
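Once such a chart is running, a job would reach the cluster through the scheduler address. The sketch below shows one way that might look with `dask.distributed.Client`. The `connect` and `total_length` helpers are hypothetical, and the example address format `tcp://<scheduler-service>:8786` is an assumption based on Dask's default scheduler port; the actual service name depends on the Helm release. When no address is given it falls back to a small in-process cluster so the code can be tried locally.

```python
from dask.distributed import Client


def connect(scheduler_address=None):
    """Return a Dask client.

    scheduler_address is a placeholder such as "tcp://<scheduler-service>:8786"
    (assumed format, not taken from the issue). Without it, start a small
    in-process cluster for local experimentation.
    """
    if scheduler_address is not None:
        return Client(scheduler_address)
    return Client(
        processes=False,         # run workers as threads in this process
        n_workers=1,
        threads_per_worker=2,
        dashboard_address=None,  # skip the diagnostics dashboard
    )


def total_length(client, texts):
    """Fan out len() over the inputs, then sum the results on the cluster."""
    futures = client.map(len, texts)
    return client.submit(sum, futures).result()


if __name__ == "__main__":
    client = connect()
    print(total_length(client, ["covid", "mask", "vaccine"]))
    client.close()
```

The same scheduler address is what a Prefect Dask executor would be pointed at, so verifying connectivity with a plain client first can help separate cluster problems from flow problems.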

@lmeyerov @webcoderz @bechbd