Open jennakwon06 opened 4 years ago
Or something like setting up the Dask cluster as part of the EMR bootstrap would be useful.
This seems like a useful feature, though I'm not sure it belongs in dask-yarn. From a quick glance at boto, it seems there is support for launching EMR. In fact, I found a blog post on it: https://medium.com/@kulasangar/create-an-emr-cluster-and-submit-a-job-using-boto3-c34134ef68a0. Perhaps someone has time to experiment with connecting boto3 and dask-yarn together?
So yes, we are programmatically launching an EMR cluster with the boto EMR API.
But the manual step is this: when the EMR cluster is done launching (takes ~5 minutes), we log onto the master node of the EMR cluster and run a Jupyter notebook with the cell "cluster = YarnCluster(...)".
We then do "Client("ip-node-of-emr-master-node")" to connect to the YarnCluster from somewhere other than the EMR master node, such as a Jupyter notebook on a SageMaker notebook instance.
So the ideal is that, from my SageMaker notebook instance, I can make one call: "spin-up-dask-cluster-on-emr-cluster(dask_cluster_settings, emr_cluster_settings)".
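For illustration, here is a rough sketch of what such a single call might look like. It is not an existing dask-yarn feature. It assumes a boto3 EMR client is passed in, that `emr_cluster_settings` is a dict of keyword arguments for `run_job_flow`, and that a hypothetical script at `start_script_s3_uri` creates the `YarnCluster` on the master node and keeps running. The function name, the script, and the `/tmp/start_dask.py` path are all made up for this example.

```python
import shlex


def spin_up_dask_cluster_on_emr(emr_client, emr_cluster_settings, start_script_s3_uri):
    """Launch an EMR cluster with an extra step that starts Dask on the master node.

    emr_client: a boto3 EMR client, e.g. boto3.client("emr").
    emr_cluster_settings: keyword arguments for run_job_flow (not modified).
    start_script_s3_uri: hypothetical script that calls YarnCluster(...) and stays alive.
    Returns (cluster_id, master_dns).
    """
    # Copy the script to the master node and run it there via command-runner.jar.
    command = (
        f"aws s3 cp {shlex.quote(start_script_s3_uri)} /tmp/start_dask.py"
        " && python3 /tmp/start_dask.py"
    )
    dask_step = {
        "Name": "Start Dask YarnCluster",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["bash", "-c", command],
        },
    }

    # Work on a copy so the caller's settings dict is left untouched.
    settings = dict(emr_cluster_settings)
    settings["Steps"] = list(settings.get("Steps", [])) + [dask_step]

    cluster_id = emr_client.run_job_flow(**settings)["JobFlowId"]

    # Block until EMR reports the cluster as running (the ~5 minute wait).
    emr_client.get_waiter("cluster_running").wait(ClusterId=cluster_id)

    description = emr_client.describe_cluster(ClusterId=cluster_id)
    return cluster_id, description["Cluster"]["MasterPublicDnsName"]


# Possible usage from a notebook (requires boto3 and AWS credentials):
#   import boto3
#   emr = boto3.client("emr")
#   cluster_id, master_dns = spin_up_dask_cluster_on_emr(
#       emr, emr_cluster_settings, "s3://my-bucket/start_dask.py")
```

Connecting afterwards would still be a `Client(...)` call against the master node, as described above. The exact scheduler address and port depend on how the start script configures the `YarnCluster`, so that part is left out of the sketch.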
Hello,
We want to programmatically spin up an EMR cluster and then spin up a Dask cluster in the EMR cluster with the YarnCluster construct.
Currently, we open an SSH tunnel to the master node of the EMR cluster (it's in a private subnet), log onto the master node, and create a .ipynb notebook that has the "YarnCluster(..)" code. We then execute that cell to spin up the Dask cluster.
It would be nice to automate this, e.g. run some commands to spin up an EMR cluster that also has a Dask cluster.
Thanks!