Hi @joehoeller, thx for the question. Basically there are 2 Livy job submission modes: Batch and Interactive. For Batch mode the communication flow is the following:
- Livy talks to the Kubernetes API to request the Spark Driver
- The Spark Driver talks to the Kubernetes API to request the Spark Executors and resolves them once created to communicate tasks and track their status/progress
- Livy resolves the Spark Driver and Executors via the Kubernetes API and tracks their statuses

So there is no direct interaction between Livy and Spark in this mode.
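For illustration, the client side of the batch flow is just Livy's REST API. A minimal sketch in Python (the Livy URL and jar path below are placeholders, not values shipped with this chart):

```python
import requests

# Placeholder Livy endpoint; adjust to your deployment (Livy's default port is 8998).
LIVY_URL = "http://livy.livy.svc.cluster.local:8998"

# Submit a batch job: Livy then asks the Kubernetes API for the driver pod.
resp = requests.post(
    f"{LIVY_URL}/batches",
    json={
        # Placeholder jar path inside the Spark image.
        "file": "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "args": ["1000"],
    },
)
batch_id = resp.json()["id"]

# Livy tracks the driver/executor pods via the Kubernetes API and reflects
# their progress in the batch state.
state = requests.get(f"{LIVY_URL}/batches/{batch_id}/state").json()
print(state)  # e.g. {'id': 0, 'state': 'running'}
```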
For interactive mode:
- Livy talks to the Kubernetes API to request the Spark Driver with the specific entrypoint JAR (for the interactive mode) and spins up an RPC server, asynchronously waiting for the client registration request
- The Spark Driver talks to Kubernetes to request the executors and calls the Livy RPC server to register and share its own RPC server endpoint
- Livy communicates with the Spark Driver via their RPC servers
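From the client's point of view the interactive flow still looks like plain REST; the RPC registration happens between Livy and the driver internally. A sketch with the same placeholder endpoint:

```python
import time
import requests

LIVY_URL = "http://livy.livy.svc.cluster.local:8998"  # placeholder endpoint

# Create an interactive session: Livy requests the driver pod and waits for
# the driver to register back over RPC before the session becomes 'idle'.
session_id = requests.post(f"{LIVY_URL}/sessions", json={"kind": "pyspark"}).json()["id"]

while requests.get(f"{LIVY_URL}/sessions/{session_id}/state").json()["state"] != "idle":
    time.sleep(5)

# Statements are relayed to the driver over the Livy <-> driver RPC channel.
stmt = requests.post(
    f"{LIVY_URL}/sessions/{session_id}/statements",
    json={"code": "sc.parallelize(range(1000)).count()"},
).json()
print(stmt["id"], stmt["state"])
```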
Note: both modes work in basically the same way as on Yarn; the only difference here is that Kubernetes is used as the resource manager instead.
Does that answer your question? Please let me know if you would like to know more about any specific part of these flows. Best.
Wow, that’s a really complete answer, thank you for that.
So, that leads me to my next 2 questions:
How can I expose the necessary NodePorts to reach the UI?
Is it possible to communicate RESTfully and send jobs via notebook?
Awesome, happy to help.
To answer the remaining 2 questions I would suggest giving this step-by-step installation guide on Minikube a try. It shows how to spin up the required components and get JupyterHub, with per-user Jupyter notebooks, exposed externally from the Kubernetes cluster, as well as direct access to the Livy UI with links to the Spark UI.
Also some design details can be found here.
In short: to set up the Jupyter -> Livy -> Spark communication, Sparkmagic is used. To expose the component endpoints, Nginx Controller backed Ingresses are used.
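On the notebook side, the gist is that Sparkmagic points at the Livy endpoint and ships cell code over REST. A rough sketch (the session name and Livy URL are placeholders, and exact flags can vary between Sparkmagic versions):

```python
# Run each block below in its own notebook cell
# (IPython kernel with sparkmagic installed).

# Load the sparkmagic magics.
%load_ext sparkmagic.magics

# Register a Livy endpoint and start a PySpark session (placeholder URL).
%spark add -s demo -l python -u http://livy.livy.svc.cluster.local:8998

# Code in a %%spark cell is sent to the Spark driver through Livy.
%%spark
sc.parallelize(range(1000)).count()
```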
Hi @salinaaaaaa, is that issue fixed for you?
Hi @jahstreet, so if I understand correctly, in interactive mode (the mode you would probably use with Sparkmagic and a Jupyter notebook), Livy would:
Hi @Almenon, you're almost correct. If we speak about interactive mode:
1. Livy asks Kubernetes to create a driver pod and establishes a web-socket connection between Livy and the Spark driver's RPC server
...
5. When the interactive session completes, Livy deletes the driver pod, which triggers deletion of the executor pods that reference the driver pod
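To make step 5 concrete: a session can also be ended explicitly through Livy's REST API, which triggers exactly that cleanup (a sketch; the URL and session id are placeholders):

```python
import requests

LIVY_URL = "http://livy.livy.svc.cluster.local:8998"  # placeholder endpoint
session_id = 0  # id returned when the session was created

# Livy deletes the driver pod; Kubernetes then garbage-collects the executor
# pods through their owner references to the driver pod.
requests.delete(f"{LIVY_URL}/sessions/{session_id}")
```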
Thanks! Sorry for the basic question, but what does it mean when you say the "interactive session completes"? In this example, would the session complete when the call to get finishes?
double pi = client.submit(new PiJob(samples)).get();
For context I'm a DevOps engineer, not a spark developer, so this stuff is new to me 😅
Does the Helm chart for Livy deploy Spark? If not, how do we configure the Spark Helm chart and the Livy Helm chart so they can "talk" to each other / submit jobs via REST services?