VeritoneAlpha / spark-job-rest


Memory Management #7

Open raduchilom opened 9 years ago

raduchilom commented 9 years ago

Because we spin up a new JVM with a custom amount of memory for each context, we should cap the total memory used by all context JVM processes. Once that cap is reached, we should not create further contexts until enough memory has been freed for the new process.
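A minimal sketch of the gate described above: a server-wide memory budget that context creation must reserve against, and that is released when a context JVM exits. The class and method names (`MemoryBudget`, `tryReserve`, `release`) are illustrative, not part of the spark-job-rest codebase, and the budget value would come from configuration in practice.

```java
// Hypothetical memory gate for context JVMs: reserve before spawning,
// release when the process exits. Synchronized so concurrent context
// creation requests see a consistent total.
public class MemoryBudget {
    private final int totalMb;   // server-wide limit for all context JVMs
    private int reservedMb = 0;  // memory currently promised to contexts

    public MemoryBudget(int totalMb) {
        this.totalMb = totalMb;
    }

    // Try to reserve memory for a new context JVM.
    // Returns false when the budget is full; the caller should delay
    // context creation until release() frees enough memory.
    public synchronized boolean tryReserve(int requestMb) {
        if (reservedMb + requestMb > totalMb) {
            return false;
        }
        reservedMb += requestMb;
        return true;
    }

    // Called when a context JVM terminates and its memory is freed.
    public synchronized void release(int requestMb) {
        reservedMb = Math.max(0, reservedMb - requestMb);
    }

    public synchronized int reserved() {
        return reservedMb;
    }
}
```

A real implementation would also need to tie `release` to process exit (e.g. a watcher on the spawned JVM), since a crashed context must still return its reservation.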

petro-rudenko commented 9 years ago

Is it possible to somehow use yarn-cluster mode, so that the new driver process is managed by the YARN resource manager inside the cluster?

raduchilom commented 9 years ago

It is possible, but the problem remains: launching a context still spins up a new process on the server machine, and it is on that machine that we want memory management for all created processes.

petro-rudenko commented 9 years ago

There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

In yarn-cluster mode, resource management of the driver process is handled by YARN, which can launch it on any node in the cluster that has enough resources.
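The difference between the two deploy modes shows up directly in the submit command. A sketch, assuming the application jar path and class are placeholders (Spark versions of that era also accepted `--master yarn-cluster` as shorthand):

```shell
# yarn-cluster mode: the driver runs inside a YARN application master on
# the cluster, so its memory is accounted for by the YARN resource manager.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --class com.example.ContextMain \
  /path/to/app.jar

# yarn-client mode: the driver runs in the submitting process on the
# server machine, so its memory must be managed locally (the situation
# described in this issue).
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 2g \
  --class com.example.ContextMain \
  /path/to/app.jar
```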