gearpump / gearpump

Lightweight real-time big data streaming engine over Akka
https://gearpump.github.io/gearpump/
Apache License 2.0

Integrate Gear scheduler with YARN #27

Closed kkasravi closed 9 years ago

kkasravi commented 10 years ago

The YARN resource manager and Gearpump Master need to integrate. It may be that Gearpump Master provides a virtual resource allocation which is synced with the actual resources allocated by the YARN RM. This would allow us to include attributes not yet in YARN, like SLAs.

Requirements/Features for YARN Client and Appmaster (WIP)

clockfly commented 10 years ago

Let's do some investigation.

We need to investigate what other appmaster generalization solutions like Helix, Slider, and LLAMA are doing, to make sure we are doing the right thing in the right way.

clockfly commented 10 years ago

Gear scheduler may work under YARN, or YARN may work under gear scheduler.

There will be some message passing between gear scheduler and YARN, so that they have a consistent view of resources.

clockfly commented 10 years ago

Maybe we need to delay this to milestone 0.2.

clockfly commented 9 years ago

For now, we can implement a quick version by just starting the master and workers in YARN. We can take care of YARN allocation delegation later.

kkasravi commented 9 years ago

Assigning to Tomek. Tianlun - Tomek is a new member and has a YARN background.

devaraj-kavali commented 9 years ago

Kam, is the expectation here to start a master and workers for each Gearpump application (i.e. one YARN application per Gearpump application), or to launch the Gearpump master and workers once as a long-running service that handles all Gearpump applications (i.e. one YARN application for all Gearpump applications)?

I have just started with Gearpump; I could probably help with this part once I get familiar with the code.

kkasravi commented 9 years ago

Hi Devaraj

It would be the latter (long-running service). For now the appmaster would ask for a static set of containers and run a Gearpump worker on each one. Possibly later we could make this elastic. I haven't looked at Slider for possibly doing this - any opinions? Also, Tomek and I thought having the client and appmaster written in Scala would be a good way for people working on this issue to get used to Scala. Let me know if you have any suggestions on bootstrapping the appmaster and client. I was thinking of just converting distributedshell to Scala or starting from this one. Let me know any recommendations you may have. Thanks!
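The "static set of containers, one worker per container" idea above can be sketched as a small bookkeeping model. This is only an illustration in Python, not the Scala appmaster discussed in this thread: it does not call any YARN API, and `StaticAllocator`, `requests_needed`, and `on_allocated` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class StaticAllocator:
    """Toy model of an appmaster that wants a fixed number of workers.

    Hypothetical helper for illustration; a real appmaster would send the
    requests to the YARN resource manager and start a worker process in
    each granted container.
    """
    target_workers: int
    pending: int = 0                              # requests sent, not yet granted
    workers: dict = field(default_factory=dict)   # container id -> host

    def requests_needed(self):
        """Return how many new container requests should be sent now."""
        needed = self.target_workers - len(self.workers) - self.pending
        needed = max(needed, 0)
        self.pending += needed
        return needed

    def on_allocated(self, container_id, host):
        """Record a granted container; return False if it is surplus."""
        if len(self.workers) >= self.target_workers:
            return False  # surplus container, the caller should release it
        self.pending = max(self.pending - 1, 0)
        self.workers[container_id] = host
        return True
```

Making this elastic later would mostly mean letting `target_workers` change at runtime and releasing containers when it shrinks.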

devaraj-kavali commented 9 years ago

Starting from the distributedshell master and client is a good idea, and your approach sounds right. Slider works in a similar way; in addition, it starts the component daemons as long-running services. Slider also has an elastic feature, in which the number of workers/slaves can be increased or decreased by requesting new containers from the RM or releasing containers. We can add these features step by step once we have the basic YARN application for Gearpump.

kkasravi commented 9 years ago

Thanks. We'll be using CDH 5.3.

clockfly commented 9 years ago

@devaraj-kavali

Currently, we will schedule the cluster daemons like master and worker with YARN.

kkasravi commented 9 years ago

Let's merge to a branch under master called yarn.

kkasravi commented 9 years ago

Need to create /user/gearpump/jars in HDFS and chmod it to 777 prior to running the yarn client.
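A minimal sketch of that preparation step, assuming the standard `hdfs` command-line client is on the PATH and the caller has permission to create the directory. The function names are hypothetical, and the `-p` flag (create parent directories as needed) is an addition not mentioned above.

```python
import subprocess


def staging_commands(path="/user/gearpump/jars", mode="777"):
    """Build the two hdfs CLI invocations: create the directory, then chmod it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],   # -p is an assumption, see lead-in
        ["hdfs", "dfs", "-chmod", mode, path],
    ]


def prepare_staging_dir(path="/user/gearpump/jars", run=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for cmd in staging_commands(path):
        run(cmd, check=True)
```

Calling `prepare_staging_dir()` once before running the yarn client should be enough; the `run` parameter exists only so the function can be exercised without a cluster.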

kkasravi commented 9 years ago

This is a good article describing Slider's capabilities - http://hortonworks.com/blog/deploying-long-running-services-on-apache-hadoop-yarn-cluster/. Features include:

According to the article - the steps are straightforward:

I will explore this approach and provide some details.

kkasravi commented 9 years ago

YarnAM is now launching successfully; however, it's not launching the master and workers correctly yet. Working on it. Will push to yarn in the next few moments. Log output below:

15/03/07 17:08:47 INFO YarnAM$: Creating YarnAMActor
15/03/07 17:08:47 INFO YarnAMActor: Creating NMCallbackHandler
15/03/07 17:08:47 INFO YarnAMActor: Creating NMClientAsync
15/03/07 17:08:47 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
15/03/07 17:08:47 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
15/03/07 17:08:47 INFO AMRMClientAsyncActor: preStart
15/03/07 17:08:47 INFO RMCallbackHandlerActor: Sending RMCallbackHandler to YarnAM
15/03/07 17:08:47 INFO YarnAMActor: Received RMCallbackHandler
15/03/07 17:08:47 INFO AMRMClientAsyncActor: Received RMCallbackHandler
15/03/07 17:08:47 INFO AMRMClientAsyncActor: starting AMRMClientAsync
15/03/07 17:08:47 INFO client.RMProxy: Connecting to ResourceManager at ip-10-10-10-217.eu-west-1.compute.internal/10.10.10.217:8030
15/03/07 17:08:47 INFO AMRMClientAsyncActor: Received RegisterAMMessage 0
15/03/07 17:08:48 INFO YarnAMActor: Received RegisterApplicationMasterResponse
15/03/07 17:08:48 INFO YarnAMActor: Previous container count : 0
15/03/07 17:08:48 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:48 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:48 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:48 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:48 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:48 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:48 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:48 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:50 INFO impl.AMRMClientImpl: Received new token for : ip-10-10-10-98.eu-west-1.compute.internal:8041
15/03/07 17:08:50 INFO impl.AMRMClientImpl: Received new token for : ip-10-10-10-99.eu-west-1.compute.internal:8041
15/03/07 17:08:50 INFO impl.AMRMClientImpl: Received new token for : ip-10-10-10-95.eu-west-1.compute.internal:8041
15/03/07 17:08:50 INFO RMCallbackHandler: Got response from RM for container request, allocatedCnt=3
15/03/07 17:08:50 INFO YarnAMActor: Received LaunchContainers
15/03/07 17:08:50 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1424433163267_0065_01_000003
15/03/07 17:08:50 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1424433163267_0065_01_000004
15/03/07 17:08:50 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1424433163267_0065_01_000002
15/03/07 17:08:50 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-99.eu-west-1.compute.internal:8041
15/03/07 17:08:50 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-95.eu-west-1.compute.internal:8041
15/03/07 17:08:50 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-98.eu-west-1.compute.internal:8041
15/03/07 17:08:50 INFO NMCallbackHandler: Container started : container_1424433163267_0065_01_000004
15/03/07 17:08:50 INFO NMCallbackHandler: Container started : container_1424433163267_0065_01_000002
15/03/07 17:08:50 INFO NMCallbackHandler: Container started : container_1424433163267_0065_01_000003
15/03/07 17:08:51 INFO RMCallbackHandler: Got response from RM for container request, completed containers=[ContainerStatus: [ContainerId: container_1424433163267_0065_01_000003, State: COMPLETE, Diagnostics: , ExitStatus: 0, ], ContainerStatus: [ContainerId: container_1424433163267_0065_01_000004, State: COMPLETE, Diagnostics: , ExitStatus: 0, ]].size()
15/03/07 17:08:51 INFO RMCallbackHandler: ContainerID=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000003, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getContainerId(), state=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000003, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getState(), exitStatus=0
15/03/07 17:08:51 INFO RMCallbackHandler: ContainerID=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000004, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getContainerId(), state=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000004, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getState(), exitStatus=0
15/03/07 17:08:51 INFO RMCallbackHandler: Got response from RM for container request, allocatedCnt=1
15/03/07 17:08:51 INFO YarnAMActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO YarnAMActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:51 INFO YarnAMActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:51 INFO YarnAMActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:51 INFO YarnAMActor: Received LaunchContainers
15/03/07 17:08:51 INFO AMRMClientAsyncActor: Received ContainerRequestMessage
15/03/07 17:08:51 INFO AMRMClientAsyncActor: creating ContainerRequest
15/03/07 17:08:51 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1424433163267_0065_01_000005
15/03/07 17:08:51 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-95.eu-west-1.compute.internal:8041
15/03/07 17:08:51 INFO NMCallbackHandler: Container started : container_1424433163267_0065_01_000005
15/03/07 17:08:52 INFO RMCallbackHandler: Got response from RM for container request, completed containers=[ContainerStatus: [ContainerId: container_1424433163267_0065_01_000002, State: COMPLETE, Diagnostics: , ExitStatus: 0, ], ContainerStatus: [ContainerId: container_1424433163267_0065_01_000005, State: COMPLETE, Diagnostics: , ExitStatus: 0, ]].size()
15/03/07 17:08:52 INFO RMCallbackHandler: ContainerID=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000002, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getContainerId(), state=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000002, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getState(), exitStatus=0
15/03/07 17:08:52 INFO RMCallbackHandler: ContainerID=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000005, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getContainerId(), state=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000005, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getState(), exitStatus=0
15/03/07 17:08:52 INFO YarnAMActor: Application completed. Stopping running containers
15/03/07 17:08:52 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-98.eu-west-1.compute.internal:8041
15/03/07 17:08:52 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-95.eu-west-1.compute.internal:8041
15/03/07 17:08:52 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-99.eu-west-1.compute.internal:8041
15/03/07 17:08:52 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-10-10-95.eu-west-1.compute.internal:8041
15/03/07 17:08:52 INFO AMRMClientAsyncActor: Received AMStatusMessage
15/03/07 17:08:52 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
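When reading a log like this, it can help to tally which containers started and which already reported COMPLETE. The following is a rough Python sketch, not part of the Gearpump code; the regular expressions are assumptions derived only from the message formats visible in the output above, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns inferred from the log lines above (assumed, not an official format).
STARTED = re.compile(r"Container started : (container_\w+)")
COMPLETED = re.compile(
    r"ContainerId: (container_\w+), State: COMPLETE, "
    r"Diagnostics: [^,]*,\s*ExitStatus:\s*(-?\d+)"
)


def summarize(log_text):
    """Return (started container ids, {completed container id: exit status})."""
    started = set(STARTED.findall(log_text))
    exit_status = {cid: int(code) for cid, code in COMPLETED.findall(log_text)}
    return started, exit_status


# Three lines copied from the log output above.
SAMPLE = (
    "15/03/07 17:08:50 INFO NMCallbackHandler: Container started : container_1424433163267_0065_01_000004\n"
    "15/03/07 17:08:50 INFO NMCallbackHandler: Container started : container_1424433163267_0065_01_000002\n"
    "15/03/07 17:08:51 INFO RMCallbackHandler: ContainerID=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000004, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getContainerId(), state=ContainerStatus: [ContainerId: container_1424433163267_0065_01_000004, State: COMPLETE, Diagnostics: , ExitStatus: 0, ].getState(), exitStatus=0\n"
)

started, exit_status = summarize(SAMPLE)
still_running = started - set(exit_status)
```

Containers that show up as completed with exit status 0 within a second of starting, as in the log above, are a hint that the launched command returns immediately instead of staying up as a long-running master or worker.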