LinkedInAttic / white-elephant

Hadoop log aggregator and dashboard

Scheduling, Capacity Planning and Billing Features #11

Closed emmetts closed 11 years ago

emmetts commented 11 years ago

Hi Matt,

I read the article about White Elephant at http://engineering.linkedin.com/hadoop/white-elephant-hadoop-tool-you-never-knew-you-needed and assumed White Elephant could manage Hadoop scheduling, capacity planning, and billing. After installing it successfully, I could not find these features; I could only analyze cluster usage from the logs. How can I enable the scheduling, capacity planning, and billing features in White Elephant?

Thanks

matthayes commented 11 years ago

It doesn't manage scheduling, but it makes information about cluster utilization easily accessible, which in turn helps with scheduling. There are many workflow schedulers out there, e.g. Azkaban and Oozie. Typically, workflows are scheduled to run at regular intervals, e.g. once a day or once a week. By visualizing usage per day, you can find points of low utilization where a new workflow could be scheduled. Seeing aggregate usage of the cluster helps with capacity planning. You can also access the total task hours per user, which is useful if your organization needs to break down the cost of a Hadoop cluster per team.
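
As a rough illustration of the scheduling and cost-breakdown points, here is a minimal Python sketch. It assumes you have exported usage as (user, day, task hours) records; the record layout, the sample values, and the function names `hours_per_day`, `quietest_days`, and `cost_share` are all hypothetical and are not part of White Elephant.

```python
from collections import defaultdict

# Hypothetical usage records: (user, day, task_hours).
# These values are made up for illustration; they are not White Elephant output.
records = [
    ("alice", "2013-03-01", 120.0),
    ("bob",   "2013-03-01", 80.0),
    ("alice", "2013-03-02", 30.0),
    ("bob",   "2013-03-02", 20.0),
    ("alice", "2013-03-03", 100.0),
    ("carol", "2013-03-03", 150.0),
]

def hours_per_day(records):
    """Sum task hours across all users for each day."""
    totals = defaultdict(float)
    for _user, day, hours in records:
        totals[day] += hours
    return dict(totals)

def quietest_days(records, n=1):
    """Return the n days with the lowest total usage (ties broken by date)."""
    totals = hours_per_day(records)
    return sorted(totals, key=lambda day: (totals[day], day))[:n]

def cost_share(records, total_cost):
    """Split total_cost across users in proportion to their task hours."""
    per_user = defaultdict(float)
    for user, _day, hours in records:
        per_user[user] += hours
    grand_total = sum(per_user.values())
    if grand_total == 0:
        return {user: 0.0 for user in per_user}
    return {user: total_cost * hours / grand_total
            for user, hours in per_user.items()}

if __name__ == "__main__":
    print(quietest_days(records))     # candidate slot for a new workflow
    print(cost_share(records, 1000))  # per-user share of a 1000-unit bill
```

Splitting cost in proportion to task hours is only one possible policy; a real chargeback scheme might weight map and reduce slots differently or add a fixed fee per team.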