Big data platform monitoring metrics #18

Open cjuexuan opened 8 years ago

cjuexuan commented 8 years ago

hadoop metrics2

Monitored contexts:

  1. yarn
  2. jvm
  3. rpc
  4. rpcdetailed
  5. metricssystem
  6. mapred
  7. dfs
  8. ugi

Already provided:

Source : org.apache.hadoop.metrics2.source.JvmMetrics and org.apache.hadoop.metrics2.source.JvmMetricsInfo

Other related sources:

FSOpDurations : org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSOpDurations

fairscheduler-op-durations context :

  @Metric("Duration for a continuous scheduling run")  MutableRate continuousSchedulingRun;
  @Metric("Duration to handle a node update")  MutableRate nodeUpdateCall;
  @Metric("Duration for a update thread run")  MutableRate updateThreadRun;
  @Metric("Duration for an update call")  MutableRate updateCall;
  @Metric("Duration for a preempt call") MutableRate preemptCall;

QueueMetrics : org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics

yarn context :

  @Metric("# of apps submitted") MutableCounterInt appsSubmitted;
  @Metric("# of running apps") MutableGaugeInt appsRunning;
  @Metric("# of pending apps") MutableGaugeInt appsPending;
  @Metric("# of apps completed") MutableCounterInt appsCompleted;
  @Metric("# of apps killed") MutableCounterInt appsKilled;
  @Metric("# of apps failed") MutableCounterInt appsFailed;
  @Metric("Allocated memory in MB") MutableGaugeInt allocatedMB;
  @Metric("Allocated CPU in virtual cores") MutableGaugeInt allocatedVCores;
  @Metric("# of allocated containers") MutableGaugeInt allocatedContainers;
  @Metric("Aggregate # of allocated containers") MutableCounterLong aggregateContainersAllocated;
  @Metric("Aggregate # of released containers") MutableCounterLong aggregateContainersReleased;
  @Metric("Available memory in MB") MutableGaugeInt availableMB;
  @Metric("Available CPU in virtual cores") MutableGaugeInt availableVCores;
  @Metric("Pending memory allocation in MB") MutableGaugeInt pendingMB;
  @Metric("Pending CPU allocation in virtual cores") MutableGaugeInt pendingVCores;
  @Metric("# of pending containers") MutableGaugeInt pendingContainers;
  @Metric("# of reserved memory in MB") MutableGaugeInt reservedMB;
  @Metric("Reserved CPU in virtual cores") MutableGaugeInt reservedVCores;
  @Metric("# of reserved containers") MutableGaugeInt reservedContainers;
  @Metric("# of active users") MutableGaugeInt activeUsers;
  @Metric("# of active applications") MutableGaugeInt activeApplications;

FSQueueMetrics : org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics

yarn context :

  @Metric("Fair share of memory in MB") MutableGaugeInt fairShareMB;
  @Metric("Fair share of CPU in vcores") MutableGaugeInt fairShareVCores;
  @Metric("Steady fair share of memory in MB") MutableGaugeInt steadyFairShareMB;
  @Metric("Steady fair share of CPU in vcores") MutableGaugeInt steadyFairShareVCores;
  @Metric("Minimum share of memory in MB") MutableGaugeInt minShareMB;
  @Metric("Minimum share of CPU in vcores") MutableGaugeInt minShareVCores;
  @Metric("Maximum share of memory in MB") MutableGaugeInt maxShareMB;
  @Metric("Maximum share of CPU in vcores") MutableGaugeInt maxShareVCores;

MetricsSystemImpl : org.apache.hadoop.metrics2.impl.MetricsSystemImpl

metricssystem context :

  @Metric({"Snapshot", "Snapshot stats"}) MutableStat snapshotStat;
  @Metric({"Publish", "Publishing stats"}) MutableStat publishStat;
  @Metric("Dropped updates by all sinks") MutableCounterLong droppedPubAll;

Sink : org.apache.hadoop.metrics2.sink.GraphiteSink, org.apache.hadoop.metrics2.sink.FileSink, and org.apache.hadoop.metrics2.sink.AbstractGangliaSink

MetricsSystem : org.apache.hadoop.metrics2.lib.DefaultMetricsSystem

To implement your own (a minimal Sink sketch follows the list):

  1. Source : org.apache.hadoop.metrics2.MetricsSource
  2. Sink : org.apache.hadoop.metrics2.MetricsSink
  3. MetricsSystem : org.apache.hadoop.metrics2.MetricsSystem
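
A Sink only has to implement init, putMetrics and flush. Below is a minimal sketch, assuming Hadoop 2.x (where MetricsSink receives an org.apache.commons.configuration.SubsetConfiguration); the class name StdoutMetricsSink is made up for illustration only:

  // Sketch of a custom Sink, assuming Hadoop 2.x (commons-configuration 1.x);
  // StdoutMetricsSink is an illustrative name, not an existing class.
  import org.apache.commons.configuration.SubsetConfiguration;
  import org.apache.hadoop.metrics2.AbstractMetric;
  import org.apache.hadoop.metrics2.MetricsRecord;
  import org.apache.hadoop.metrics2.MetricsSink;

  public class StdoutMetricsSink implements MetricsSink {

    @Override
    public void init(SubsetConfiguration conf) {
      // Sink-specific options from hadoop-metrics2.properties arrive here.
    }

    @Override
    public void putMetrics(MetricsRecord record) {
      // Called for every snapshot of every registered source.
      StringBuilder line = new StringBuilder()
          .append(record.context()).append('.')
          .append(record.name()).append(": ");
      for (AbstractMetric metric : record.metrics()) {
        line.append(metric.name()).append('=').append(metric.value()).append(' ');
      }
      System.out.println(line);
    }

    @Override
    public void flush() {
      // Nothing is buffered in this sketch.
    }
  }

Registering such a sink is then just a matter of pointing a *.sink.<name>.class entry in hadoop-metrics2.properties at the class, as in the sample below.
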

Usage:

It only needs to be configured in $HADOOP_HOME/etc/hadoop/hadoop-metrics2.properties; a sample sketch follows.
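
For example, a hadoop-metrics2.properties sketch that writes NameNode metrics to a file with the provided FileSink and ships ResourceManager metrics to Graphite with the provided GraphiteSink (hostnames, ports and prefixes are placeholders):

  # Sketch only: adjust the daemon prefixes and placeholder values to your cluster
  *.period=10
  namenode.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
  namenode.sink.file.filename=namenode-metrics.out
  resourcemanager.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
  resourcemanager.sink.graphite.server_host=graphite.example.com
  resourcemanager.sink.graphite.server_port=2003
  resourcemanager.sink.graphite.metrics_prefix=prod.yarn.rm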

spark metrics

The metrics instances that can be collected are mainly:

  1. master
  2. applications
  3. worker
  4. executor
  5. driver

Implemented Sinks:

  ConsoleSink : org.apache.spark.metrics.sink.ConsoleSink
  CsvSink : org.apache.spark.metrics.sink.CsvSink
  JmxSink : org.apache.spark.metrics.sink.JmxSink
  MetricsServlet : org.apache.spark.metrics.sink.MetricsServlet
  GraphiteSink : org.apache.spark.metrics.sink.GraphiteSink
  Slf4jSink : org.apache.spark.metrics.sink.Slf4jSink

Implemented Sources, with the metrics each one exposes:

  JvmSource : org.apache.spark.metrics.source.JvmSource

  ApplicationSource : org.apache.spark.deploy.master.ApplicationSource

    status
    runtime_ms
    cores

  BlockManagerSource : org.apache.spark.storage.BlockManagerSource

    maxMem_MB
    remainingMem_MB
    memUsed_MB
    diskSpaceUsed_MB

  DAGSchedulerSource : org.apache.spark.scheduler.DAGSchedulerSource

    failedStages
    runningStages
    waitingStages
    allJobs
    activeJobs

  ExecutorAllocationManagerSource : org.apache.spark.ExecutorAllocationManagerSource

    numberExecutorsToAdd
    numberExecutorsPendingToRemove
    numberAllExecutors
    numberTargetExecutors
    numberMaxNeededExecutors

  ExecutorSource : org.apache.spark.executor.ExecutorSource

    activeTasks
    completeTasks
    currentPool_size
    maxPool_size
    read_bytes
    write_bytes
    read_ops
    largeRead_ops
    write_ops

  MasterSource : org.apache.spark.deploy.master.MasterSource

    workers
    aliveWorkers
    apps
    waitingApps

  MesosClusterSchedulerSource : org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource

    waitingDrivers
    launchedDrivers
    retryDrivers

  WorkerSource : org.apache.spark.deploy.worker.WorkerSource

    executors
    coresUsed
    memUsed_MB
    coresFree
    memFree_MB
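
Spark wires these up the same way, through a metrics properties file (by default $SPARK_HOME/conf/metrics.properties, or whatever spark.metrics.conf points to). A sketch that enables JvmSource on every instance and ships everything to the GraphiteSink; host, port, period and prefix values are placeholders:

  # Sketch only: placeholder values
  *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
  *.sink.graphite.host=graphite.example.com
  *.sink.graphite.port=2003
  *.sink.graphite.period=10
  *.sink.graphite.unit=seconds
  *.sink.graphite.prefix=prod.spark

  # Enable JvmSource for the master, worker, driver and executor instances
  master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
  worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
  driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
  executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource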