Open suixinpr opened 2 months ago
Thank you for your pull request. In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement (CLA). It's all electronic and will take just a few minutes. Please download CLA form here, sign, and e-mail back to cla@alluxio.org
OK, I have signed the CLA and e-mailed it back to cla@alluxio.org
What changes are proposed in this pull request?
Fix metric reporting error.
When the worker reports metrics to the master, the metric type it reports should be consistent with the instanceType.
Why are the changes needed?
When I was reading data, I found that some metrics in the master were inaccurate.
For example, the "Remote Alluxio Read" on the master homepage was less than the amount of data read by the client, and also less than the "Bytes Read Remotely" on the worker page.
By adding logs, I confirmed that the worker's "Bytes Read Remotely" value is correct. However, after the worker sent it to the master, some alluxio.grpc.Metric records were skipped.
alluxio.grpc.Metric records are produced by the worker's alluxio.metrics.MetricsSystem#reportMetrics and are received and processed by the master's alluxio.master.metrics.MetricsStore#putReportedMetrics.
During reportMetrics, alluxio.metrics.MetricsSystem#SHOULD_REPORT_METRICS actually contains both InstanceType.CLIENT and InstanceType.WORKER metrics. Different worker threads report them to the master by calling alluxio.metrics.MetricsSystem#reportWorkerMetrics and alluxio.metrics.MetricsSystem#reportClientMetrics. However, on the master, putReportedMetrics only processes InstanceType.WORKER records and ignores InstanceType.CLIENT records.
Because a Counter sends only the diff since its last report, if a worker's metric is sent by reportClientMetrics, the master drops that diff and its total falls short by that amount.
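To illustrate why a skipped diff is never recovered, here is a minimal, self-contained sketch. It is a simplified model and not Alluxio's actual code: DiffReportSketch, DiffCounter, ReportedMetric, Master, and the metric name are all hypothetical stand-ins, and the master-side filter is assumed to accept only records labeled WORKER, as described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of diff-based counter reporting; not Alluxio's real classes.
class DiffReportSketch {
  enum InstanceType { WORKER, CLIENT }

  // Simplified stand-in for alluxio.grpc.Metric: a label, a name, and a counter diff.
  static final class ReportedMetric {
    final InstanceType instance;
    final String name;
    final long delta;

    ReportedMetric(InstanceType instance, String name, long delta) {
      this.instance = instance;
      this.name = name;
      this.delta = delta;
    }
  }

  // A counter that reports only the change since its previous report.
  static final class DiffCounter {
    private long total;
    private long lastReported;

    void inc(long n) { total += n; }

    long takeDiff() {
      long diff = total - lastReported;
      lastReported = total; // the diff is consumed even if the receiver later drops it
      return diff;
    }
  }

  // Assumed master behavior: accumulate WORKER records, ignore everything else.
  static final class Master {
    private final Map<String, Long> totals = new HashMap<>();

    void putReportedMetrics(List<ReportedMetric> metrics) {
      for (ReportedMetric m : metrics) {
        if (m.instance != InstanceType.WORKER) {
          continue; // record skipped; its diff is lost
        }
        totals.merge(m.name, m.delta, Long::sum);
      }
    }

    long get(String name) { return totals.getOrDefault(name, 0L); }
  }

  static List<ReportedMetric> report(DiffCounter c, String name, InstanceType label) {
    List<ReportedMetric> out = new ArrayList<>();
    out.add(new ReportedMetric(label, name, c.takeDiff()));
    return out;
  }

  // Returns the master's total after two reports of one worker counter.
  static long simulate(boolean consistentLabel) {
    DiffCounter bytesRead = new DiffCounter();
    Master master = new Master();
    String name = "BytesReadRemote"; // hypothetical metric name

    bytesRead.inc(100);
    master.putReportedMetrics(report(bytesRead, name, InstanceType.WORKER));

    bytesRead.inc(50);
    InstanceType label = consistentLabel ? InstanceType.WORKER : InstanceType.CLIENT;
    master.putReportedMetrics(report(bytesRead, name, label));

    return master.get(name);
  }

  public static void main(String[] args) {
    System.out.println("mismatched label: " + simulate(false)); // prints 100
    System.out.println("consistent label: " + simulate(true));  // prints 150
  }
}
```

In this model the worker has counted 150 bytes in both runs. The master only reaches 150 when the second report carries the label it accepts, which is the kind of shortfall seen in "Remote Alluxio Read".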
Does this PR introduce any user facing changes?
None.