Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Fix metric reporting error #18689

Open suixinpr opened 2 months ago

suixinpr commented 2 months ago

What changes are proposed in this pull request?

Fix metric reporting error.

When the worker reports metrics to the master, the type under which each metric is reported should be consistent with that metric's instanceType.

Why are the changes needed?

When I was reading data, I found that some metrics in the master were inaccurate.

For example, the "Remote Alluxio Read" on the master homepage was less than the amount of data read by the client, and also less than the "Bytes Read Remotely" on the worker page.

By adding logs, I found that the worker's "Bytes Read Remotely" value is correct. However, after the worker sent it to the master, some alluxio.grpc.Metric records were skipped.

The alluxio.grpc.Metric records are produced by the worker's alluxio.metrics.MetricsSystem#reportMetrics and are received and processed by the master's alluxio.master.metrics.MetricsStore#putReportedMetrics.

When reportMetrics runs, alluxio.metrics.MetricsSystem#SHOULD_REPORT_METRICS actually contains both InstanceType.CLIENT and InstanceType.WORKER metrics. They are reported to the master through alluxio.metrics.MetricsSystem#reportWorkerMetrics and alluxio.metrics.MetricsSystem#reportClientMetrics, which are called by different threads in the worker. However, on the master, putReportedMetrics processes only the InstanceType.WORKER records and ignores the InstanceType.CLIENT records.

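Because a Counter sends only the difference since its last report, if a worker metric is sent by reportClientMetrics, the master ignores that increment and it is never sent again, so the master's total ends up lower than the worker's actual value.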
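The effect can be illustrated with a small standalone simulation. This is only a sketch and does not use Alluxio's classes: DiffCounter, MasterStore, Report, and the metric name "BytesReadRemote" are hypothetical stand-ins, and the only behaviour assumed is what is described above (counters send diffs, and the master keeps only records labelled as worker metrics).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CounterDiffSketch {
  // Simplified stand-in for the instance types discussed in this PR.
  enum InstanceType { WORKER, CLIENT }

  /** Hypothetical stand-in for one reported metric record. */
  static final class Report {
    final InstanceType type;
    final String name;
    final long diff;

    Report(InstanceType type, String name, long diff) {
      this.type = type;
      this.name = name;
      this.diff = diff;
    }
  }

  /** Worker side: a counter that reports only the change since its last report. */
  static final class DiffCounter {
    private long value;
    private long lastReported;

    void inc(long n) {
      value += n;
    }

    Report report(String name, InstanceType label) {
      long diff = value - lastReported;
      lastReported = value; // the diff is now considered sent, whoever receives it
      return new Report(label, name, diff);
    }
  }

  /** Master side: accumulates diffs, but only from records labelled WORKER. */
  static final class MasterStore {
    private final Map<String, Long> totals = new HashMap<>();

    void put(List<Report> reports) {
      for (Report r : reports) {
        if (r.type != InstanceType.WORKER) {
          continue; // records with another label are skipped
        }
        totals.merge(r.name, r.diff, Long::sum);
      }
    }

    long get(String name) {
      return totals.getOrDefault(name, 0L);
    }
  }

  /** Runs three report cycles; optionally labels the second one as CLIENT. */
  static long simulate(boolean mislabelSecondReport) {
    String name = "BytesReadRemote"; // hypothetical metric name
    DiffCounter counter = new DiffCounter();
    MasterStore master = new MasterStore();

    counter.inc(100);
    master.put(Collections.singletonList(counter.report(name, InstanceType.WORKER)));

    counter.inc(50);
    InstanceType label = mislabelSecondReport ? InstanceType.CLIENT : InstanceType.WORKER;
    master.put(Collections.singletonList(counter.report(name, label)));

    counter.inc(25);
    master.put(Collections.singletonList(counter.report(name, InstanceType.WORKER)));

    return master.get(name);
  }

  public static void main(String[] args) {
    System.out.println(simulate(false)); // prints 175: every diff reaches the master
    System.out.println(simulate(true));  // prints 125: the 50-byte diff is lost for good
  }
}
```

In the mislabelled run the worker itself still holds 175, which matches the symptom above: the worker page is correct while the master total is too low.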

Does this PR introduce any user facing changes?

none

alluxio-bot commented 2 months ago

Thank you for your pull request. In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement (CLA). It's all electronic and will take just a few minutes. Please download CLA form here, sign, and e-mail back to cla@alluxio.org

suixinpr commented 2 months ago

OK, I have signed the CLA and e-mailed it back to cla@alluxio.org.