apache / rocketmq

Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
https://rocketmq.apache.org/
Apache License 2.0

Push client metrics to a common statistics server, instead of having the broker call the client, to avoid extra RPC calls. #3662

Closed (humkum closed this issue 1 year ago)

humkum commented 2 years ago

FEATURE REQUEST

  1. Please describe the feature you are requesting.

    Hello. We found that the client's runtime metrics are collected on the client, but when rocketmq-exporter fetches them, it first requests the broker, and the broker then uses the callConsumer() method to request them from the client. We have found two problems with this:

    1. If the client's version is behind the server's, the broker may fail to request runtime metrics from the client, so rocketmq-exporter cannot collect the consumer's runtime metrics.
    2. Every time the consumer's runtime metrics are fetched, the broker has to request the client, which causes additional RPC calls.

    Based on these problems, we are considering collecting the metrics on the client side and having the client actively push them to a common statistics node. To make the client runtime metrics more accurate, we are also considering using the Metrics tool to compute them. In addition, we plan to design this module as a plug-in that users can switch on and off.
  2. Provide any additional detail on your proposed use case for this feature.

  3. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

  4. If there are some sub-tasks using -[] for each subtask and create a corresponding issue to map to the sub task:
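
The push model proposed above can be sketched roughly as follows. This is only an illustrative sketch, not RocketMQ code: the class `ClientMetricsPusher`, its methods, the `sink` callback standing in for the common statistics node, and the counter name `consumeOK` are all hypothetical. It shows the client counting locally, pushing a snapshot on its own initiative, and the whole module being switchable on and off.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names are RocketMQ APIs. */
final class ClientMetricsPusher {
    // Counters are kept on the client, keyed by metric name.
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
    // Assumption: the sink stands in for "send to the statistics node".
    private final Consumer<Map<String, Long>> sink;
    // Plug-in style switch: when false, nothing is recorded or pushed.
    private volatile boolean enabled;

    ClientMetricsPusher(Consumer<Map<String, Long>> sink, boolean enabled) {
        this.sink = sink;
        this.enabled = enabled;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /** Adds delta to the named counter; a no-op while disabled. */
    void record(String name, long delta) {
        if (!enabled) {
            return;
        }
        counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    /**
     * Takes a snapshot of all counters, resets them to zero, and hands
     * the snapshot to the sink. Returns false if the module is disabled.
     */
    boolean pushOnce() {
        if (!enabled) {
            return false;
        }
        Map<String, Long> snapshot = new HashMap<>();
        // getAndSet(0) reads and resets each counter atomically.
        counters.forEach((name, value) -> snapshot.put(name, value.getAndSet(0L)));
        sink.accept(snapshot);
        return true;
    }
}
```

In a real client, `pushOnce()` would presumably be driven by a timer, for example `ScheduledExecutorService.scheduleAtFixedRate`, and the sink would perform the network send. Whether that periodic traffic is cheaper than the broker's on-demand RPC depends on how often the metrics are actually read.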

MatrixHB commented 2 years ago

When you say 'a common statistics server', do you mean using the broker, or using an extra node, to store the statistics data?

Is it wasteful to introduce at least two nodes just for storage of runtime statistics data?

ni-ze commented 2 years ago

The sub-task section links to nothing; we look forward to more details. Storing statistics data on an extra node will cause a waste of resources. And not everyone needs to collect all client statistics.

odbozhou commented 2 years ago

I think we should start from the problem itself.

Does the RPC call to obtain client statistics really have any impact?

If the client's statistical data is inaccurate, can that be solved by optimizing the statistics code? Introducing a new metrics tool will not necessarily solve the problem any better.

guyinyou commented 2 years ago

Starting from the problem, this is probably because different users have different requirements. The RPC call approach ensures that the data obtained by each call is actually used. But if you switch to real-time push, a lot of resources may be wasted to no effect. Unless there is a real-time computing scenario that demands it, maybe there are other, better solutions?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 3 days since being marked as stale.