datavane / datasophon

The next generation of cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.
https://datasophon.github.io/datasophon-website/
Apache License 2.0
1.01k stars, 360 forks

[Bug] YARN ResourceManagerGC alarm is raised but the RM GC curve does not exceed the threshold #563

Open mengbaba3316 opened 1 month ago

mengbaba3316 commented 1 month ago

What happened

Version: ddp1.2.1. The YARN ResourceManagerGC alarm is raised, but the RM GC curve does not exceed the threshold. I suspect the status shown on the UI page is not updated in time, and I think other components may have the same problem.

What you expected to happen

Hopefully this issue can be resolved and the other components checked for the same problem.

How to reproduce

When the threshold has been exceeded and the GC time then drops after the service is restarted, this alarm is still occasionally displayed.

Anything else

No response

Version

dev

datasophon commented 1 month ago

The ResourceManagerGC indicator of the ResourceManager is incorrect. You can turn it off.

hawk9821 commented 1 month ago

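This is probably an alert timeliness problem: an alert record was created when the alert fired, but the status of that record was never updated afterwards. Restarting the YARN service makes the alert go away. The alert calculation logic itself should be fine.
I found that the alert messages sent from alertmanager all have status firing and none have status resolved, so the status of the alert record is never updated. @datasophon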
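
To illustrate why a missing resolved notification leaves a stale alarm, here is a minimal sketch of a webhook handler that tracks alert status. It is not DataSophon's actual code: `apply_webhook`, `alert_key`, and the in-memory `records` dict are hypothetical names, and the host name in the usage example is made up. Only the `status`, `alerts`, and `labels` fields of the Alertmanager webhook payload are assumed.

```python
from typing import Dict, Tuple

# Hypothetical key for an alert record: (alertname, instance).
AlertKey = Tuple[str, str]


def alert_key(alert: dict) -> AlertKey:
    """Build a record key from the alert's labels."""
    labels = alert.get("labels", {})
    return (labels.get("alertname", ""), labels.get("instance", ""))


def apply_webhook(records: Dict[AlertKey, str], payload: dict) -> Dict[AlertKey, str]:
    """Update alert records from one Alertmanager webhook payload.

    A record only leaves the "firing" state when a notification with
    status "resolved" arrives; if none is ever sent, it stays "firing".
    """
    for alert in payload.get("alerts", []):
        # Prefer the per-alert status, fall back to the group status.
        status = alert.get("status", payload.get("status", "firing"))
        key = alert_key(alert)
        if status == "resolved":
            if key in records:
                records[key] = "resolved"
        else:
            records[key] = "firing"
    return records


if __name__ == "__main__":
    labels = {"alertname": "ResourceManagerGC", "instance": "rm-host:8088"}
    records: Dict[AlertKey, str] = {}
    apply_webhook(records, {"status": "firing",
                            "alerts": [{"status": "firing", "labels": labels}]})
    print(records)  # record is "firing"
    apply_webhook(records, {"status": "resolved",
                            "alerts": [{"status": "resolved", "labels": labels}]})
    print(records)  # record is now "resolved"
```

If only the first kind of payload ever reaches the handler, the record keeps its firing status even after the GC time has dropped, which matches the behaviour described above.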