abcxyz / jvs

Apache License 2.0
8 stars, 0 forks

feature: update monitoring dashboard and add basic alerting policy #230

Closed: sailorlqh closed this 1 year ago

capri-xiyue commented 1 year ago

@sailorlqh Did you create the alert policy and notification channel locally via terraform apply, and try triggering the alert to verify that it works?

sailorlqh commented 1 year ago

Yes, I have tested it, and it works.

sqin2019 commented 1 year ago

Have we decided which alerts we want? Currently I only see error-count alerts for api/publicKey/certRotation and a latency alert for ui. Should we have error (count and rate?) and latency alerts for all services, @yolocs @capri-xiyue @sailorlqh? Also, can we make some of the fields, such as thresholds and the aligner, into inputs? Thresholds especially, since we might need to adjust them once we have traffic.

sailorlqh commented 1 year ago

To my current understanding, these are the initial alerts. We will add a prober service later, and at that point we will have a more reasonable and thorough alert policy.

capri-xiyue commented 1 year ago

I think it makes sense to have customized thresholds; for example, we don't want to use the same threshold for the dev and prod environments. We can still have some default thresholds, though.

sailorlqh commented 1 year ago

Added customized thresholds.
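A minimal sketch of what "customized thresholds with defaults" could look like in Terraform: the threshold and aligner become input variables with defaults suitable for dev, and a prod deployment overrides them. The variable names, the assumption that the service runs on Cloud Run (hence the run.googleapis.com/request_count filter), and the notification channel input are all hypothetical, not the code actually added in this thread.

```hcl
# Hypothetical inputs; defaults are meant for dev and can be overridden per environment.
variable "project_id" {
  type = string
}

variable "notification_channel_ids" {
  description = "Notification channel IDs to alert (assumed to be created elsewhere)."
  type        = list(string)
  default     = []
}

variable "error_count_threshold" {
  description = "Number of 5xx responses per alignment period that triggers the alert."
  type        = number
  default     = 10
}

variable "error_count_aligner" {
  description = "Per-series aligner applied before comparing against the threshold."
  type        = string
  default     = "ALIGN_DELTA"
}

resource "google_monitoring_alert_policy" "api_error_count" {
  project      = var.project_id
  display_name = "API error count"
  combiner     = "OR"

  conditions {
    display_name = "5xx count above threshold"

    condition_threshold {
      # Assumes a Cloud Run deployment; adjust the filter for other platforms.
      filter          = "resource.type = \"cloud_run_revision\" AND metric.type = \"run.googleapis.com/request_count\" AND metric.labels.response_code_class = \"5xx\""
      comparison      = "COMPARISON_GT"
      threshold_value = var.error_count_threshold
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = var.error_count_aligner
      }
    }
  }

  notification_channels = var.notification_channel_ids
}
```

A prod environment would then set a larger error_count_threshold in its tfvars file or module block, while dev keeps the default.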

sqin2019 commented 1 year ago

With the prober we can set more reasonable thresholds, but we could still miss high latency on the backends and high error counts on the frontend. Should we add both error and latency alerts for all services?

sailorlqh commented 1 year ago

Added more policies.
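One way to make sure every service gets both an error alert and a latency alert is to drive the policies from a single map with for_each. This is a sketch under stated assumptions, not the policies actually added here: it assumes the services run on Cloud Run (using the run.googleapis.com/request_count and run.googleapis.com/request_latencies metrics), the services map and its field names are hypothetical, and notification channels are omitted for brevity.

```hcl
# Hypothetical input: Cloud Run service name => per-service alert thresholds.
variable "project_id" {
  type = string
}

variable "services" {
  description = "Service name => thresholds (service names are placeholders)."
  type = map(object({
    error_count_threshold = number
    latency_p99_ms        = number
  }))
  default = {}
}

# One 5xx error-count policy per service.
resource "google_monitoring_alert_policy" "error_count" {
  for_each     = var.services
  project      = var.project_id
  display_name = "${each.key} error count"
  combiner     = "OR"

  conditions {
    display_name = "${each.key} 5xx count above threshold"

    condition_threshold {
      filter          = "resource.type = \"cloud_run_revision\" AND resource.labels.service_name = \"${each.key}\" AND metric.type = \"run.googleapis.com/request_count\" AND metric.labels.response_code_class = \"5xx\""
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.error_count_threshold
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA"
      }
    }
  }
}

# One p99 latency policy per service (request_latencies is a distribution in ms).
resource "google_monitoring_alert_policy" "latency" {
  for_each     = var.services
  project      = var.project_id
  display_name = "${each.key} p99 latency"
  combiner     = "OR"

  conditions {
    display_name = "${each.key} p99 latency above threshold"

    condition_threshold {
      filter          = "resource.type = \"cloud_run_revision\" AND resource.labels.service_name = \"${each.key}\" AND metric.type = \"run.googleapis.com/request_latencies\""
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.latency_p99_ms
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_PERCENTILE_99"
      }
    }
  }
}
```

Keying both resources on the same map means adding a service to the map creates both of its alerts together, which addresses the concern about missing latency alerts on backends or error alerts on the frontend.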