Closed by sailorlqh 1 year ago
@sailorlqh Did you create the alert policy and notification channel locally via terraform apply, and trigger the alert to verify that it works?
Yes, I have tested it, and it works.
Have we decided which alerts we want? Currently I only see error count for api/publicKey/certRotation and latency for ui. Should we have error (count and rate?) and latency alerts for all services, @yolocs @capri-xiyue @sailorlqh? Also, can we make some of the fields inputs, such as thresholds and aligner? Thresholds especially, since we might need to adjust them once we have traffic.
As I understand it, these are the initial alerts. We will add a prober service later, and at that point we can define a more reasonable and thorough alert policy.
I think it makes sense to have customized thresholds. For example, we don't want to use the same threshold for the dev and prod environments, but we can still have some default thresholds.
Added customized thresholds.
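For readers following along, here is a rough sketch of what default thresholds with per-environment overrides could look like. It is not the code from this PR: every variable name, the default values, and the filter input are hypothetical placeholders. Only the google_monitoring_alert_policy resource and its condition_threshold fields come from the Google provider.

```hcl
# Hypothetical inputs: names and default values are placeholders, not taken from this PR.
variable "alert_thresholds" {
  description = "Per-alert threshold overrides; keys left unset fall back to the defaults below."
  type        = map(number)
  default     = {}
}

variable "alert_aligner" {
  description = "Per-series aligner applied to the latency metric."
  type        = string
  default     = "ALIGN_PERCENTILE_99"
}

variable "ui_latency_filter" {
  description = "Monitoring filter that selects the UI latency metric (assumed to be supplied by the caller)."
  type        = string
}

variable "notification_channel_ids" {
  description = "IDs of the notification channels to alert."
  type        = list(string)
  default     = []
}

locals {
  # Assumed defaults; dev and prod can each override any key.
  default_alert_thresholds = {
    ui_latency_ms = 2000
  }
  alert_thresholds = merge(local.default_alert_thresholds, var.alert_thresholds)
}

resource "google_monitoring_alert_policy" "ui_latency" {
  display_name = "ui latency"
  combiner     = "OR"

  conditions {
    display_name = "UI request latency above threshold"

    condition_threshold {
      filter          = var.ui_latency_filter
      comparison      = "COMPARISON_GT"
      threshold_value = local.alert_thresholds["ui_latency_ms"]
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = var.alert_aligner
      }
    }
  }

  notification_channels = var.notification_channel_ids
}
```

With this shape, a prod tfvars file could set alert_thresholds = { ui_latency_ms = 1000 } while dev keeps the default, and the thresholds can be retuned once there is real traffic without touching the resource.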
With the prober we can set more reasonable thresholds, but we could still miss high latency on the backends and a high error count on the frontend. Should we add both error and latency alerts for all services?
Added more policies.
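One possible way to give every service both an error alert and a latency alert without copying resource blocks is for_each over a map of services. The sketch below is an assumption about shape only: the service_alerts variable, its fields, the durations, and the aligners are hypothetical and would need to match the actual metrics, and notification channels are omitted for brevity.

```hcl
# Hypothetical input: one entry per service, with its own filters and thresholds.
variable "service_alerts" {
  description = "Error and latency alert settings keyed by service name."
  type = map(object({
    error_filter      = string
    error_threshold   = number
    latency_filter    = string
    latency_threshold = number
  }))
}

resource "google_monitoring_alert_policy" "error_count" {
  for_each     = var.service_alerts
  display_name = "${each.key} error count"
  combiner     = "OR"

  conditions {
    display_name = "${each.key} error count above threshold"

    condition_threshold {
      filter          = each.value.error_filter
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.error_threshold
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA" # assumes a delta or cumulative count metric
      }
    }
  }
}

resource "google_monitoring_alert_policy" "latency" {
  for_each     = var.service_alerts
  display_name = "${each.key} latency"
  combiner     = "OR"

  conditions {
    display_name = "${each.key} latency above threshold"

    condition_threshold {
      filter          = each.value.latency_filter
      comparison      = "COMPARISON_GT"
      threshold_value = each.value.latency_threshold
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_PERCENTILE_99" # assumes a distribution-valued latency metric
      }
    }
  }
}
```

Adding a new service then only requires a new map entry, and each service keeps its own thresholds.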