Closed allenporter closed 3 years ago
From the executor, it's failing to talk to the data-mover job, which listens on port 51515.
Errors:[]*models.Error{(*models.Error){Cause:map[string]interface{}{"cause":map[string]interface{}{"message":"unable to get repository parameters: error running http request: Get \"https://10.101.221.60:51515/api/v1/repo/parameters\": round-trip error: can't find certificate matching SHA256 fingerprint \"XXXX\" (server had [YYYY])"}, "function":"kasten.io/k10/kio/kopiaclient.OpenRepository", "linenumber":json.Number("108"), "message":"Failed to open Kopia repository"}, Fields:[]*models.Field{}, Message:"Job failed to be executed", Retriable:false}}, GroupIndex:1, ID:strfmt.UUID("ac1d19d1-ea98-11eb-a46a-b69c932c624b"), Manifest:models.ItemID("9a982304-ea98-11eb-b5b8-3eaa9d356af5"),
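One way to debug a fingerprint mismatch like the one above is to check what the server on port 51515 actually presents. The following is a minimal sketch, assuming the expected fingerprint is the hex SHA-256 digest of the DER-encoded server certificate (I have not confirmed that this is exactly how K10 derives it). The function names are hypothetical.

```python
import hashlib
import ssl


def cert_sha256_fingerprint(pem_cert: str) -> str:
    """Return the hex SHA-256 digest of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def server_fingerprint(host: str, port: int) -> str:
    """Fetch the certificate a server presents (without validating it)
    and return its SHA-256 fingerprint."""
    pem = ssl.get_server_certificate((host, port))
    return cert_sha256_fingerprint(pem)
```

Running `server_fingerprint("10.101.221.60", 51515)` from a pod that can reach the data-mover would show whether the presented certificate changes between runs, which might explain the flakiness.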
Last run succeeded. This is fairly flaky -- plus the rules are noisy.
Rules are at https://docs.kasten.io/latest/operating/monitoring.html#generating-alerts. The problem is that they alert on every backup failure, rather than on whether the jobs are within policy.
The metric dashboardbff_compliance_count seems like it may be more useful, since it tracks the number of jobs out of compliance.
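To illustrate the difference, here is a rough sketch of alerting on sustained non-compliance instead of on each failed run. It assumes the input is a series of out-of-compliance counts sampled over time, oldest first. The function name and the `sustained_samples` threshold are made up for illustration and are not part of K10.

```python
from typing import Sequence


def should_alert(out_of_compliance_counts: Sequence[int],
                 sustained_samples: int = 3) -> bool:
    """Alert only if the most recent `sustained_samples` readings are all non-zero.

    A single failed run that is followed by a successful retry brings the
    count back to zero, so it does not trigger an alert.
    """
    if sustained_samples <= 0:
        raise ValueError("sustained_samples must be positive")
    if len(out_of_compliance_counts) < sustained_samples:
        return False  # not enough history yet
    recent = out_of_compliance_counts[-sustained_samples:]
    return all(count > 0 for count in recent)
```

With this shape, a flaky job that fails once and then succeeds stays quiet, while a policy that remains out of compliance across several samples still pages.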
Going with benji instead of k10. #274