allenporter / k8s-gitops

Flux/Gitops managed k8s cluster

Backups failing with `Failed to stream backup to remote kopia repository on Kopia API` #258

Closed by allenporter 3 years ago

allenporter commented 3 years ago
status:
  state: Failed
  startTime: 2021-07-22T01:57:20Z
  endTime: 2021-07-22T02:26:05Z
  restorePoint:
    name: ""
  error:
    cause: '{"fields":[{"name":"message","value":"{\"message\":\"Failed to move data
      from
      source\",\"function\":\"kasten.io/k10/kio/kanister/function.(*moverBackupToServerFunc).Exec\",\"linenumber\":134,\"fields\":[{\"name\":\"dataSource\",\"value\":\"http://10.106.88.156:8000/v0/backup\"}],\"cause\":{\"message\":\"Failed
      to stream backup to remote kopia repository on Kopia API
allenporter commented 3 years ago

From the executor logs, it's failing to talk to the data-mover job, which is on port 51515.

Errors:[]*models.Error{(*models.Error){Cause:map[string]interface{}{"cause":map[string]interface{}{"message":"unable to get repository parameters: error running http request: Get \"https://10.101.221.60:51515/api/v1/repo/parameters\": round-trip error: can't find certificate matching SHA256 fingerprint \"XXXX\" (server had [YYYY])"}, "function":"kasten.io/k10/kio/kopiaclient.OpenRepository", "linenumber":json.Number("108"), "message":"Failed to open Kopia repository"}, Fields:[]*models.Field{}, Message:"Job failed to be executed", Retriable:false}}, GroupIndex:1, ID:strfmt.UUID("ac1d19d1-ea98-11eb-a46a-b69c932c624b"), Manifest:models.ItemID("9a982304-ea98-11eb-b5b8-3eaa9d356af5"), 
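One way to check a mismatch like this by hand is to compute the SHA-256 fingerprint of the certificate the server actually presents and compare it with the value the client expects. The following is a minimal sketch using only the Python standard library. It assumes the fingerprint in the error is the hex SHA-256 digest of the DER-encoded certificate; the function names are hypothetical and not part of K10 or Kopia.

```python
import hashlib
import ssl


def sha256_fingerprint(der_cert: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def server_fingerprint(host: str, port: int) -> str:
    """Fetch the certificate a TLS server presents and fingerprint it.

    ssl.get_server_certificate does not validate the certificate when no
    CA bundle is given, so this also works for self-signed certificates.
    """
    pem = ssl.get_server_certificate((host, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    return sha256_fingerprint(der)


# Hypothetical usage from a pod that can reach the data-mover service:
#   print(server_fingerprint("10.101.221.60", 51515))
```

If the value printed differs from the fingerprint the client was configured with, the client is probably holding a fingerprint from an earlier or different server instance, which would be consistent with the flaky behaviour.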
allenporter commented 3 years ago

Last run succeeded. This is fairly flaky -- plus the rules are noisy.

allenporter commented 3 years ago

The alerting rules are at https://docs.kasten.io/latest/operating/monitoring.html#generating-alerts -- the problem is that they alert on every backup failure, rather than on whether the jobs are within policy.

The metric `dashboardbff_compliance_count` seems like it may be more useful, since it tracks the number of jobs out of compliance.

allenporter commented 3 years ago

Going with benji instead of k10. #274