atc0005 / check-rsat

Go-based tooling to monitor Red Hat Satellite systems; NOT affiliated with or endorsed by Red Hat, Inc.
MIT License

Add support for detecting tasks in a "paused" state #214

Open atc0005 opened 5 months ago

atc0005 commented 5 months ago

Ran into a situation where connections to the host (self-referencing) were refused and the affected task went into a "paused" state. Looking at the specific task I see the option to "Resume" the task.

E.g.,

[image not preserved]

Error text:

Failed to open TCP connection to rsat.example.com:443 (Connection refused - connect(2) for "rsat.example.com" port 443)

Ideally support would be available to detect tasks stuck in this state and surface them.

Even with these stuck tasks, scheduled sync plans continue to "run", so the existing check_rsat_sync_plans "there is a problem" logic is not triggered.

Future note to self: https://github.com/atc0005/check-rsat/blob/b08f17fa3badf68e8c9d3430e2d117ce6252d1d5/internal/rsat/syncplans.go#L183-L210

atc0005 commented 4 months ago

https://rsat.example.com/foreman_tasks/api/tasks?search=state=paused

Example "pretty" payload:

{
  "total": 657825,
  "subtotal": 0,
  "page": 1,
  "per_page": 20,
  "sort": {
    "by": "started_at",
    "order": "DESC"
  },
  "results": []
}

The bit that we're specifically focused on:

"subtotal": 0,