codership / galera-manager-support

Galera Manager Support Repository
Recover Cluster job keeps starting #83

Open arenner-git opened 9 months ago

arenner-git commented 9 months ago

Hi,

due to some maintenance on our infrastructure, we had to shut down our cluster nodes. After booting the nodes again, we recovered the cluster without any problems and everything is working fine (we had some trouble with one node, but resolved it). Looking into Galera Manager, I saw that 2 start/recover cluster jobs now come up every 4 minutes, which is quite annoying. These jobs always fail ("failed to get the most advanced node"). I tried to figure out how to delete these tasks (or jobs), but could not find anything.

In the logs (and via gmc command) I can see the 2 failing tasks:

{"_context":{},"channel-type":"app","file":"/go/pkg/internal/queue/poll.go:114","func":"github.com/codership/galera-manager/pkg/internal/queue.(*Poll).getQueuedJobWithBackoff.func1","level":"info","msg":"Pulling task from queue","queuedjob_id":"43ef6c97-5d89-4edb-8552-0600aa6edc38","task_id":"33","time":"2024-01-08T08:52:35+01:00"}
{"_context":{},"channel-type":"app","file":"/go/pkg/internal/queue/poll.go:114","func":"github.com/codership/galera-manager/pkg/internal/queue.(*Poll).getQueuedJobWithBackoff.func1","level":"info","msg":"Pulling task from queue","queuedjob_id":"90bf2d5a-31c4-4a98-8cd3-0f7956d1e02e","task_id":"27","time":"2024-01-08T08:52:35+01:00"}
{"channel-type":"app","file":"/go/pkg/internal/executor/processor.go:136","func":"github.com/codership/galera-manager/pkg/internal/executor.(*Processor).Process","level":"info","msg":"Running task","task_id":"33","time":"2024-01-08T08:52:35+01:00"}
{"channel-type":"app","file":"/go/pkg/internal/executor/processor.go:136","func":"github.com/codership/galera-manager/pkg/internal/executor.(*Processor).Process","level":"info","msg":"Running task","task_id":"27","time":"2024-01-08T08:52:35+01:00"}
{"channel-type":"app","error":null,"file":"/go/pkg/internal/executor/processor.go:80","func":"github.com/codership/galera-manager/pkg/internal/executor.(*Processor).Start","level":"error","msg":"Task process attempt failed (will retry)","task_id":"33","time":"2024-01-08T08:53:06+01:00"}
{"channel-type":"app","error":null,"file":"/go/pkg/internal/executor/processor.go:80","func":"github.com/codership/galera-manager/pkg/internal/executor.(*Processor).Start","level":"error","msg":"Task process attempt failed (will retry)","task_id":"27","time":"2024-01-08T08:53:06+01:00"}
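
For anyone wanting to see which tasks keep retrying, here is a rough sketch that counts the "attempt failed" entries per `task_id` in log output like the above. It only reads the log text and does not talk to Galera Manager. The function names and the shortened `sample` string are made up for illustration, and the field names (`level`, `msg`, `task_id`) are assumed to match the entries shown in this issue.

```python
import json
from collections import Counter


def iter_json_objects(text):
    """Yield each JSON object in text, whether separated by spaces or newlines."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_failed_attempts(text):
    """Count error-level 'attempt failed' entries per task_id."""
    failures = Counter()
    for entry in iter_json_objects(text):
        if entry.get("level") == "error" and "attempt failed" in entry.get("msg", ""):
            failures[entry.get("task_id")] += 1
    return failures


# Hypothetical, shortened sample in the same shape as the log entries above.
sample = (
    '{"level":"info","msg":"Running task","task_id":"33"} '
    '{"level":"error","msg":"Task process attempt failed (will retry)","task_id":"33"}\n'
    '{"level":"error","msg":"Task process attempt failed (will retry)","task_id":"27"}'
)

for task_id, count in count_failed_attempts(sample).items():
    print(f"task {task_id}: {count} failed attempt(s)")
```

A task ID whose count keeps growing over time is one that is stuck in the retry loop.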

Also, via the gmc command on the command line, I can find the 2 tasks.

Anyone got a clue how to delete jobs and/or tasks?

Thanks in advance!

mcordes92 commented 8 months ago

I have the same

arenner-git commented 8 months ago

> I have the same

Hi @mcordes92, I solved it by stopping all nodes, one after another, via the Galera Manager UI. The next time the recover job ran, it succeeded without problems.

Nevertheless, it would be really good to have an option to cancel jobs.

mcordes92 commented 8 months ago

I still have the problem even after stopping all nodes and doing a new restore

arenner-git commented 8 months ago

> I still have the problem even after stopping all nodes and doing a new restore

Hi, did you stop all nodes and then click on "Recover cluster" (wrong) or did you stop all nodes and then just wait for the running recover job to start again (right)?