RichardKnop / machinery

Machinery is an asynchronous task queue/job queue based on distributed message passing.
Mozilla Public License 2.0

Finding whether the worker where a task is running is alive or not #593

Open karthikcru opened 4 years ago

karthikcru commented 4 years ago

Suppose a worker crashes because of a non-graceful shutdown. All the tasks it was running would still be in the STARTED state in the backend.

The asyncResult is still tracking a task that is in the STARTED state, even though the worker running it has crashed.

Is there a way to get all the workers with their IDs, so that on the server we can look for tasks that are in the STARTED state but not running on any of the available workers? The server and workers are not connected, so I assume this is not possible.

The workers and the server are asynchronous, and there is no way to find these tasks, which will stay in the STARTED state forever because the worker went through a non-graceful shutdown.
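
To make the idea concrete, here is a minimal sketch of the check described above. It assumes two things Machinery does not provide today: a WorkerID recorded with each STARTED task, and a set of currently alive worker IDs. StartedTask and orphaned are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// StartedTask is a hypothetical record. Machinery's task state does not
// store which worker picked up a task, so WorkerID is an assumption.
type StartedTask struct {
	UUID     string
	WorkerID string
}

// orphaned returns the UUIDs of STARTED tasks whose worker is not in the
// set of alive workers, sorted for stable output.
func orphaned(started []StartedTask, alive map[string]bool) []string {
	var out []string
	for _, t := range started {
		if !alive[t.WorkerID] {
			out = append(out, t.UUID)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	started := []StartedTask{
		{UUID: "task-a", WorkerID: "worker-1"},
		{UUID: "task-b", WorkerID: "worker-2"},
	}
	alive := map[string]bool{"worker-1": true} // worker-2 has crashed
	fmt.Println(orphaned(started, alive))      // prints [task-b]
}
```

The check itself is a simple set difference. The hard part is the bookkeeping it relies on, which is exactly what is missing.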

Steps to reproduce:

The server object is named server, and one worker is running.

  1. Create tasks, create a group, and send the group to the server
  2. Keep polling the async result and also look at the state in the backend
    group, _ := tasks.NewGroup(&signature, &signatureUpdate)
    // The second parameter is the number of tasks sent concurrently; 0 means unlimited.
    asyncResult, err := server.SendGroup(group, 0)
    if err != nil {
        log.Fatal(err)
    }

    for _, v := range asyncResult {
        // Poll until the task reaches a final state.
        for !v.GetState().IsCompleted() {
            fmt.Println("async status", v.GetState())
            time.Sleep(10 * time.Second)
            // runID is defined elsewhere in the original program.
            state, _ := server.GetBackend().GetState(runID.String())
            fmt.Println("state from backend", state)
        }
    }
  3. Send multiple TERM signals to the worker to force a non-graceful shutdown
  4. The loop never ends. The following output is repeated forever:

    state from backend &{d6873b08-b2dd-4c70-7dfd-20e5c285e58c RunJob STARTED [] 2020-08-21 10:21:54.130365 +0000 UTC 0}
    async status &{d6873b08-b2dd-4c70-7dfd-20e5c285e58c RunJob STARTED [] 2020-08-21 10:21:54.130365 +0000 UTC 0}
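
As a client-side stopgap, the polling loop in the steps above could give up after a deadline instead of spinning forever. The sketch below is self-contained and does not call Machinery; waitCompleted and errTimedOut are hypothetical names, and in the real loop the done callback would wrap v.GetState().IsCompleted(). This does not detect a dead worker, it only bounds how long the caller waits.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimedOut reports that the task did not complete before the deadline.
var errTimedOut = errors.New("task did not complete before the deadline")

// waitCompleted calls done every interval until it returns true or the
// timeout elapses, whichever comes first.
func waitCompleted(done func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for !done() {
		if time.Now().After(deadline) {
			return errTimedOut
		}
		time.Sleep(interval)
	}
	return nil
}

func main() {
	// Simulates a task stuck in STARTED: done never returns true.
	stuck := func() bool { return false }
	err := waitCompleted(stuck, 10*time.Millisecond, 50*time.Millisecond)
	fmt.Println(err) // prints the errTimedOut message
}
```

If the Machinery version in use provides a timeout-aware getter on the async result, that may be a simpler option, but it is worth checking the library source before relying on it.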

Can we add a last-alive timestamp to the task, a simple time tick, so we know whether a task in the STARTED state is actually running or is just an entry in the backend with no running task behind it? In other words, a liveness probe for running tasks.
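
A rough sketch of what such a tick could look like, kept in memory for illustration. Nothing here is part of Machinery: Heartbeats, Beat, and Stale are hypothetical names, and a real version would presumably write the timestamp to the result backend next to the task state. Timestamps are Unix seconds.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Heartbeats maps a task UUID to the Unix time of its last tick.
// Hypothetical type, not part of Machinery.
type Heartbeats struct {
	mu   sync.Mutex
	last map[string]int64
}

func NewHeartbeats() *Heartbeats {
	return &Heartbeats{last: make(map[string]int64)}
}

// Beat records that the task was alive at nowUnix. A worker would call
// this periodically while the task is running.
func (h *Heartbeats) Beat(uuid string, nowUnix int64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.last[uuid] = nowUnix
}

// Stale returns, sorted, the tasks whose last tick is older than ttlSeconds.
func (h *Heartbeats) Stale(nowUnix, ttlSeconds int64) []string {
	h.mu.Lock()
	defer h.mu.Unlock()
	var out []string
	for uuid, seen := range h.last {
		if nowUnix-seen > ttlSeconds {
			out = append(out, uuid)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	hb := NewHeartbeats()
	now := time.Now().Unix()
	hb.Beat("task-a", now-120) // last tick two minutes ago: worker likely dead
	hb.Beat("task-b", now-5)   // ticked recently
	fmt.Println(hb.Stale(now, 30)) // prints [task-a]
}
```

The server side would then treat any STARTED task that shows up in Stale as abandoned. The TTL has to be comfortably larger than the tick interval, otherwise slow but healthy workers get flagged.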

miaochiahao commented 3 years ago

Same problem. Any workable solution?

aadog commented 3 years ago

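This is not a big problem in itself. The key issue is that there is no way to mark the task for re-queueing, and the metadata is lost. It gets even worse when the server is shut down.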