cenkalti / kuyruk

⚙️ Simple task queue for Python
https://kuyruk.readthedocs.org/
MIT License

Kuyruk stopped working after X processes #73

Closed nathan30 closed 3 years ago

nathan30 commented 3 years ago

Hi,

If I send a lot of tasks to my Kuyruk worker, at some point it blocks. The kuyruk command /home/nathan/miniconda3/envs/OC/bin/kuyruk --app src.main.OCforMaarch worker reports no error, but the worker stops taking new jobs.

I checked the RabbitMQ logs: no errors at all.

Do you have any idea where I can look to find the issue?

Thanks

cenkalti commented 3 years ago

Do you have access to the RabbitMQ management interface? You can see the queue and consumer status from there.

If you think that your worker is stuck, you can send it a SIGUSR1 signal to force the worker to print its current stack trace. https://kuyruk.readthedocs.io/en/latest/worker.html

nathan30 commented 3 years ago

Hi,

Sorry for the delayed response. I just tried this on a blocked Kuyruk service with the following command:

kill -10 WORKER_PID

I also tried kill -USR1 WORKER_PID

But I didn't get any output. Is that normal?

cenkalti commented 3 years ago

Maybe the output is not printed because of output buffering. Can you try running kill -USR1 WORKER_PID several times please?

nathan30 commented 3 years ago

Hi,

I tried running the command several times today. The process isn't stopping and returns nothing :/

cenkalti commented 3 years ago

The code for printing the stacktrace is here: https://github.com/cenkalti/kuyruk/blob/971e435ded43e6dca77718673b706b90464dc7f5/kuyruk/worker.py#L100

I am not sure under which circumstances a process may fail to respond to a signal. Make sure that no other handler overrides this one after the process starts. Alternatively, you can copy the same logic into your application and use a different signal, so that it cannot conflict with any other handler.
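To illustrate the second suggestion, here is a minimal sketch of a stack-dumping handler registered on a different signal. It is not Kuyruk's own code: the function name install_stack_dump_handler, the choice of SIGUSR2, and the stream parameter are assumptions made for this example, and it only works on Unix-like systems.

```python
import signal
import sys
import traceback


def install_stack_dump_handler(signum=signal.SIGUSR2, stream=None):
    """Register a handler that prints the current stack when `signum` arrives.

    Hypothetical helper for illustration; returns the previously installed
    handler so the caller can restore it later.
    """
    def handler(received_signum, frame):
        out = stream if stream is not None else sys.stderr
        out.write("Stack trace requested by signal %d:\n" % received_signum)
        # `frame` is the frame that was executing when the signal arrived.
        traceback.print_stack(frame, file=out)
        # Flush explicitly so output buffering cannot hide the trace.
        out.flush()

    # signal.signal() must be called from the main thread.
    return signal.signal(signum, handler)
```

Call install_stack_dump_handler() once at application start-up, then run kill -USR2 WORKER_PID against the stuck process. The standard library's faulthandler.register offers a similar facility on Unix.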

nathan30 commented 3 years ago

Hi,

I finally got some output from kill -USR1, but I don't really know what it means: https://pastebin.com/NjKq4yig

Thanks

cenkalti commented 3 years ago

Hi @nathan30 .

If you look at the stack trace, the Kuyruk worker is executing the task:

Jul  8 11:14:31 hdvapp161 service.sh[1764]:   File "/usr/local/lib/python3.7/dist-packages/kuyruk/task.py", line 179, in apply
Jul  8 11:14:31 hdvapp161 service.sh[1764]:     return self.f(*args, **kwargs)

Inside the task, there is your application code:

Jul  8 11:14:31 hdvapp161 service.sh[1764]:   File "/opt/maarch/OpenCapture/src/classes/WebServices.py", line 356, in insert_attachment_from_mail
Jul  8 11:14:31 hdvapp161 service.sh[1764]:     res = requests.post(self.baseUrl + 'attachments', auth=self.auth, data=json.dumps(data), headers={'Connection': 'close', 'Content-Type': 'application/json'})

You are making an HTTP POST request to a server. Further down the stack trace, there is a call that reads the server's response:

Jul  8 11:14:31 hdvapp161 service.sh[1764]:   File "/usr/lib/python3.7/http/client.py", line 310, in begin
Jul  8 11:14:31 hdvapp161 service.sh[1764]:     version, status, reason = self._read_status()

In conclusion, your code is waiting for a response from the server that never arrives, and that is where your worker seems to be stuck. Because the problem is in your application code and not in Kuyruk, I am closing this issue.

As a good practice, always specify timeouts when making requests: https://docs.python-requests.org/en/master/user/advanced/#timeouts
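To show why a timeout matters, the sketch below reproduces the hang with the standard library only, since the stack trace ends in http/client.py. The helper names post_with_timeout and start_silent_server are hypothetical, and the silent server is a stand-in for a remote server that never answers. With requests, the equivalent is to pass a timeout= argument to requests.post, as described in the linked documentation.

```python
import http.client
import socket


def post_with_timeout(host, port, path, body, timeout_s):
    """POST `body` and return the HTTP status, or None if the read times out."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("POST", path, body=body,
                     headers={"Content-Type": "application/json"})
        # Without a timeout, getresponse() can block indefinitely while
        # reading the status line if the server never answers.
        return conn.getresponse().status
    except socket.timeout:
        return None
    finally:
        conn.close()


def start_silent_server():
    """Hypothetical stand-in: a socket that listens but never replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv
```

Calling post_with_timeout against the silent server with a short timeout returns None instead of blocking the worker, so the task can fail or retry rather than hang.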

You can also specify a max_run_time argument when defining your task. https://kuyruk.readthedocs.io/en/latest/api.html#kuyruk.Task

Please let me know if you need something else from me.