eandersson / amqpstorm

Thread-safe Python RabbitMQ Client & Management library
https://www.amqpstorm.io/
MIT License
186 stars 36 forks source link

potential infinite loop ? #38

Closed cp2587 closed 7 years ago

cp2587 commented 7 years ago

Hello,

I am using your library in conjunction with celery and i regularly encounter a problem: When asking celery to restart its worker (emitting a SIGTERM), the process is blocked and celery doesn't want to restart. Celery maintains a parent thread that runs child threads where tasks are executed. Amqpstorm's thread is also run from this parent thread. The processes in htop look like this:

screenshot from 2017-04-11 10-34-26

Here process 13880 & 13803 are related to amqpstorm. 13803 is the inbound_thread while 13880 is the heartbeat timer.

After some investigation, i found out that killing the thread responsible for the heartbeat timer allows celery to gracefuly restart... This lead me to think that the timer could possibly create a deadlock.

I have created a pull request based on this assumption and will try this branch on my setup: Can you tell me what you think of it ? Do you see any other possible explanations for the error i see ?

codecov-io commented 7 years ago

Codecov Report

Merging #38 into master will not change coverage. The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #38   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          28     28           
  Lines        1700   1702    +2     
  Branches      253    254    +1     
=====================================
+ Hits         1700   1702    +2
Impacted Files Coverage Δ
amqpstorm/heartbeat.py 100% <100%> (ø) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 05136a6...d2a2f8a. Read the comment docs.

eandersson commented 7 years ago

At first glance this looks good to me.

eandersson commented 7 years ago

Did the issue go away after you applied this patch?

cp2587 commented 7 years ago

No it did not solve my problem... The search continues... The weird thing is that celery correctly stop once i kill the heartbeat thread.. Do you have any suggestion ? Also can you put a new version and update so we can use pip to install the latest package ?

eandersson commented 7 years ago

If this is a deadlock in the Heartbeat code, the only thing that I could think of that could cause this would be send heartbeat. https://github.com/eandersson/amqpstorm/blob/b1e82325f29758e222782a9a9861535b4b388eaa/amqpstorm/heartbeat.py#L84

This has two locks interacting with each-other.

I'll see if I could push a new build out soonTM.

eandersson commented 7 years ago

Could you try to capture the signal and gracefully close the connection? Something like

def signal_handler(self, signal, frame):
    self.connection.close()
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)

Also, would it be possible for you to reproduce this in an example?

eandersson commented 7 years ago

I moved the send_heartbeat call outside of the internal heartbeat lock. This should hopefully prevent any potential deadlock. https://github.com/eandersson/amqpstorm/commit/30675f8123855281ed0f8f22cfe9b5a3f3a0c0f4

Can you test it and let me know?

cp2587 commented 7 years ago

Thank you. I am not able to reproduce the problem, right now, i am not even sure the problem comes from amqpstorm anymore. I will try to have a python stacktrace during the next deadlock. I wil keep you updated !

eandersson commented 7 years ago

I actually found an error in the heartbeat logic. https://github.com/eandersson/amqpstorm/commit/e0cb97ac14fa0dae63fe23ff34f34b01f30bb5b7

I don't think this would explain the issue you experienced, but at the very least it wasn't setting that flag to it's intended value.

eandersson commented 7 years ago

Pushed v2.2.2 to pypi.