Closed · cp2587 closed this 7 years ago
Merging #38 into master will not change coverage. The diff coverage is 100%.
@@           Coverage Diff            @@
##           master      #38   +/-   ##
========================================
  Coverage      100%     100%
========================================
  Files           28       28
  Lines         1700     1702    +2
  Branches       253      254    +1
========================================
+ Hits          1700     1702    +2
Impacted Files | Coverage Δ |
---|---|
amqpstorm/heartbeat.py | 100% <100%> (ø) :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 05136a6...d2a2f8a. Read the comment docs.
At first glance this looks good to me.
Did the issue go away after you applied this patch?
No, it did not solve my problem... The search continues... The weird thing is that celery correctly stops once I kill the heartbeat thread... Do you have any suggestions? Also, can you publish a new version so we can use pip to install the latest package?
If this is a deadlock in the Heartbeat code, the only thing I can think of that could cause it is send_heartbeat: https://github.com/eandersson/amqpstorm/blob/b1e82325f29758e222782a9a9861535b4b388eaa/amqpstorm/heartbeat.py#L84
It has two locks interacting with each other.
I'll see if I can push a new build out soon™.
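For context, the classic hazard when two locks interact is lock-order inversion: one thread holds lock A and waits for lock B while another thread holds B and waits for A. A minimal, generic sketch of that pattern (not amqpstorm's actual code, just an illustration of the risk):

```python
import threading

lock_a = threading.Lock()  # e.g. the heartbeat's internal lock
lock_b = threading.Lock()  # e.g. a lock taken on the send/write path

def worker_1():
    with lock_a:
        with lock_b:  # blocks forever if worker_2 already holds lock_b
            pass

def worker_2():
    with lock_b:
        with lock_a:  # blocks forever if worker_1 already holds lock_a
            pass

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start()
t2.start()
# Depending on timing, each thread can end up waiting on the lock the
# other one holds, and neither ever makes progress: a deadlock.
```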
Could you try to capture the signal and gracefully close the connection? Something like:
import signal
import sys

def signal_handler(signum, frame):
    connection.close()  # gracefully close the amqpstorm connection
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
Also, would it be possible for you to reproduce this in an example?
I moved the send_heartbeat call outside of the internal heartbeat lock. This should hopefully prevent any potential deadlock. https://github.com/eandersson/amqpstorm/commit/30675f8123855281ed0f8f22cfe9b5a3f3a0c0f4
Can you test it and let me know?
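For anyone following along, the general shape of that kind of change is to do the bookkeeping while holding the lock but perform the blocking call only after releasing it. A rough sketch with placeholder names (not the actual amqpstorm code):

```python
import threading

_lock = threading.Lock()

def check_and_send_old(send_heartbeat):
    # Old shape: the send happens while the internal lock is held, so any
    # lock the send path itself needs can deadlock against this one.
    with _lock:
        heartbeat_due = True  # placeholder for the real bookkeeping
        if heartbeat_due:
            send_heartbeat()

def check_and_send_new(send_heartbeat):
    # New shape: decide under the lock, then send after releasing it.
    with _lock:
        heartbeat_due = True  # placeholder for the real bookkeeping
    if heartbeat_due:
        send_heartbeat()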
Thank you. I am not able to reproduce the problem right now; I am not even sure the problem comes from amqpstorm anymore. I will try to capture a Python stack trace during the next deadlock. I will keep you updated!
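As a side note, one way to capture the Python stack of every thread in a hung process is the standard-library faulthandler module; a minimal sketch (the choice of SIGUSR1 is just an example):

```python
import faulthandler
import signal

# Register a handler so that `kill -USR1 <pid>` dumps the Python traceback
# of every thread to stderr while the process is hung.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# Or dump all thread stacks immediately from within the process:
# faulthandler.dump_traceback(all_threads=True)
```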
I actually found an error in the heartbeat logic. https://github.com/eandersson/amqpstorm/commit/e0cb97ac14fa0dae63fe23ff34f34b01f30bb5b7
I don't think this would explain the issue you experienced, but at the very least it wasn't setting that flag to its intended value.
Pushed v2.2.2 to PyPI.
Hello,
I am using your library in conjunction with celery and I regularly encounter a problem: when asking celery to restart its worker (by emitting a SIGTERM), the process is blocked and celery does not restart. Celery maintains a parent thread that runs child threads where tasks are executed, and amqpstorm's threads are also run from this parent thread. The processes in htop look like this:
Here processes 13880 and 13803 are related to amqpstorm: 13803 is the inbound_thread while 13880 is the heartbeat timer.
After some investigation, I found out that killing the thread responsible for the heartbeat timer allows celery to gracefully restart... This led me to think that the timer could possibly create a deadlock.
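For reference, a quick way to see which Python threads are alive in the worker and match them against what htop shows is threading.enumerate(); a minimal sketch (note that Thread.ident is Python's own id, not the native thread id htop displays, so the mapping is approximate):

```python
import threading

# Print every live Python thread so the heartbeat timer and inbound
# thread can be identified by name and daemon status.
for thread in threading.enumerate():
    print(thread.name, thread.ident, thread.daemon)
```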
I have created a pull request based on this assumption and will try this branch on my setup. Can you tell me what you think of it? Do you see any other possible explanations for the error I am seeing?