Open bblanchon opened 3 years ago
did you also check the master branch? how many retries did you found not causing an infinite loop?
did you also check the master branch?
I didn't check out the master branch to reproduce the issue, but looking at the code suggests that the default value is still None
.
Moreover, the changelog doesn't mention any change in that regard.
how many retries did you found not causing an infinite loop?
I'm sorry; I don't understand this question.
I think what Asif means is: What do you consider a reasonable default?
As I implied in my original message, I think 10 is a reasonable default, but it could be lower. For example, I use the value 3 in my project, which means a total of 4 connection attempts.
As I implied in my original message, I think 10 is a reasonable default, but it could be lower. For example, I use the value 3 in my project, which means a total of 4 connection attempts.
are you up for updating docs, & set a reasonable max_retry = 5 & do the related code changes?
Thank you very much for the offer, but I don't think I have the knowledge required to write this patch.
Hi @auvipy,
I had prepared some code this PR to set the default max_retries to 5. However it resulted in difficult to explain behaviour in the documentation. To me it seems the transport_option/max_retries applies to both the connect and for the transport layer. The default value I made only applied to the connection (set in _ensure_connection). The other thing which worried me is the implicit change which could cause in case an implementation relies on the old default of indefinite.
While investigation I found #1458 which makes sure the timeout is set to the already available connection_timeout (see change) or the transport_option connect_retries_timeout (added by #1599). The connection_timeout already defaulted to 5 sec. #1458 is released in 5.2.3 and prevents an infinite loop (though on different parameter) by default.
I see little use of still implementing this, mainly because it changes the default behaviour and complicates the explanation. Since #1458 there is already another default mechanism in place to prevent the loop.
@auvipy, what do you think? can you close the bug if you agree with me?
Thanks in advance,
Frank
Hi,
Thank you very much for the fantastic work you do with Celery.
I'm using Celery 4.4.7, and I just spent a lot of time understanding why my server froze when calling
my_task.delay()
. It turned out that I had a name resolution error, and Celery (or more exactly Kombu) was retying forever to connect to the broker. I first tried to fix the infinite loop with broker_connection_max_retries but later realized that broker_transport_options was the culprit because the default formax_retries
isNone
. BTW, the Celery documentation should be more clear about the difference betweenbroker_connection_max_retries
andbroker_transport_options['max_retries']
.Retrying forever is a bad idea because it prevents the exception from bubbling up and makes errors very hard to diagnose. Please set a reasonable default value for
max_retries
to avoid this situation. From my experience, each connection retry takes several seconds, so a small value (let's say 10) is already a very long wait.Best regards, Benoit