celery / kombu

Messaging library for Python.
http://kombu.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
2.88k stars 928 forks source link

Set a reasonable default value for max_retries to avoid infinite loops #1298

Open bblanchon opened 3 years ago

bblanchon commented 3 years ago

Hi,

Thank you very much for the fantastic work you do with Celery.

I'm using Celery 4.4.7, and I just spent a lot of time understanding why my server froze when calling my_task.delay(). It turned out that I had a name resolution error, and Celery (or more exactly Kombu) was retying forever to connect to the broker. I first tried to fix the infinite loop with broker_connection_max_retries but later realized that broker_transport_options was the culprit because the default for max_retries is None. BTW, the Celery documentation should be more clear about the difference between broker_connection_max_retries and broker_transport_options['max_retries'].

Retrying forever is a bad idea because it prevents the exception from bubbling up and makes errors very hard to diagnose. Please set a reasonable default value for max_retries to avoid this situation. From my experience, each connection retry takes several seconds, so a small value (let's say 10) is already a very long wait.

Best regards, Benoit

auvipy commented 3 years ago

did you also check the master branch? how many retries did you found not causing an infinite loop?

bblanchon commented 3 years ago

did you also check the master branch?

I didn't check out the master branch to reproduce the issue, but looking at the code suggests that the default value is still None. Moreover, the changelog doesn't mention any change in that regard.

how many retries did you found not causing an infinite loop?

I'm sorry; I don't understand this question.

thedrow commented 3 years ago

I think what Asif means is: What do you consider a reasonable default?

bblanchon commented 3 years ago

As I implied in my original message, I think 10 is a reasonable default, but it could be lower. For example, I use the value 3 in my project, which means a total of 4 connection attempts.

auvipy commented 3 years ago

As I implied in my original message, I think 10 is a reasonable default, but it could be lower. For example, I use the value 3 in my project, which means a total of 4 connection attempts.

are you up for updating docs, & set a reasonable max_retry = 5 & do the related code changes?

bblanchon commented 3 years ago

Thank you very much for the offer, but I don't think I have the knowledge required to write this patch.

FrankK-1234 commented 4 months ago

Hi @auvipy,

I had prepared some code this PR to set the default max_retries to 5. However it resulted in difficult to explain behaviour in the documentation. To me it seems the transport_option/max_retries applies to both the connect and for the transport layer. The default value I made only applied to the connection (set in _ensure_connection). The other thing which worried me is the implicit change which could cause in case an implementation relies on the old default of indefinite.

While investigation I found #1458 which makes sure the timeout is set to the already available connection_timeout (see change) or the transport_option connect_retries_timeout (added by #1599). The connection_timeout already defaulted to 5 sec. #1458 is released in 5.2.3 and prevents an infinite loop (though on different parameter) by default.

I see little use of still implementing this, mainly because it changes the default behaviour and complicates the explanation. Since #1458 there is already another default mechanism in place to prevent the loop.

@auvipy, what do you think? can you close the bug if you agree with me?

Thanks in advance,

Frank