celery / kombu

Messaging library for Python.
http://kombu.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

Kombu transport SQS channel using expired sts token to connect to SQS #2031

Open chenxg283 opened 1 week ago

We see the following exception from time to time, though it is not easy to reproduce.

CRITICAL/MainProcess] Unrecoverable error: Exception('Request HTTP Error HTTP 403 Forbidden (b\'{"__type":"com.amazon.coral.service#ExpiredTokenException","message":"The security token included in the request is expired"}\')')
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/celery/worker/worker.py", line 202, in start
    self.blueprint.start(self)
  File "/opt/app-root/lib64/python3.9/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/opt/app-root/lib64/python3.9/site-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/opt/app-root/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py", line 340, in start
    blueprint.start(self)
  File "/opt/app-root/lib64/python3.9/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/opt/app-root/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py", line 746, in start
    c.loop(*c.loop_args())
  File "/opt/app-root/lib64/python3.9/site-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/opt/app-root/lib64/python3.9/site-packages/kombu/asynchronous/hub.py", line 373, in create_loop
    cb(*cbargs)
  File "/opt/app-root/lib64/python3.9/site-packages/kombu/asynchronous/http/curl.py", line 122, in on_readable
    return self._on_event(fd, _pycurl.CSELECT_IN)
  File "/opt/app-root/lib64/python3.9/site-packages/kombu/asynchronous/http/curl.py", line 139, in _on_event
    self._process_pending_requests()
  File "/opt/app-root/lib64/python3.9/site-packages/kombu/asynchronous/http/curl.py", line 145, in _process_pending_requests
    self._process(curl)
  File "/opt/app-root/lib64/python3.9/site-packages/kombu/asynchronous/http/curl.py", line 191, in _process
    request.on_ready(self.Response(
  File "/opt/app-root/lib64/python3.9/site-packages/vine/promises.py", line 168, in __call__
    svpending(*ca, **ck)
  File "/opt/app-root/lib64/python3.9/site-packages/vine/promises.py", line 161, in __call__
    return self.throw()
  File "/opt/app-root/lib64/python3.9/site-packages/vine/promises.py", line 158, in __call__
    retval = fun(*final_args, **final_kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/vine/funtools.py", line 98, in _transback
    return callback(ret)
  File "/opt/app-root/lib64/python3.9/site-packages/vine/promises.py", line 161, in __call__
    return self.throw()
  File "/opt/app-root/lib64/python3.9/site-packages/vine/promises.py", line 158, in __call__
    retval = fun(*final_args, **final_kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/vine/funtools.py", line 96, in _transback
    callback.throw()
  File "/opt/app-root/lib64/python3.9/site-packages/vine/funtools.py", line 94, in _transback
    ret = filter_(*args + (ret,), **kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/kombu/asynchronous/aws/connection.py", line 246, in _on_list_ready
    raise self._for_status(response, response.read())
Exception: Request HTTP Error HTTP 403 Forbidden (b'{"__type":"com.amazon.coral.service#ExpiredTokenException","message":"The security token included in the request is expired"}')

However, we observed that this usually happens when Celery tries to connect to SQS just before the STS token expires.
For example, one second before expiry, Channel._handle_sts_session() in kombu/transport/SQS.py considers the STS token still valid and therefore does not refresh it, but by the time the Celery worker actually uses that token to connect to SQS, the token may already have expired.

I think the logic in _handle_sts_session() needs to be enhanced so that it refreshes the STS token some time before it expires, rather than only after it has expired.
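
To illustrate the idea, here is a minimal sketch of an expiry check with a safety margin. It is not kombu's actual code: the helper name `sts_token_needs_refresh`, the `REFRESH_MARGIN` constant, and the five-minute value are all hypothetical, and it assumes that a naive expiration timestamp is in UTC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical safety margin; this is NOT an existing kombu option.
REFRESH_MARGIN = timedelta(minutes=5)


def sts_token_needs_refresh(expiration, now=None, margin=REFRESH_MARGIN):
    """Return True if the token is missing, expired, or expires within `margin`.

    `expiration` is the token's expiry time as a datetime (or None if no
    token has been fetched yet). Naive datetimes are assumed to be UTC.
    """
    if expiration is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    if expiration.tzinfo is None:
        # Assumption: naive timestamps are expressed in UTC.
        expiration = expiration.replace(tzinfo=timezone.utc)
    # Refresh as soon as the remaining lifetime drops to the margin or below,
    # instead of waiting until the token has already expired.
    return expiration - now <= margin


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    almost_expired = now + timedelta(seconds=1)
    # With no margin, a token one second from expiry is treated as valid,
    # which is the race described above.
    print(sts_token_needs_refresh(almost_expired, now=now, margin=timedelta(0)))
    # With a margin, the same token is refreshed ahead of time.
    print(sts_token_needs_refresh(almost_expired, now=now))
```

The margin only needs to be longer than the worst-case delay between the check and the request that actually carries the token, so a few minutes is likely more than enough in most setups.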