Saw some failing jobs yesterday that caused a bunch of "SQS messages too old" alarms. Because the workers were dying, the messages were hidden for ~1 hour at a time, 20 times in a row.
I think these were for a story that got deleted before the image callback and search indexer + deindexer jobs came in. I thought we were handling those GID failures everywhere, but maybe not. Look into it, because those alarms are noisy!
From the image-callback queue:
And 2 in the search indexer queue: