Rungutan / sentry-fargate-cf-stack

AWS CloudFormation template to launch a highly-available Sentry 20 stack through ECS Fargate at the minimum cost possible
Apache License 2.0

ServiceSnubaOutcomesConsumer is not restarted after shut down #54

Closed: nodomain closed this issue 3 years ago

nodomain commented 3 years ago

I noticed this morning that events had stopped being processed. Looking at the logs, I found:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   timestamp   |                                                                          message                                                                           |
|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1620013136078 | 2021-05-03 03:38:56,078 Completed processing <Batch: 1 message, open for 1.01 seconds>.                                                                    |
| 1620013138444 | 2021-05-03 03:38:58,444 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013138447 | 2021-05-03 03:38:58,446 Completed processing <Batch: 3 messages, open for 1.03 seconds>.                                                                   |
| 1620013139729 | 2021-05-03 03:38:59,729 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013139729 | 2021-05-03 03:38:59,729 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013139732 | 2021-05-03 03:38:59,732 Completed processing <Batch: 3 messages, open for 1.02 seconds>.                                                                   |
| 1620013141643 | 2021-05-03 03:39:01,643 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013141645 | 2021-05-03 03:39:01,645 Completed processing <Batch: 2 messages, open for 1.01 seconds>.                                                                   |
| 1620013147922 | 2021-05-03 03:39:07,922 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013147922 | 2021-05-03 03:39:07,922 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013147924 | 2021-05-03 03:39:07,924 Completed processing <Batch: 4 messages, open for 1.02 seconds>.                                                                   |
| 1620013149565 | 2021-05-03 03:39:09,565 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013149566 | 2021-05-03 03:39:09,565 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013149568 | 2021-05-03 03:39:09,567 Completed processing <Batch: 4 messages, open for 1.03 seconds>.                                                                   |
| 1620013156473 | 2021-05-03 03:39:16,473 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013156476 | 2021-05-03 03:39:16,475 Completed processing <Batch: 2 messages, open for 1.03 seconds>.                                                                   |
| 1620013157704 | 2021-05-03 03:39:17,703 Error submitting packet, dropping the packet and closing the socket                                                                |
| 1620013157706 | 2021-05-03 03:39:17,706 Completed processing <Batch: 2 messages, open for 1.02 seconds>.                                                                   |
| 1620013160706 | 2021-05-03 03:39:20,706 Caught ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')), shutting down... |
| 1620013160717 | Traceback (most recent call last):                                                                                                                         |
| 1620013160717 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 600, in urlopen                                                            |
| 1620013160718 |     httplib_response = self._make_request(conn, method, url,                                                                                               |
| 1620013160718 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request                                                      |
| 1620013160718 |     six.raise_from(e, None)                                                                                                                                |
| 1620013160718 |   File "<string>", line 2, in raise_from                                                                                                                   |
| 1620013160718 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 383, in _make_request                                                      |
| 1620013160718 |     httplib_response = conn.getresponse()                                                                                                                  |
| 1620013160718 |   File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse                                                |
| 1620013160718 |     rv = real_getresponse(self, *args, **kwargs)                                                                                                           |
| 1620013160718 |   File "/usr/local/lib/python3.8/http/client.py", line 1344, in getresponse                                                                                |
| 1620013160718 |     response.begin()                                                                                                                                       |
| 1620013160718 |   File "/usr/local/lib/python3.8/http/client.py", line 307, in begin                                                                                       |
| 1620013160719 |     version, status, reason = self._read_status()                                                                                                          |
| 1620013160719 |   File "/usr/local/lib/python3.8/http/client.py", line 276, in _read_status                                                                                |
| 1620013160719 |     raise RemoteDisconnected("Remote end closed connection without"                                                                                        |
| 1620013160719 | http.client.RemoteDisconnected: Remote end closed connection without response                                                                              |
| 1620013160719 | During handling of the above exception, another exception occurred:                                                                                        |
| 1620013160719 | Traceback (most recent call last):                                                                                                                         |
| 1620013160719 |   File "/usr/local/bin/snuba", line 33, in <module>                                                                                                        |
| 1620013160719 |     sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())                                                                                      |
| 1620013160719 |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__                                                                       |
| 1620013160719 |     return self.main(*args, **kwargs)                                                                                                                      |
| 1620013160719 |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main                                                                           |
| 1620013160719 |     rv = self.invoke(ctx)                                                                                                                                  |
| 1620013160719 |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke                                                                        |
| 1620013160720 |     return _process_result(sub_ctx.command.invoke(sub_ctx))                                                                                                |
| 1620013160720 |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke                                                                        |
| 1620013160720 |     return ctx.invoke(self.callback, **ctx.params)                                                                                                         |
| 1620013160720 |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke                                                                         |
| 1620013160720 |     return callback(*args, **kwargs)                                                                                                                       |
| 1620013160720 |   File "/usr/src/snuba/snuba/cli/consumer.py", line 161, in consumer                                                                                       |
| 1620013160720 |     consumer.run()                                                                                                                                         |
| 1620013160720 |   File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 112, in run                                                                      |
| 1620013160720 |     self._run_once()                                                                                                                                       |
| 1620013160720 |   File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 147, in _run_once                                                                |
| 1620013160720 |     self.__processing_strategy.poll()                                                                                                                      |
| 1620013160720 |   File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/transform.py", line 61, in poll                                                 |
| 1620013160721 |     self.__next_step.poll()                                                                                                                                |
| 1620013160721 |   File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 122, in poll                                                  |
| 1620013160721 |     self.__close_and_reset_batch()                                                                                                                         |
| 1620013160721 |   File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 104, in __close_and_reset_batch                               |
| 1620013160721 |     self.__batch.close()                                                                                                                                   |
| 1620013160721 |   File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 64, in close                                                  |
| 1620013160721 |     self.__step.close()                                                                                                                                    |
| 1620013160721 |   File "/usr/src/snuba/snuba/consumers/consumer.py", line 225, in close                                                                                    |
| 1620013160721 |     self.__insert_batch_writer.close()                                                                                                                     |
| 1620013160721 |   File "/usr/src/snuba/snuba/consumers/consumer.py", line 90, in close                                                                                     |
| 1620013160721 |     self.__writer.write(                                                                                                                                   |
| 1620013160721 |   File "/usr/src/snuba/snuba/clickhouse/http.py", line 245, in write                                                                                       |
| 1620013160721 |     batch.join()                                                                                                                                           |
| 1620013160721 |   File "/usr/src/snuba/snuba/clickhouse/http.py", line 181, in join                                                                                        |
| 1620013160721 |     response = self.__result.result(timeout)                                                                                                               |
| 1620013160721 |   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result                                                                         |
| 1620013160721 |     return self.__get_result()                                                                                                                             |
| 1620013160721 |   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result                                                                   |
| 1620013160722 |     raise self._exception                                                                                                                                  |
| 1620013160722 |   File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run                                                                            |
| 1620013160722 |     result = self.fn(*self.args, **self.kwargs)                                                                                                            |
| 1620013160722 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 640, in urlopen                                                            |
| 1620013160722 |     retries = retries.increment(method, url, error=e, _pool=self,                                                                                          |
| 1620013160722 |   File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 368, in increment                                                              |
| 1620013160722 |     raise six.reraise(type(error), error, _stacktrace)                                                                                                     |
| 1620013160722 |   File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 685, in reraise                                                              |
| 1620013160722 |     raise value.with_traceback(tb)                                                                                                                         |
| 1620013160722 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 600, in urlopen                                                            |
| 1620013160723 |     httplib_response = self._make_request(conn, method, url,                                                                                               |
| 1620013160723 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request                                                      |
| 1620013160723 |     six.raise_from(e, None)                                                                                                                                |
| 1620013160723 |   File "<string>", line 2, in raise_from                                                                                                                   |
| 1620013160723 |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 383, in _make_request                                                      |
| 1620013160723 |     httplib_response = conn.getresponse()                                                                                                                  |
| 1620013160723 |   File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse                                                |
| 1620013160723 |     rv = real_getresponse(self, *args, **kwargs)                                                                                                           |
| 1620013160723 |   File "/usr/local/lib/python3.8/http/client.py", line 1344, in getresponse                                                                                |
| 1620013160723 |     response.begin()                                                                                                                                       |
| 1620013160723 |   File "/usr/local/lib/python3.8/http/client.py", line 307, in begin                                                                                       |
| 1620013160723 |     version, status, reason = self._read_status()                                                                                                          |
| 1620013160723 |   File "/usr/local/lib/python3.8/http/client.py", line 276, in _read_status                                                                                |
| 1620013160724 |     raise RemoteDisconnected("Remote end closed connection without"                                                                                        |
| 1620013160724 | urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))                             |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Nevertheless, the status of the task in ECS was still "running", and hence no restart followed.

Next time this happens, I'll connect to the respective ECS container and check whether the process in the container has really shut down, and whether this can be mitigated somehow.
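
One possible way to run that check is sketched below. It is only an illustration, not something shipped with this stack: it assumes a Linux container with a readable /proc, and it assumes the consumer's command line contains the string "snuba" (as the /usr/local/bin/snuba entry in the traceback suggests). The function names `find_pids` and `healthcheck` are made up for this example.

```python
import os


def find_pids(needle, proc_root="/proc"):
    """Return the PIDs whose command line contains `needle` (Linux /proc layout)."""
    pids = []
    for entry in os.listdir(proc_root):
        # Only numeric directory names under /proc are processes.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            # The process exited during the scan, or it is not readable.
            continue
        # Skip this script itself, whose own arguments may contain the needle.
        if needle in cmdline and int(entry) != os.getpid():
            pids.append(int(entry))
    return sorted(pids)


def healthcheck(needle="snuba", proc_root="/proc"):
    """Exit status for a health check: 0 if a matching process is alive, else 1."""
    return 0 if find_pids(needle, proc_root) else 1
```

Run inside the container, `find_pids("snuba")` would show whether the consumer process is still there after the "shutting down..." message. If it turns out the process is gone while the task stays "running", a script ending in `sys.exit(healthcheck())` could in principle serve as a container health check command, so that ECS replaces the task.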

Feel free to assign this issue to me.

mariusmitrofan commented 3 years ago

Check the "monitoring" tab in OpsWorks for ClickHouse.

Looks like the db got overloaded to me...

nodomain commented 3 years ago

[image]

WDYT?

mariusmitrofan commented 3 years ago

Hm https://stackoverflow.com/questions/60899666/clickhouse-client-get-error-timeout-exceeded-while-reading-from-socket

Maybe just a restart of the ClickHouse server?

Something like "service clickhouse-server restart" should be enough.
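
To see whether ClickHouse is answering at all before and after such a restart, a small probe against its HTTP interface might help. This is a minimal sketch under assumptions: ClickHouse's HTTP interface replies "Ok." on /ping and listens on port 8123 by default, but the `base_url` below is a placeholder for wherever the server runs in your stack, and `clickhouse_is_up` is a name invented for this example.

```python
import http.client
import urllib.request


def clickhouse_is_up(base_url="http://localhost:8123", timeout=2.0):
    """Return True if the server answers GET /ping with HTTP 200 and 'Ok.'."""
    # Talk to the server directly, ignoring any proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url.rstrip("/") + "/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"Ok."
    except (OSError, http.client.HTTPException):
        # Refused, reset, or timed-out connections and HTTP errors count as "down".
        # urllib's URLError and http.client.RemoteDisconnected are OSError subclasses.
        return False
```

A `False` here while the consumer logs "Remote end closed connection without response" would point at the ClickHouse side rather than at the consumer itself.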

nodomain commented 3 years ago

Will check later! Thx

nodomain commented 3 years ago

This did not happen again. Closing it.