SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

High CPU Usage on Clickhouse container causing slowness on the server. #5546

Open Prakhil-tp opened 1 month ago

Prakhil-tp commented 1 month ago

In what situation are you experiencing subpar performance?

Our server's CPU usage suddenly hit its maximum (after running smoothly for 3 months). This affected the entire server, which became extremely slow. Upon investigation, we saw that the Clickhouse container was consuming a significant amount of CPU. We stopped it and the server finally recovered. We started it again and it has been running fine so far.

How to reproduce

unknown

Your Environment

[x] Linux
[ ] Mac
[ ] Windows

OS: Ubuntu 20.04.4 LTS
Kernel version: 5.4.0-147-generic

Self-hosted SigNoz

Additional context

We searched online to find out what caused this and how to prevent it from happening again. We looked at the log file /var/log/clickhouse-server/clickhouse-server.log and found these errors:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c4fd017 in /usr/bin/clickhouse
1. DB::NetException::NetException<unsigned long&>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type>, unsigned long&) @ 0x0000000012241cb8 in /usr/bin/clickhouse
2. DB::TCPHandler::runImpl() @ 0x000000001222f38c in /usr/bin/clickhouse
3. DB::TCPHandler::run() @ 0x0000000012246a79 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x0000000014c6fc52 in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x0000000014c70a51 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x0000000014d678e7 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000014d65edc in /usr/bin/clickhouse
8. ? @ 0x00007ff5bafd5609 in ?
9. ? @ 0x00007ff5baefa133 in ?
 (version 23.11.1.2711 (official build))
2024.07.24 07:36:49.532237 [ 1057 ] {} <Information> TCPHandler: Client has not sent any data.
2024.07.24 07:36:49.535023 [ 47 ] {} <Error> ServerErrorHandler: Code: 101. DB::NetException: Unexpected packet from client (expected Hello, got 22). (UNEXPECTED_PACKET_FROM_CLIENT), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c4fd017 in /usr/bin/clickhouse
1. DB::NetException::NetException<unsigned long&>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type>, unsigned long&) @ 0x0000000012241cb8 in /usr/bin/clickhouse
2. DB::TCPHandler::runImpl() @ 0x000000001222f38c in /usr/bin/clickhouse
3. DB::TCPHandler::run() @ 0x0000000012246a79 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x0000000014c6fc52 in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x0000000014c70a51 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x0000000014d678e7 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000014d65edc in /usr/bin/clickhouse
8. ? @ 0x00007ff5bafd5609 in ?
9. ? @ 0x00007ff5baefa133 in ?
 (version 23.11.1.2711 (official build))
2024.07.24 07:36:57.124683 [ 47 ] {} <Error> ServerErrorHandler: Code: 101. DB::NetException: Unexpected packet from client (expected Hello, got 22). (UNEXPECTED_PACKET_FROM_CLIENT), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c4fd017 in /usr/bin/clickhouse
1. DB::NetException::NetException<unsigned long&>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type>, unsigned long&) @ 0x0000000012241cb8 in /usr/bin/clickhouse
2. DB::TCPHandler::runImpl() @ 0x000000001222f38c in /usr/bin/clickhouse
3. DB::TCPHandler::run() @ 0x0000000012246a79 in /usr/bin/clickhouse
4. Poco::Net::TCPServerConnection::start() @ 0x0000000014c6fc52 in /usr/bin/clickhouse
5. Poco::Net::TCPServerDispatcher::run() @ 0x0000000014c70a51 in /usr/bin/clickhouse
6. Poco::PooledThread::run() @ 0x0000000014d678e7 in /usr/bin/clickhouse
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000014d65edc in /usr/bin/clickhouse
8. ? @ 0x00007ff5bafd5609 in ?
9. ? @ 0x00007ff5baefa133 in ?
 (version 23.11.1.2711 (official build))
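
One way to check whether these errors line up with the CPU spike is to count them per minute and compare the counts with the time the server slowed down. The following is a minimal sketch, assuming the log line layout shown in the excerpt above; the function name `tally_errors` and the sample lines are illustrative only and are not part of ClickHouse or SigNoz.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# 2024.07.24 07:36:49.535023 [ 47 ] {} <Error> ServerErrorHandler: Code: 101. ...
LOG_LINE = re.compile(
    r"^(?P<minute>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}):\d{2}\.\d+ "
    r"\[ *\d+ *\] \{[^}]*\} <(?P<level>\w+)> "
    r"(?P<logger>[^:]+): (?P<message>.*)$"
)
ERROR_CODE = re.compile(r"Code: (\d+)\.")


def tally_errors(lines):
    """Count <Error> log lines per (minute, error code).

    Lines that do not match the assumed layout (for example stack
    trace frames) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("level") != "Error":
            continue
        code = ERROR_CODE.search(match.group("message"))
        key = (match.group("minute"), int(code.group(1)) if code else None)
        counts[key] += 1
    return counts


# Sample lines shortened from the excerpt above.
sample = [
    "2024.07.24 07:36:49.532237 [ 1057 ] {} <Information> TCPHandler: Client has not sent any data.",
    "2024.07.24 07:36:49.535023 [ 47 ] {} <Error> ServerErrorHandler: Code: 101. DB::NetException: Unexpected packet from client (expected Hello, got 22).",
    "0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c4fd017 in /usr/bin/clickhouse",
    "2024.07.24 07:36:57.124683 [ 47 ] {} <Error> ServerErrorHandler: Code: 101. DB::NetException: Unexpected packet from client (expected Hello, got 22).",
]

for (minute, code), count in sorted(tally_errors(sample).items()):
    print(minute, code, count)
```

To run it against the real file, pass an open file object instead of the sample list, for example `tally_errors(open("/var/log/clickhouse-server/clickhouse-server.log"))`. A few errors per minute would be consistent with the errors being background noise, while a burst around the time of the slowdown would be worth a closer look.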

We found online that these errors are harmless, so we have no idea what could have caused the CPU spike. Do you have any insight on how to troubleshoot this issue?
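
For context on the "expected Hello, got 22" part of the message: in the ClickHouse native protocol the first client packet is Hello (packet type 0), while 22 (0x16) is the content type of a TLS handshake record. One possible reading, which the log itself does not confirm, is that something opened a TLS connection to the plaintext native port. The sketch below only illustrates that mapping; the function name `describe_packet_type` is hypothetical, and the HTTP guess is an assumption for comparison.

```python
# First-byte values a plaintext ClickHouse native port might receive.
CLICKHOUSE_CLIENT_HELLO = 0  # native protocol: client Hello packet type

# TLS record content types (the first byte of every TLS record).
TLS_RECORD_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def describe_packet_type(value: int) -> str:
    """Give a best-guess label for the packet type reported in the error."""
    if value == CLICKHOUSE_CLIENT_HELLO:
        return "ClickHouse native Hello"
    if value in TLS_RECORD_TYPES:
        return f"TLS record ({TLS_RECORD_TYPES[value]})"
    if 65 <= value <= 90:
        # Assumption: an uppercase ASCII letter may be the start of an
        # HTTP method such as GET or POST sent to the wrong port.
        return f"ASCII letter {chr(value)!r}, possibly an HTTP request line"
    return "unrecognized"


for value in (0, 22, 71):
    print(value, describe_packet_type(value))
```

If this reading is right, the errors point to a misdirected client, health check, or scanner rather than to the CPU spike itself, which matches what we read about them being harmless.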

If this is not the right place to post this, could you please advise where we should raise the issue?

welcome[bot] commented 1 month ago

Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.