cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License
Clickhouse large number of threads #8352

Closed gorghino closed 2 months ago

gorghino commented 2 months ago

Steps to Reproduce

  1. Run a CVAT server on a workstation
  2. Launch the containers
  3. Check the number of threads used by clickhouse-server

Expected Behavior

A reasonable number of used threads (?)

Possible Solution

No response

Context

I'm not sure whether this is intentional, but I recently found that my clickhouse-server process is using 708 threads. CPU and memory usage seem fine, though.

Environment

- commit f93d58c1ca9401daeee5beba5d5f79ace975c02b (HEAD -> develop, origin/develop, origin/HEAD)
Author: Roman Donchenko <roman@cvat.ai>
Date:   Fri Aug 23 17:38:24 2024 +0300

- Docker version: 27.1.2

- No docker Swarm or Kubernetes

- OS: Ubuntu 22.04.4 LTS x86_64 
Kernel: 6.8.0-40-generic 
Shell: bash 5.1.16 
CPU: 13th Gen Intel i9-13900K (32) @ 5.500GHz 
Memory: 4724MiB / 64005MiB 

drone-racing@DRWS:~/cvat$ docker logs cvat_clickhouse
/entrypoint.sh: create new user 'user' instead 'default'
ClickHouse Database directory appears to contain a database; Skipping initialization
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
Processing configuration file '/etc/clickhouse-server/users.xml'.
Merging configuration file '/etc/clickhouse-server/users.d/default-user.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.
/entrypoint.sh: create new user 'user' instead 'default'
ClickHouse Database directory appears to contain a database; Skipping initialization
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
Processing configuration file '/etc/clickhouse-server/users.xml'.
Merging configuration file '/etc/clickhouse-server/users.d/default-user.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.

bsekachev commented 2 months ago

Have you asked the ClickHouse community?

We don't do anything special on our side; we just push and read records.

gorghino commented 2 months ago

Hi @bsekachev, thanks for the answer. Are the numbers similar on your machine as well? I'm not very familiar with ClickHouse, so I don't know whether this may be related to some CVAT env setting.

bsekachev commented 2 months ago

image

I do not see any problem with the number of threads. It does not affect overall performance.

gorghino commented 2 months ago

For anyone with the same doubt: it's intentional. To list the threads by type, open a shell inside the container, install procps (apk --no-cache add procps), and run:

ps H -o 'tid comm' $(pidof -s clickhouse-server) | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $2) }' | clickhouse-local -S "threadid UInt16, name String" -q "SELECT name, count() FROM table GROUP BY name WITH TOTALS ORDER BY count() DESC FORMAT PrettyCompact"

By default, background_schedule_pool_size is 512. The remaining threads belong to other groups. The ClickHouse community also said the thread count can reach 3-5k under load.
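
If installing procps or piping through clickhouse-local is not convenient, a similar per-name breakdown can be approximated straight from /proc. This is only a minimal sketch, not something taken from the ClickHouse docs: the function name count_threads_by_name and its optional proc-root argument are made up for illustration, and it assumes a Linux /proc layout where each thread exposes its name in /proc/<pid>/task/<tid>/comm.

```shell
# Hypothetical helper: count the threads of a process, grouped by thread name.
#   $1  PID of the process to inspect
#   $2  proc filesystem root (optional, defaults to /proc)
count_threads_by_name() {
    # Each thread exposes its name in <proc root>/<pid>/task/<tid>/comm.
    # Threads that exit while being read are silently skipped.
    cat "${2:-/proc}/$1"/task/*/comm 2>/dev/null | sort | uniq -c | sort -rn
}

# Example (inside the ClickHouse container):
#   count_threads_by_name "$(pidof -s clickhouse-server)"
```

Each output line is a count followed by a thread name, largest group first. The kernel truncates thread names in comm to 15 characters, so the names may look shortened.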