go-graphite / go-carbon

Golang implementation of Graphite/Carbon server with classic architecture: Agent -> Cache -> Persister
MIT License

Increase workers for go-carbon #421

Open rickyari opened 3 years ago

rickyari commented 3 years ago

We have a cluster of 9 nodes with a cache limit of 5 million points on each node. Due to increased traffic, the nodes are consistently hitting the upper limit of the cache and sometimes stop accepting incoming metrics because the cache overflows. Can we increase the workers (persisters) so that metrics are written to disk faster, freeing up the cache? If yes, in what multiples should the workers be increased on each node (currently 8 workers per node)?

We run our cluster on i3.2xlarge EC2 instances.
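(go-carbon reports self-metrics under its configured graph-prefix, which makes this kind of cache pressure visible. A hedged sketch of the series worth graphing, assuming the go-carbon.agents.{host} prefix from the config posted below; exact metric names can differ between go-carbon versions, so verify against what your nodes actually emit:

# cache pressure: current fill level and points dropped on overflow
go-carbon.agents.*.cache.size
go-carbon.agents.*.cache.overflow

# persister throughput: how quickly points actually reach disk
go-carbon.agents.*.persister.updateOperations
go-carbon.agents.*.persister.committedPoints

If overflow is nonzero while committedPoints has plateaued, the persisters are the bottleneck rather than the cache size.)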

deniszh commented 3 years ago

Hi @rickyari

You can try to increase the workers, but their scalability can be quite limited. You can post your config here (with sensitive parts like hostnames stripped) and I can recommend some tunings.

But in general, in a situation like this you need to increase the cache size (i.e. RAM consumption) and/or the number of nodes.

rickyari commented 3 years ago

Thanks for the reply @deniszh. Here is the go-carbon.conf from one of the nodes.

user = "root"
graph-prefix = "go-carbon.agents.{host}"

# controls GOMAXPROCS, which itself controls the maximum number
# of actively executing threads; those blocked in syscalls
# are NOT part of this limit
max-cpu = 8
metric-interval = "1m0s"

[whisper]
data-dir = "/mnt/array1/graphite/whisper"
schemas-file = "/etc/go-carbon/whisper-schemas.conf"
aggregation-file = ""
workers = 8
max-updates-per-second = 0
sparse-create = true
enabled = true

[cache]
max-size = 5000000
write-strategy = "noop"

[pickle]
enabled = false

[tcp]
listen = ":2003"
enabled = true

[udp]
enabled = false

[carbonserver]
listen = ":8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = true
query-cache-size-mb = 0
find-cache-enabled = true
trigram-index = false
scan-frequency = "5m0s"
max-globs = 100
graphite-web-10-strict-mode = true
internal-stats-dir = ""

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"
query-timeout = "100ms"

[dump]
# Enable dump/restore function on USR2 signal
enabled = true
# Directory to store dump data. Should be writeable by carbon
path = "/mnt/array1"

[pprof]
listen = "localhost:7007"
enabled = false

# Default logger
[[logging]]
# logger name
# available loggers:
# * "" - default logger for all messages without configured special logger
# @TODO
logger = ""
# Log output: filename, "stderr", "stdout", "none", "" (same as "stderr")
file = "/var/log/go-carbon/go-carbon.log"
# Log level: "debug", "info", "warn", "error", "dpanic", "panic", and "fatal"
level = "error"
# Log format: "json", "console", "mixed"
encoding = "mixed"
# Log time format: "millis", "nanos", "epoch", "iso8601"
encoding-time = "iso8601"
# Log duration format: "seconds", "nanos", "string"
encoding-duration = "seconds"

deniszh commented 3 years ago

max-updates-per-second is not limited, so you can try to increase max-cpu and workers up to the number of CPUs on your node. Also, 5M for the cache is not that many; you can go to 10M or 20M or even more. Otherwise, you will need to add more nodes.
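(Concretely, since max-cpu and workers in the config above are already at the i3.2xlarge's 8 vCPUs, the actionable change is the cache size. A minimal sketch of the adjusted [cache] section, using the 20M figure mentioned above as an illustrative value, not one tested in this thread:

[cache]
# 4x the current 5M limit; any value works, sized against available RAM
max-size = 20000000
write-strategy = "noop"

)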

rickyari commented 3 years ago

i3.2xlarge EC2 instances have only 8 vCPUs, so I guess I will not be able to increase the worker count. Lastly, do we need to increase the cache limit in multiples of 5 million, or can we increase it by any amount, like 1 million?

deniszh commented 3 years ago

You can increase it to any number.
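(For sizing intuition, a hedged back-of-envelope; the per-point memory cost here is an assumption, not a measured figure:

# assume very roughly ~100 bytes per cached point, including Go runtime overhead
#   20,000,000 points x ~100 B ≈ 2 GB of cache
# an i3.2xlarge has 61 GiB of RAM, so even a 20M-point cache leaves ample headroom

)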

rickyari commented 3 years ago

Thanks for your help, @deniszh.