go-graphite / go-carbon

Golang implementation of Graphite/Carbon server with classic architecture: Agent -> Cache -> Persister
MIT License

[BUG] Memory usage steady growing over time #597

Open Thorsieger opened 4 days ago

Thorsieger commented 4 days ago

Describe the bug

I am experiencing a slow but steady memory leak which forces a service restart every week or so.

Logs

Memory usage over time on the physical server: [image]

pprof (on one instance):

Showing nodes accounting for 1917.94MB, 99.01% of 1937.09MB total
Dropped 34 nodes (cum <= 9.69MB)
Showing top 10 nodes out of 45
      flat  flat%   sum%        cum   cum%
 1025.26MB 52.93% 52.93%  1025.26MB 52.93%  github.com/dgryski/go-trigram.NewIndex
  618.16MB 31.91% 84.84%   618.16MB 31.91%  strings.(*Builder).grow (inline)
  101.44MB  5.24% 90.08%   101.44MB  5.24%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
  101.05MB  5.22% 95.29%   101.05MB  5.22%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).updateFileList.func3
      22MB  1.14% 96.43%   138.94MB  7.17%  github.com/go-graphite/go-carbon/receiver/tcp.(*TCP).HandleConnection
   17.23MB  0.89% 97.32%    17.23MB  0.89%  github.com/go-graphite/go-carbon/cache.(*Cache).makeQueue
      15MB  0.77% 98.09%       15MB  0.77%  github.com/go-graphite/go-carbon/points.OnePoint (inline)
   14.16MB  0.73% 98.82%    14.16MB  0.73%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
       2MB   0.1% 98.93%    37.12MB  1.92%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).fetchWithCache.func1
    1.62MB 0.084% 99.01%    35.12MB  1.81%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).prepareDataProto

      flat  flat%   sum%        cum   cum%
 2481.68MB 21.47% 21.47%  2481.68MB 21.47%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
 1952.18MB 16.89% 38.36%  1952.18MB 16.89%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
 1882.28MB 16.28% 54.64%  1882.28MB 16.28%  strings.(*Builder).grow
 1313.07MB 11.36% 66.00%  3921.19MB 33.92%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1205.57MB 10.43% 76.43%  1205.57MB 10.43%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
  931.52MB  8.06% 84.49%   931.52MB  8.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  561.05MB  4.85% 89.35%   561.05MB  4.85%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
  471.03MB  4.08% 93.42%   471.03MB  4.08%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  184.90MB  1.60% 95.02%   184.90MB  1.60%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
     161MB  1.39% 96.41%   767.56MB  6.64%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie

Go-carbon Configuration:

[common]
user = "carbon"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "tcp://10.254.0.36:2003"
metric-interval = "1m0s"
max-cpu = 6

[whisper]
data-dir = "/var/lib/graphite/whisper"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 8
max-updates-per-second = 10000
max-creates-per-second = 500
hard-max-creates-per-second = false
sparse-create = false
flock = true
enabled = true
hash-filenames = true
compressed = false
remove-empty-file = false

[cache]
max-size = 50000000
write-strategy = "noop"

[udp]
listen = ":2003"
enabled = false
buffer-size = 0

[tcp]
listen = ":2003"
enabled = true
buffer-size = 0

[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = false
buffer-size = 0

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"

[grpc]
listen = "127.0.0.1:7003"
enabled = false

[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"

[carbonserver]
listen = "0.0.0.0:8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = false
query-cache-size-mb = 40960
find-cache-enabled = true
trigram-index = false
scan-frequency = "5m0s"
trie-index = true
file-list-cache = ""
concurrent-index = false
realtime-index = 0
cache-scan = false
max-globs = 600
fail-on-max-globs = false
max-metrics-globbed = 30000
max-metrics-rendered = 1000
empty-result-ok = false
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]

[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0

[pprof]
listen = "localhost:7007"
enabled = true

[[logging]]
logger = ""
file = "stdout"
level = "info"
encoding = "json"
encoding-time = "iso8601"
encoding-duration = "seconds"

Metric retention and aggregation schemas

N/A

Simplified query (if applicable)

N/A

Additional context

I have a Graphite infrastructure that handles 2.4M metrics/minute. The storage layer consists of 4 go-carbon instances behind a carbon-c-relay. These 4 storage nodes run on a single physical server: 32 CPUs, 512 GB RAM, NVMe storage.

go-carbon version : ghcr.io/go-graphite/go-carbon:0.17.3

After checking existing issues, I tried the trie and/or trigram indexes, with no effect. I enabled pprof; its output is above.

May be related to #579.

Thorsieger commented 2 days ago

And today:

Showing nodes accounting for 66.14GB, 94.50% of 70GB total
Dropped 165 nodes (cum <= 0.35GB)
Showing top 10 nodes out of 52
      flat  flat%   sum%        cum   cum%
   20.40GB 29.14% 29.14%    20.40GB 29.14%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
   16.35GB 23.36% 52.50%    16.35GB 23.36%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
   15.97GB 22.81% 75.31%    15.97GB 22.81%  strings.(*Builder).grow
    4.29GB  6.14% 81.45%     4.29GB  6.14%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
    2.87GB  4.10% 85.55%     2.87GB  4.10%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
    2.22GB  3.17% 88.72%     7.32GB 10.45%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
    1.44GB  2.06% 90.78%     1.44GB  2.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
    1.36GB  1.95% 92.73%     6.24GB  8.92%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
    0.78GB  1.12% 93.85%     0.78GB  1.12%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
    0.45GB  0.65% 94.50%     4.88GB  6.97%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).query
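To put a number on the growth, the getExpandedGlobs, UnmarshalVT and strings.(*Builder).grow rows can be compared between the second table of the first report and today's report. This Go sketch parses go tool pprof -top rows and computes after/before ratios for functions present in both. It assumes pprof's size units are 1024-based and that both dumps use the same sample type, which the reports do not state; Entry, parseLine, growth and mustParse are hypothetical helpers. With the raw profile files, go tool pprof -base performs this comparison directly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Entry is one row of `go tool pprof -top` output.
type Entry struct {
	FlatMB float64 // flat size, converted to MB
	Func   string  // function name, without a trailing "(inline)" marker
}

// units lists pprof size suffixes with their value in MB. Longer suffixes
// come first so "MB" is matched before the bare "B". The 1024-based
// scaling is an assumption about pprof's output units.
var units = []struct {
	suffix string
	mb     float64
}{
	{"TB", 1024 * 1024}, {"GB", 1024}, {"MB", 1}, {"kB", 1.0 / 1024}, {"B", 1.0 / (1024 * 1024)},
}

// parseSize converts a value such as "2481.68MB" or "20.40GB" to MB.
func parseSize(s string) (float64, error) {
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			v, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, err
			}
			return v * u.mb, nil
		}
	}
	return 0, fmt.Errorf("unknown size unit in %q", s)
}

// parseLine parses one data row: flat flat% sum% cum cum% function [(inline)].
func parseLine(line string) (Entry, error) {
	fields := strings.Fields(line)
	if len(fields) < 6 {
		return Entry{}, fmt.Errorf("too few fields in %q", line)
	}
	mb, err := parseSize(fields[0])
	if err != nil {
		return Entry{}, err
	}
	return Entry{FlatMB: mb, Func: fields[5]}, nil
}

// growth returns after/before flat size for every function present in
// both snapshots with a non-zero "before" value.
func growth(before, after []Entry) map[string]float64 {
	base := make(map[string]float64, len(before))
	for _, e := range before {
		base[e.Func] = e.FlatMB
	}
	ratios := make(map[string]float64)
	for _, e := range after {
		if b, ok := base[e.Func]; ok && b > 0 {
			ratios[e.Func] = e.FlatMB / b
		}
	}
	return ratios
}

// mustParse parses rows and panics on malformed input (acceptable for a demo).
func mustParse(rows []string) []Entry {
	out := make([]Entry, 0, len(rows))
	for _, r := range rows {
		e, err := parseLine(r)
		if err != nil {
			panic(err)
		}
		out = append(out, e)
	}
	return out
}

func main() {
	// Rows copied from the two reports above.
	before := mustParse([]string{
		"2481.68MB 21.47% 21.47%  2481.68MB 21.47%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs",
		"1952.18MB 16.89% 38.36%  1952.18MB 16.89%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT",
		"1882.28MB 16.28% 54.64%  1882.28MB 16.28%  strings.(*Builder).grow",
	})
	after := mustParse([]string{
		"20.40GB 29.14% 29.14%    20.40GB 29.14%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs",
		"16.35GB 23.36% 52.50%    16.35GB 23.36%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT",
		"15.97GB 22.81% 75.31%    15.97GB 22.81%  strings.(*Builder).grow",
	})
	for fn, r := range growth(before, after) {
		fmt.Printf("%5.1fx  %s\n", r, fn)
	}
}
```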

If you need more information, please ask ;)

deniszh commented 7 hours ago

Hi @Thorsieger, yes, the pprof output is quite convincing: it looks like there is a memory leak in getExpandedGlobs and possibly in UnmarshalVT. This needs to be investigated. It may be less noticeable for us because we deploy at least once a month.

deniszh commented 7 hours ago

And it looks like a large "max-metrics-globbed" value is also a main driver of the growth.
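The following is a hypothetical illustration of why a cap on glob expansion bounds how much a single request can accumulate. It is not go-carbon's trie-based code: expandCapped and matchMetric are made-up names, and how go-carbon actually enforces max-metrics-globbed is not shown here. Matching is done segment by segment with path.Match so that * does not cross a dot; Graphite brace alternation such as {a,b} is not supported in this sketch. A larger limit lets each request collect more name strings before it stops.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchMetric reports whether a dot-separated metric name matches a
// Graphite-style glob, comparing one segment at a time so that "*" never
// crosses a dot. path.Match only understands *, ? and [...].
func matchMetric(pattern, name string) bool {
	ps := strings.Split(pattern, ".")
	ns := strings.Split(name, ".")
	if len(ps) != len(ns) {
		return false
	}
	for i := range ps {
		ok, err := path.Match(ps[i], ns[i])
		if err != nil || !ok {
			return false
		}
	}
	return true
}

// expandCapped returns at most limit metric names matching pattern, and
// whether the result was truncated. Stopping at the limit bounds how many
// result strings one query can hold at once.
func expandCapped(pattern string, names []string, limit int) ([]string, bool) {
	var out []string
	for _, n := range names {
		if !matchMetric(pattern, n) {
			continue
		}
		if len(out) == limit {
			return out, true
		}
		out = append(out, n)
	}
	return out, false
}

func main() {
	names := []string{"app.web1.cpu", "app.web2.cpu", "app.web3.cpu", "app.web1.mem"}
	got, truncated := expandCapped("app.*.cpu", names, 2)
	fmt.Println(got, truncated)
}
```

With the sample names in main, the expansion stops after two matches (app.web1.cpu and app.web2.cpu) and reports that it was truncated.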