Closed eagletmt closed 4 years ago
Good news: I did another experiment to see if #171 also fixes the memory leak or not. In our environment, #171 solved the memory leak successfully. I manually applied the patch to one of "w/ async-http" aggregator nodes and it showed steady memory usage.
Very nice graphs.
I've confirmed this issue was fixed in v1.8.3.
When I upgraded td-agent in our production workload to v4.0.1, which bundles fluent-plugin-prometheus v1.8.2, the memory usage started to grow. To investigate further, I divided our aggregator nodes into two groups: w/ async-http and w/o async-http. In aggregator nodes w/o async-http, I ran
sudo /opt/td-agent/bin/fluent-gem uninstall async async-http async-io async-pool
to disable the async implementation in fluent-plugin-prometheus. Aggregator nodes w/ async-http use the default td-agent v4.0.1 package. Our Prometheus instances scrape fluentd metrics from the /aggregated_metrics endpoint at a 15-second interval. The result looks like below: aggregator nodes w/ async-http show increasing memory usage, while aggregator nodes w/o async-http show steady usage.
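For reference, a scrape setup like the one described above could be expressed as a Prometheus scrape config along these lines. This is a sketch, not our actual config: the job name and target address are hypothetical placeholders, and 24231 is the default port of fluent-plugin-prometheus's input plugin.

```yaml
scrape_configs:
  - job_name: fluentd-aggregator        # hypothetical job name
    scrape_interval: 15s                # the 15-second interval mentioned above
    metrics_path: /aggregated_metrics   # endpoint exposed by the aggregator nodes
    static_configs:
      - targets:
          - aggregator.example.internal:24231  # placeholder host; 24231 is the plugin's default port
```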