IBM / sarama

Sarama is a Go library for Apache Kafka.

Metrics library leaks memory #897

Closed: gdutliuyun827 closed this issue 6 years ago

gdutliuyun827 commented 7 years ago
Versions

Sarama Version: 46dabe0f34a162d428640a0e9714eea9079e7ee2
Kafka Version: kafka_2.11-0.10.1.0.tgz
Go Version: 1.8

Configuration

What configuration values are you using for Sarama and Kafka?

Logs

When filing an issue please provide logs from Sarama and Kafka if at all possible. You can set sarama.Logger to a log.Logger to capture Sarama debug output.

Problem Description

The problem is in the metrics library; the relevant function is shown below. Calling Broker.Open creates a new StandardMeter, but Broker.Close does not destroy it.

```go
func NewMeter() Meter {
	if UseNilMetrics {
		return NilMeter{}
	}
	m := newStandardMeter()
	arbiter.Lock()
	defer arbiter.Unlock()
	arbiter.meters = append(arbiter.meters, m)
	if !arbiter.started {
		arbiter.started = true
		go arbiter.tick()
	}
	return m
}
```
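For context, a minimal sketch (not from the original report) of the usage pattern that triggers the growth; it assumes a reachable broker at the placeholder address:

```go
package main

import (
	"log"

	"github.com/Shopify/sarama" // now github.com/IBM/sarama
)

func main() {
	for i := 0; i < 100000; i++ {
		// A fresh Config means a fresh metrics.Registry per iteration, which is
		// effectively what happens when a new Client is created each time.
		conf := sarama.NewConfig()

		b := sarama.NewBroker("localhost:9092") // placeholder address
		if err := b.Open(conf); err != nil {
			log.Fatal(err)
		}
		_ = b.Close()
		// The Registry becomes garbage after this iteration, but every meter it
		// created stays referenced by go-metrics' global arbiter goroutine, so
		// memory grows on every pass.
	}
}
```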

eapache commented 7 years ago

The metrics library does not indicate that we need to do anything to clean up unused meters, so I would expect the go garbage-collector to take care of them eventually, once the broker itself is GCed or re-opened. @slaunay perhaps you can shed some light?

slaunay commented 7 years ago

The meter abstraction relies on a single goroutine for computing rates instead of one per meter. This means that each meter is referenced by a global meterArbiter struct and indeed never gets garbage collected 😢, see: https://github.com/rcrowley/go-metrics/blob/f770e6f5e91a8770cecee02d5d3f7c00b023b4df/meter.go#L230

This would lead to memory leaks:

It does not seem possible in go-metrics to unregister the meter and get that reference removed (even if we do not unregister the metrics in Sarama today)... Note that we do not use timer metrics but that would result in the same issue as the StandardTimer struct uses a meter under the hood...

Looks like this needs to be fixed in go-metrics first and we would also have to use Registry#Unregister(string) or something, see:
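For illustration only, a sketch of what that cleanup could look like; the helper and the per-broker metric names are assumptions rather than existing Sarama code, and it only helps once go-metrics can actually stop meters:

```go
package sarama // hypothetically, a new file inside Sarama itself

import (
	"fmt"

	metrics "github.com/rcrowley/go-metrics"
)

// unregisterBrokerMetrics is a hypothetical hook that Broker.Close could call.
// Unregister drops the registry's reference, but the meter would also have to
// be detached from the global arbiter upstream for the memory to be reclaimed.
func unregisterBrokerMetrics(registry metrics.Registry, brokerID int32) {
	for _, name := range []string{
		"incoming-byte-rate",
		"outgoing-byte-rate",
		"request-rate",
		"response-rate",
	} {
		registry.Unregister(fmt.Sprintf("%s-for-broker-%d", name, brokerID))
	}
}
```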

lday0321 commented 7 years ago

Until go-metrics solves this leak problem, I think disabling metrics collection with metrics.UseNilMetrics = true will be a workaround.
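A minimal sketch of that workaround, assuming the flag is set before any Sarama object is created (the client and address are placeholders):

```go
package main

import (
	"log"

	"github.com/Shopify/sarama"
	metrics "github.com/rcrowley/go-metrics"
)

func main() {
	// Must run before any meter is created: every subsequent NewMeter call
	// returns a NilMeter, nothing is handed to the arbiter and nothing can
	// leak, but you also lose all of Sarama's metrics.
	metrics.UseNilMetrics = true

	client, err := sarama.NewClient([]string{"localhost:9092"}, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
}
```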

soar-zhengjian commented 7 years ago

Me too. Frequently calling Client.GetOffset and Client.Close causes a memory leak, because Client.GetOffset calls Broker.Open.

slaunay commented 7 years ago

@lday0321 You can also work around that memory leak without requiring changes upstream. Disabling the metrics by default only to re-enable them later sounds confusing for the end user to me.

@soar-zhengjian I believe that would be the case if you were to "reopen" a Broker with a different broker id every time. Do you have any numbers on the amount of memory leaked in your application over time for that use case?

lday0321 commented 7 years ago

@slaunay According to my test, after "reopening" the Broker 778827 times, pprof shows a memory leak of 939.05 MB (almost 1 KB per reopen):

```
Entering interactive mode (type "help" for commands)
(pprof) top 20 sarama
939.05MB of 949.56MB total (98.89%)
Dropped 205 nodes (cum <= 4.75MB)
      flat  flat%   sum%        cum   cum%
  486.54MB 51.24% 51.24%   912.56MB 96.10%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.newStandardMeter
  150.51MB 15.85% 67.09%   150.51MB 15.85%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.NewEWMA1
  140.01MB 14.74% 81.83%   140.01MB 14.74%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.NewEWMA15
  135.51MB 14.27% 96.10%   135.51MB 14.27%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.NewEWMA5
   26.48MB  2.79% 98.89%   939.05MB 98.89%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.NewMeter
         0     0% 98.89%   939.05MB 98.89%  github.com/lday0321/myProj/vendor/github.com/Shopify/sarama.(*Broker).Open.func1
         0     0% 98.89%   612.02MB 64.45%  github.com/lday0321/myProj/vendor/github.com/Shopify/sarama.getOrRegisterBrokerMeter
         0     0% 98.89%   939.05MB 98.89%  github.com/lday0321/myProj/vendor/github.com/Shopify/sarama.withRecover
         0     0% 98.89%   939.05MB 98.89%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.(*StandardRegistry).GetOrRegister
         0     0% 98.89%   939.05MB 98.89%  github.com/lday0321/myProj/vendor/github.com/rcrowley/go-metrics.GetOrRegisterMeter
         0     0% 98.89%   939.05MB 98.89%  reflect.Value.Call
         0     0% 98.89%   939.05MB 98.89%  reflect.Value.call
         0     0% 98.89%   939.05MB 98.89%  runtime.call32
         0     0% 98.89%   939.05MB 98.89%  runtime.goexit
```

slaunay commented 7 years ago

Thanks for providing those statistics @lday0321. I believe we use 4 meters shared among all brokers and 4 meters specific to each broker; this means the leak per meter would be around ~100 B (828/8 B) for a single-broker cluster, and probably less for a multi-broker cluster.

Are you allocating a new Client that references a new Config and transitively a new metrics.Registry for each iteration in your use/test case?

lday0321 commented 7 years ago

Yes, I use a new client each time. Actually, I use sarama-cluster's NewConsumer method.

slaunay commented 7 years ago

You should be able to mostly avoid the memory leak by reusing either your own cluster.Client struct or cluster.Config struct, as the latter embeds the sarama.Config struct that will reference a single metrics.Registry rather than creating a new one each time.
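A minimal sketch of that reuse pattern, assuming the bsm/sarama-cluster API discussed here; brokers, group ID, and topics are placeholders:

```go
package main

import (
	"log"

	cluster "github.com/bsm/sarama-cluster"
)

func main() {
	brokers := []string{"localhost:9092"}
	groupID := "example-group"
	topics := []string{"topic-a", "topic-b", "topic-c"}

	// One Config, and therefore one embedded sarama.Config and one
	// metrics.Registry, shared by every consumer instead of a fresh
	// Config (and Registry) per consumer.
	conf := cluster.NewConfig()

	consumers := make([]*cluster.Consumer, 0, len(topics))
	for _, topic := range topics {
		c, err := cluster.NewConsumer(brokers, groupID, []string{topic}, conf)
		if err != nil {
			log.Fatalf("consumer for %s: %v", topic, err)
		}
		consumers = append(consumers, c)
	}
	defer func() {
		for _, c := range consumers {
			c.Close()
		}
	}()
}
```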

Using a single Client could also speed up your application (cached metadata, already-established connections to brokers, ...) but can also bring side effects depending on your use case.

Hopefully the memory leak can be fixed in the metrics library soon, as the maintainer has recently shown interest in rcrowley/go-metrics#197 😀.

lday0321 commented 7 years ago

@slaunay Thanks for your suggestion! Is it still good practice to use a single Client if I need to set up thousands of consumers for thousands of topics? Or do you have any suggestions for this kind of case?

slaunay commented 7 years ago

@lday0321 I think reusing cluster.Client and cluster.Config as much as possible makes sense, but if you have thousands of consumers and topics you probably do not want to do that in a single Go application.

That being said, if you have that many topics in a single cluster then Apache Kafka might not be the best technology; an AMQP broker like RabbitMQ could be a better fit depending on your use case.

mihasya commented 6 years ago

(the PR for being able to stop unused meters/counters has been merged https://github.com/rcrowley/go-metrics/pull/206)
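For completeness, a tiny sketch of what that change enables, assuming the Stop method the PR adds to meters:

```go
package main

import metrics "github.com/rcrowley/go-metrics"

func main() {
	m := metrics.NewMeter()
	m.Mark(1)
	// Stop detaches the meter from the global arbiter goroutine so it can
	// finally be garbage collected once nothing else references it.
	m.Stop()
}
```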

eapache commented 6 years ago

Fixed (or enough, anyway) by #991.