Open stevesg opened 3 years ago
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Still valid
@stevesg I wants to work on this issue could you please tell how I can reproduce this issue and which file are associated to fix this issue ?
@Alphasaurs thanks for volunteering!
If you take the latest code and cherry-pick Steve's commit here: https://github.com/stevesg/cortex/commit/b0a95a22cf06f441aa4e4f830df63d14defe83db
then go test pkg/distributor
should show part of the problem.
Let us know how you get on.
@bboreham I am new to goroutine leak and I found the leaks by the instruction provided by you but still I don't know how can I solve them I am sharing docs file could you please comment on it for the few codes so what has to be done to solve this leaks and how are they are leaking it will be helpful https://docs.google.com/document/d/17nID6r_71yyleHSVk6tw3pP55aqIYDDUQKrEOnj0A1M/edit?usp=sharing
Broadly the pattern is that you have to find what is starting the goroutines, find out how they are supposed to be stopped, and add or fix the code in the tests that will stop them.
You should really focus on the cases like Steve noted where hundreds of goroutines are left running; your Google Doc seems to list single ones which we can get to later.
However I can use one example from your doc:
for {
select {
case <-ticker.C:
err := om.loadConfig()
if err != nil {
// Log but don't stop on error - we don't want to halt all ingesters because of a typo
level.Error(util_log.Logger).Log("msg", "failed to load config", "err", err)
}
case <-ctx.Done():
return nil
}
}
The for
will loop forever, which is expected to be stopped by reading from ctx.Done()
. This means some other code should call cancel
on that context at the end of the test.
Often we do this in "cleanup" routines added to a test, like this: https://github.com/cortexproject/cortex/blob/34bb142b129bd781539001fb1629cf2f3ed54b82/pkg/storage/tsdb/bucketindex/loader_test.go#L42-L44
I looked into the biggest number, in distributor_tests.go
, and proposed #4506.
Hi all,
based on @stevesg 's suggestion I've created a pull request (#4577) to integrate goleak.VerifyTestMain() into every test. Bu I made this optional. This let us test for unit tests only if desired and it won't break any other "normal" unit tests.
First I tried to make the optional run of goleak.VerifyTestMain() selectable by a command line switch or a flag. This was all a bit confusing and may break some future extensions. So I decided to make is selectable by an environment variable called CORTEX_TEST_GOLEAK. If that variable has been set to 1 goleak.VerifyTestMain() will run while testing. If it hasn't been set it won't run and all will work as before.
Regards, Oliver
Spent the morning debugging what I thought was a goroutine leak, but turned out to be just a huge number of goroutines being created in a short period of time. However, it seems a lot of the unit tests do leak goroutines. Most of these might just be issues with tests not cleaning up properly, but it makes testing/debugging real leaks quite difficult. Or in my case, the leaks were a red herring - the issue was something else.
Given we already have goleak vendored in presumably from Prometheus (they use it), it would be nice to start using it.
We leak ~2000 goroutines per full run of the unit tests. Summary below, full data here: unit-test-leaks.txt
(Changes used are here - https://github.com/stevesg/cortex/commit/b0a95a22cf06f441aa4e4f830df63d14defe83db )