cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0

When the ingester is starting, it creates thousands of goroutines #4393

Open georgehao opened 3 years ago

georgehao commented 3 years ago

Describe the bug

Hi, when an ingester has just started up, its goroutine count climbs into the thousands. But after waiting about 1h, once the ingester has finished starting, the goroutine count drops to around 100.

There are two observations:

  1. Once an ingester has been running for a while, the load on the server is not high.
  2. But when all the ingesters start at the same time, none of them can finish starting: the goroutine count grows very large until the ingesters OOM and restart.

when starting: [screenshot: goroutine count in the thousands]

when started: [screenshots: goroutine count around 100]

To Reproduce

  1. cortex version: release-1.9.0
  2. no ingestion_rate limit configured

Expected behavior

Why are there so many goroutines while the ingester is starting?

Storage Engine

thanks

bboreham commented 3 years ago

Can you upload the actual profile? I find those graphviz diagrams unreadable. Optionally using https://share.polarsignals.com/.
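
For reference, a raw goroutine profile can be grabbed from the ingester's HTTP endpoint and then shared; this is a sketch assuming Go's standard net/http/pprof handlers on the ingester's HTTP port, with host and port as placeholders:

```sh
# Capture the raw goroutine profile (host/port are placeholders):
curl -o goroutine.pprof "http://<ingester-host>:<http-port>/debug/pprof/goroutine"

# Inspect locally, or upload goroutine.pprof to https://share.polarsignals.com/:
go tool pprof -http=:8080 goroutine.pprof
```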

georgehao commented 3 years ago
goroutine profile: total 63948
63873 @ 0x43a7c5 0x44b2e5 0x44b2ce 0x46c5a7 0x47b825 0x47cf90 0x47cf22 0x9fc995 0x9fc725 0x9f816a 0x1fdd276 0x1fd292d 0x16ede89 0x202eee3 0xb8dda3 0x201b256 0xb8dda3 0xb91b7a 0xb8dda3 0xccd3bd 0xb8dda3 0xcca2b8 0xb8dda3 0xb8df97 0x16d3890 0xb3e6a2 0xb42825 0xb50ee5 0x470621
#   0x46c5a6    sync.runtime_SemacquireMutex+0x46                           /data1/export/go1.15.7/src/runtime/sema.go:71
#   0x47b824    sync.(*Mutex).lockSlow+0x104                                /data1/export/go1.15.7/src/sync/mutex.go:138
#   0x47cf8f    sync.(*Mutex).Lock+0x8f                                 /data1/export/go1.15.7/src/sync/mutex.go:81
#   0x47cf21    sync.(*RWMutex).Lock+0x21                               /data1/export/go1.15.7/src/sync/rwmutex.go:98
#   0x9fc994    github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID+0x214           /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/prometheus/prometheus/tsdb/head.go:1800
#   0x9fc724    github.com/prometheus/prometheus/tsdb.(*Head).getOrCreate+0xe4              /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/prometheus/prometheus/tsdb/head.go:1783
#   0x9f8169    github.com/prometheus/prometheus/tsdb.(*headAppender).Add+0x129             /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/prometheus/prometheus/tsdb/head.go:1193
#   0x1fdd275   github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x15d5          /home/xx/gopath/src/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:766
#   0x1fd292c   github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x9ec         /home/xx/gopath/src/github.com/cortexproject/cortex/pkg/ingester/ingester.go:445
#   0x16ede88   github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88   /home/xx/gopath/src/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
#   0x202eee2   github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2        /home/xx/gopath/src/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
#   0xb8dda2    github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62        /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#   0x201b255   github.com/cortexproject/cortex/pkg/util/fakeauth.glob..func2+0xb5          /home/xx/gopath/src/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:60
#   0xb8dda2    github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62        /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#   0xb91b79    github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x2d9     /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
#   0xb8dda2    github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62        /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#   0xccd3bc    github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x9c /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
#   0xb8dda2    github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62        /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#   0xcca2b7    github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x97   /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
#   0xb8dda2    github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62        /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#   0xb8df96    github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6        /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#   0x16d388f   github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f    /home/xx/gopath/src/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
#   0xb3e6a1    google.golang.org/grpc.(*Server).processUnaryRPC+0x521                  /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/google.golang.org/grpc/server.go:1210
#   0xb42824    google.golang.org/grpc.(*Server).handleStream+0xd04                 /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/google.golang.org/grpc/server.go:1533
#   0xb50ee4    google.golang.org/grpc.(*Server).serveStreams.func1.2+0xa4              /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/google.golang.org/grpc/server.go:871
bboreham commented 3 years ago

The large number looks the same as #4324. We have a limit added in #3992, -ingester.instance-limits.max-inflight-push-requests.
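
For reference, a sketch of how that limit can be set on the ingester. The CLI flag name is taken from the comment above; the YAML form is an assumption inferred from the flag name, and the value is only an example:

```yaml
# CLI form (0 disables the limit; 30000 is only an example):
#   -ingester.instance-limits.max-inflight-push-requests=30000
#
# Assumed equivalent YAML config:
ingester:
  instance_limits:
    max_inflight_push_requests: 30000
```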

With the full profile I could check to see if anything unusual was blocking them.

georgehao commented 3 years ago

@bboreham The -ingester.instance-limits.max-inflight-push-requests option is not a good fit here. The goroutine count only reaches the thousands while the ingester is starting; once it has started, it stays at only about 100. So just to get the ingesters through startup I would have to add a lot of machines to avoid the lock contention, which is expensive.

I also tried the -ingester.instance-limits.max-inflight-push-requests option, but because each request contains so many time series, max-inflight-push-requests has to be set very low, so it does not help while the ingester is starting.

I also looked into the cause: the goroutines are all blocked on the mutex at "github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID+0x214".

The root cause is:

  1. When the ingester is starting, the hashes of the series labels are not yet in memory, so every series has to be created in the TSDB head.
  2. Creating a series in the TSDB head means adding it to the TSDB in-memory postings (MemPostings).
  3. Adding to MemPostings requires keeping the list of series IDs sorted.
  4. When the number of series IDs is large, this sorting is very time-consuming.

So the solution may be: reduce the cost of sorting the postings' series IDs.
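
To make the cost concrete, here is a minimal sketch (not the real tsdb/index.MemPostings code) of the kind of sorted insert the postings list has to do for every label of every new series; when IDs arrive out of order, the trailing swap loop can walk far back through a long list:

```go
// Minimal sketch of a sorted postings list, illustrating why inserting
// an out-of-order series ID is expensive. This is an illustration of the
// pattern, not the actual Prometheus implementation.
package main

import "fmt"

type postingsList []uint64

// add appends id and then bubbles it backwards until the list is sorted
// again. For monotonically increasing IDs the loop exits immediately;
// for an out-of-order ID it may walk over most of the list.
func (p *postingsList) add(id uint64) {
	*p = append(*p, id)
	l := *p
	for i := len(l) - 1; i >= 1; i-- {
		if l[i] >= l[i-1] {
			break
		}
		l[i], l[i-1] = l[i-1], l[i]
	}
}

func main() {
	var p postingsList
	for _, id := range []uint64{1, 2, 5, 3, 4} { // 3 and 4 arrive late
		p.add(id)
	}
	fmt.Println(p) // [1 2 3 4 5]
}
```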

georgehao commented 3 years ago

This may also be addressed in another way.

Like #4408: if a tenant's data could be split across multiple DBs that can still be queried at once, then pushes handled by the ingester would be spread over many DBs for that tenant, which would decrease the lock contention.
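
A rough illustration of the idea (assuming nothing about #4408's actual design): splitting one big lock into several independently locked shards keyed by a hash means concurrent pushes that land on different shards no longer contend on the same mutex:

```go
// Illustrative sketch of lock sharding, not the design of #4408:
// instead of one mutex guarding one postings structure, hash the key
// (series label or tenant) onto one of N independently locked shards.
package main

import (
	"hash/fnv"
	"sync"
)

const numShards = 16

type shard struct {
	mu       sync.Mutex
	postings map[string][]uint64 // label -> series IDs
}

type shardedPostings struct {
	shards [numShards]shard
}

func newShardedPostings() *shardedPostings {
	sp := &shardedPostings{}
	for i := range sp.shards {
		sp.shards[i].postings = make(map[string][]uint64)
	}
	return sp
}

// add locks only the shard the key hashes to, so writers touching
// different shards proceed in parallel.
func (sp *shardedPostings) add(key string, id uint64) {
	h := fnv.New32a()
	h.Write([]byte(key))
	s := &sp.shards[h.Sum32()%numShards]

	s.mu.Lock()
	s.postings[key] = append(s.postings[key], id)
	s.mu.Unlock()
}

func main() {
	sp := newShardedPostings()
	sp.add("tenant-a", 1)
	sp.add("tenant-b", 2)
}
```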

bboreham commented 3 years ago

Now it looks like #3349. We had a PR proposed which sharded locks.

I think you're saying MemPostings.addFor() does a sort which is slow. It does look suspiciously like a bubble-sort.

All of that code now comes from Prometheus; it would be necessary to create a benchmark there to demonstrate the issue, and a PR to fix it there.
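
A rough sketch of what such a benchmark could look like in Prometheus's tsdb/index package, assuming the MemPostings.Add(id, labels) API and the pkg/labels import path of that Prometheus vintage (series IDs are plain uint64 there). It compares in-order against shuffled series IDs on a single label set:

```go
// Sketch of a benchmark for MemPostings; adjust names/paths as needed.
package index

import (
	"math/rand"
	"testing"

	"github.com/prometheus/prometheus/pkg/labels"
)

func benchmarkMemPostingsAdd(b *testing.B, shuffled bool) {
	ids := make([]uint64, b.N)
	for i := range ids {
		ids[i] = uint64(i + 1) // monotonically increasing, like lastSeriesID.Inc()
	}
	if shuffled {
		rand.Shuffle(len(ids), func(i, j int) { ids[i], ids[j] = ids[j], ids[i] })
	}
	lset := labels.FromStrings("__name__", "test_metric", "job", "bench")

	b.ResetTimer()
	p := NewMemPostings()
	for _, id := range ids {
		p.Add(id, lset)
	}
}

func BenchmarkMemPostingsAdd_InOrder(b *testing.B)    { benchmarkMemPostingsAdd(b, false) }
func BenchmarkMemPostingsAdd_OutOfOrder(b *testing.B) { benchmarkMemPostingsAdd(b, true) }
```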

By the way, you said "cortex version: release-1.9.0", but this line from the profile extract does not match:

0x9fc994 github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID+0x214 /home/xx/gopath/src/github.com/cortexproject/cortex/vendor/github.com/prometheus/prometheus/tsdb/head.go:1800

Here is line 1800 for v1.9.0: https://github.com/cortexproject/cortex/blob/ed4f3395f8ba0098a827c5ae4ca8a2901674806c/vendor/github.com/prometheus/prometheus/tsdb/head.go#L1800

Version 1.8 is a much better match: https://github.com/cortexproject/cortex/blob/8cf4eef159cc9055d0e6240fd20d98568d214fb8/vendor/github.com/prometheus/prometheus/tsdb/head.go#L1800

bboreham commented 3 years ago

I looked at the Prometheus code here: https://github.com/prometheus/prometheus/blob/bb05485c79084fecd3602eceafca3d554ab88987/tsdb/index/postings.go#L326-L331

which, if series IDs were in random order, would be very slow. However in Cortex the series IDs are generated by incrementing a counter, so we always add to the end of the list.

georgehao commented 3 years ago

@bboreham Yes, the pprof profile is from release-1.8. I copied the wrong one.

Yeah, I mean MemPostings.addFor().

seriesID increase

It seems that the seriesID is created and incremented by Prometheus's tsdb.Head: https://github.com/prometheus/prometheus/blob/bb05485c79/tsdb/head.go#L1185

The seriesID is incremented via h.lastSeriesID.Inc(), so it keeps growing.

So in Prometheus, if the seriesIDs are passed to MemPostings.addFor() in order, each one is simply appended to the end of the list. BUT..

Cortex

But Cortex has only one DB for a tenant, and that DB has only one head. When v2Push is called from many goroutines, lastSeriesID is incremented inside Prometheus's Head.getOrCreate, but because of the lock in Head.getOrCreateWithID, the new series may not be inserted into MemPostings right away.

So when a goroutine finally acquires the lock and inserts into head.MemPostings, its seriesID may be smaller than the last seriesID already inserted. In my test, while the Cortex ingester is starting, this locking lets the seriesIDs arrive out of order, so sorting them costs a lot of time.
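
A minimal sketch of the interleaving described above (illustrative only, not the real tsdb.Head code): the ID is allocated with an atomic increment before the insert lock is taken, so the goroutine that wins the lock may insert a larger ID first and later inserts arrive out of order:

```go
// Illustration of how IDs allocated by an atomic counter can reach a
// lock-protected postings list out of order.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var lastSeriesID uint64
	var mu sync.Mutex
	var inserted []uint64

	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				// Step 1: allocate the ID (outside the insert lock).
				id := atomic.AddUint64(&lastSeriesID, 1)
				// Step 2: insert under the lock; another goroutine may
				// have inserted a larger ID in between.
				mu.Lock()
				inserted = append(inserted, id)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	outOfOrder := 0
	for i := 1; i < len(inserted); i++ {
		if inserted[i] < inserted[i-1] {
			outOfOrder++
		}
	}
	fmt.Printf("%d of %d inserts arrived out of order\n", outOfOrder, len(inserted))
}
```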