fluxcd / flux

Successor: https://github.com/fluxcd/flux2
https://fluxcd.io
Apache License 2.0
6.9k stars 1.08k forks source link

Flux using >50% of one CPU #926

Closed rbruggem closed 6 years ago

rbruggem commented 6 years ago

Noticing multiple kubectl apply every second.

This occurs with v1.2.0 and v1.2.1.

rade commented 6 years ago

I have bumped our sock shop cluster to the latest flux (1.2.2) and all other latest weave agents. The mega yaml apply which was previously broken (due to #916) now works. Alas flux is still using about 60% of a core. According to Explore, about half of that is kubectl'ing. It is kubectl applying every ~15s - is that expected?

squaremo commented 6 years ago

It is kubectl applying every ~15s - is that expected?

No; left alone, it should run every few minutes. It will be triggered by

So: either the tick has been tuned down radically, or there are constantly new images and the automation interval has been tuned down radically (or, there's a bug).

Is there an indication in the logs of what triggers the sync? I'd expect some messages about jobs running if it's those or automation.

rbruggem commented 6 years ago

Attaching logs provided by a flux user who experienced this issue flux.log

rade commented 6 years ago

(bear in mind we may be looking at two distinct problems here, in the two instances we are looking at)

The sock-shop fluxd has no special tuning. I can see no obvious trigger in the logs for the kubectl applys, but TBH I don't really know what I am looking for:

ts=2018-02-02T10:57:47.49427888Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-02-02T10:57:47.95983943Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/carts-db container=carts-db currentimage=mongo repo=mongo pattern=*
ts=2018-02-02T10:57:47.960130666Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/user container=user currentimage=weaveworksdemos/user:0.4.7 repo=weaveworksdemos/user pattern=*
ts=2018-02-02T10:57:47.960243493Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/shipping container=shipping currentimage=weaveworksdemos/shipping:master-ba362a4c repo=weaveworksdemos/shipping pattern=*
ts=2018-02-02T10:57:47.960450487Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/queue-master container=queue-master currentimage=weaveworksdemos/queue-master:master-b8bcfc79 repo=weaveworksdemos/queue-master pattern=*
ts=2018-02-02T10:57:47.96061811Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/user-db container=user-db currentimage=weaveworksdemos/user-db:0.4.7 repo=weaveworksdemos/user-db pattern=*
ts=2018-02-02T10:57:47.960773579Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/orders-db container=orders-db currentimage=mongo repo=mongo pattern=*
ts=2018-02-02T10:57:47.960956102Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/payment container=payment currentimage=weaveworksdemos/payment:master-b64b8af1 repo=weaveworksdemos/payment pattern=*
ts=2018-02-02T10:57:47.96112886Z caller=images.go:57 component=sync-loop service=sock-shop:deployment/orders container=orders currentimage=weaveworksdemos/orders:master-f8988a3d repo=weaveworksdemos/orders pattern=*
ts=2018-02-02T10:57:51.518949283Z caller=release.go:61 component=platform method=Sync cmd=apply args="--namespace default" count=1
ts=2018-02-02T10:57:52.064705257Z caller=release.go:100 component=platform method=Sync cmd="kubectl --namespace default apply -f -" took=545.657746ms err=null output="namespace \"sock-shop\" configured"
ts=2018-02-02T10:57:52.066493091Z caller=release.go:61 component=platform method=Sync cmd=apply args= count=29
ts=2018-02-02T10:57:56.555391705Z caller=release.go:100 component=platform method=Sync cmd="kubectl apply -f -" took=4.488656238s err=null output="deployment \"carts-db\" configured\nservice \"carts-db\" configured\ndeployment \"catalogue\" configured\ndeployment \"load-test\" configured\ndeployment \"orders\" configured\ndeployment \"user-db\" configured\nservice \"orders\" configured\nservice \"session-db\" configured\ndeployment \"shipping\" configured\ndeployment \"orders-db\" configured\ndeployment \"payment\" configured\ndeployment \"rabbitmq\" configured\nservice \"user\" configured\ndeployment \"front-end\" configured\ndeployment \"session-db\" configured\nservice \"carts\" configured\nservice \"payment\" configured\nservice \"queue-master\" configured\ndeployment \"carts\" configured\nservice \"front-end\" configured\ndeployment \"queue-master\" configured\nservice \"user-db\" configured\ndeployment \"user\" configured\ndeployment \"catalogue-db\" configured\nservice \"catalogue-db\" configured\nservice \"catalogue\" configured\nservice \"orders-db\" configured\nservice \"rabbitmq\" configured\nservice \"shipping\" configured"
ts=2018-02-02T10:58:06.158436936Z caller=release.go:61 component=platform method=Sync cmd=apply args="--namespace default" count=1
ts=2018-02-02T10:58:06.839208202Z caller=release.go:100 component=platform method=Sync cmd="kubectl --namespace default apply -f -" took=680.582364ms err=null output="namespace \"sock-shop\" configured"
ts=2018-02-02T10:58:06.839294152Z caller=release.go:61 component=platform method=Sync cmd=apply args= count=29
ts=2018-02-02T10:58:13.05913582Z caller=release.go:100 component=platform method=Sync cmd="kubectl apply -f -" took=6.217109974s err=null output="service \"user-db\" configured\ndeployment \"user\" configured\nservice \"orders\" configured\nservice \"rabbitmq\" configured\nservice \"front-end\" configured\ndeployment \"payment\" configured\nservice \"orders-db\" configured\nservice \"session-db\" configured\ndeployment \"catalogue-db\" configured\ndeployment \"catalogue\" configured\ndeployment \"orders-db\" configured\ndeployment \"queue-master\" configured\ndeployment \"user-db\" configured\nservice \"user\" configured\nservice \"carts-db\" configured\nservice \"catalogue-db\" configured\nservice \"payment\" configured\ndeployment \"session-db\" configured\ndeployment \"shipping\" configured\ndeployment \"orders\" configured\nservice \"queue-master\" configured\ndeployment \"rabbitmq\" configured\nservice \"catalogue\" configured\ndeployment \"front-end\" configured\ndeployment \"load-test\" configured\nservice \"shipping\" configured\ndeployment \"carts-db\" configured\ndeployment \"carts\" configured\nservice \"carts\" configured"
ts=2018-02-02T10:58:24.762192817Z caller=release.go:61 component=platform method=Sync cmd=apply args="--namespace default" count=1
ts=2018-02-02T10:58:26.201710224Z caller=release.go:100 component=platform method=Sync cmd="kubectl --namespace default apply -f -" took=1.439309753s err=null output="namespace \"sock-shop\" configured"
ts=2018-02-02T10:58:26.202375657Z caller=release.go:61 component=platform method=Sync cmd=apply args= count=29
ts=2018-02-02T10:58:32.198947907Z caller=release.go:100 component=platform method=Sync cmd="kubectl apply -f -" took=5.99636982s err=null output="deployment \"shipping\" configured\nservice \"shipping\" configured\ndeployment \"user-db\" configured\ndeployment \"carts-db\" configured\nservice \"catalogue-db\" configured\nservice \"session-db\" configured\ndeployment \"front-end\" configured\nservice \"orders\" configured\ndeployment \"queue-master\" configured\ndeployment \"session-db\" configured\nservice \"carts\" configured\ndeployment \"catalogue\" configured\nservice \"catalogue\" configured\ndeployment \"orders-db\" configured\nservice \"carts-db\" configured\nservice \"rabbitmq\" configured\ndeployment \"user\" configured\nservice \"front-end\" configured\nservice \"orders-db\" configured\nservice \"queue-master\" configured\nservice \"user-db\" configured\ndeployment \"carts\" configured\ndeployment \"load-test\" configured\ndeployment \"payment\" configured\ndeployment \"catalogue-db\" configured\ndeployment \"orders\" configured\nservice \"payment\" configured\ndeployment \"rabbitmq\" configured\nservice \"user\" configured"
squaremo commented 6 years ago

Right, no messages about running jobs. It does look an awful lot like it's just applying the namespace, applying everything else, then goto 10.

squaremo commented 6 years ago

A naive attempt to reproduce this locally fails: I created a fresh minikube and pointed flux at the flux-example repo, and it behaves as expected. So there's something else going on.

squaremo commented 6 years ago

The log attached above also shows the sync running every five minutes, when left to its own devices. Perhaps the CPU use there has a different cause.

stefanprodan commented 6 years ago

I could reproduce this on GKE with the following repo https://github.com/stefanprodan/flux-demo The repo has just two deployments, the caddy one is automated while podinfo is not.

I'm using a n1-standard-2 node and after deploying Flux the average CPU usage increased by ~50%. I didn't noticed kubectl running more than once per minute and the CPU usage of kubectl is under 1% while fluxd uses around 50% most of time.

GCP CPU usage

screen shot 2018-02-03 at 10 25 36

Scope

screen shot 2018-02-03 at 10 18 32

Cortex

screen shot 2018-02-03 at 10 28 15 screen shot 2018-02-03 at 10 20 04

This is what Flux logs every minute:

ts=2018-02-03T08:36:21.588319201Z caller=warming.go:190 component=warmer fetching=nats-streaming total=10 expired=0 missing=10
ts=2018-02-03T08:36:21.732550794Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:21.737490165Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:21.851009807Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:21.864899317Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.019054839Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.03614893Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.047498709Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.076802372Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.090084019Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.107090559Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:36:22.107171959Z caller=warming.go:238 component=warmer updated=nats-streaming count=0
ts=2018-02-03T08:36:22.107275327Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-02-03T08:36:22.130578863Z caller=images.go:57 component=sync-loop service=default:deployment/caddy container=caddy currentimage=stefanprodan/caddy:0.10.10 repo=stefanprodan/caddy pattern=*

If I remove the automated deployment the CPU usage stays the same:

ts=2018-02-03T08:47:21.690835107Z caller=warming.go:190 component=warmer fetching=nats-streaming total=10 expired=0 missing=10
ts=2018-02-03T08:47:21.81497541Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.078176684Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.131129014Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.131578471Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.132031944Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.135878971Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.138002694Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.143104445Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.174781168Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.298959979Z caller=warming.go:215 component=warmer canonical_name=index.docker.io/library/nats-streaming auth={map[]} err="requesting manifests: no suitable manifest (linux amd64) in manifestlist"
ts=2018-02-03T08:47:22.299053202Z caller=warming.go:238 component=warmer updated=nats-streaming count=0
ts=2018-02-03T08:47:22.299196829Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2018-02-03T08:47:22.303798986Z caller=images.go:28 component=sync-loop msg="no automated services"
stefanprodan commented 6 years ago

pprof top:

Showing nodes accounting for 24850ms, 87.69% of 28340ms total
Dropped 167 nodes (cum <= 141.70ms)
Showing top 10 nodes out of 17
      flat  flat%   sum%        cum   cum%
    1390ms  4.90%  4.90%    28140ms 99.29%  github.com/weaveworks/flux/registry/cache.(*Warmer).Loop /Users/aleph/go/src/github.com/weaveworks/flux/registry/cache/warming.go
   10050ms 35.46% 40.37%    21050ms 74.28%  runtime.selectgo /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
    1680ms  5.93% 46.29%     4870ms 17.18%  runtime.sellock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
    1400ms  4.94% 51.24%     4200ms 14.82%  runtime.selunlock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
    3200ms 11.29% 62.53%     3200ms 11.29%  runtime.lock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/lock_futex.go
    2800ms  9.88% 72.41%     2800ms  9.88%  runtime.unlock /usr/local/Cellar/go/1.9.2/libexec/src/runtime/lock_futex.go
    2320ms  8.19% 80.59%     2790ms  9.84%  runtime.selectrecv /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
     780ms  2.75% 83.35%      980ms  3.46%  runtime.selectdefault /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go
     420ms  1.48% 84.83%      920ms  3.25%  runtime.selectnbrecv /usr/local/Cellar/go/1.9.2/libexec/src/runtime/chan.go
     810ms  2.86% 87.69%      810ms  2.86%  runtime.(*hchan).sortkey /usr/local/Cellar/go/1.9.2/libexec/src/runtime/select.go (inline)
squaremo commented 6 years ago

Good sleuthing @stefanprodan!

(For the sake of completeness, I should record that the sock shop was running kubectl constantly because it had an experimental GitHub push hook configured.)