fluxcd / flux

Successor: https://github.com/fluxcd/flux2
https://fluxcd.io
Apache License 2.0

Sync fails due to (plain) metrics discovery error #2087

Closed marcossv9 closed 5 years ago

marcossv9 commented 5 years ago

Hi, I'm getting started with Flux and following the guide in the repository. I'm running a Kubernetes v1.12 cluster on AWS EKS, with worker nodes on EC2 instances.

After deploying Flux to my cluster I'm getting these errors:

ts=2019-05-23T17:56:42.830342392Z caller=main.go:210 type="internal kubernetes error" kubernetes_caller=vendor/k8s.io/client-go/discovery/cached/memory/memcache.go:199 err="couldn't get resource list for metrics/v1alpha1: the server could not find the requested resource"
ts=2019-05-23T18:01:42.855450572Z caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: unable to retrieve the complete list of server APIs: metrics/v1alpha1: the server could not find the requested resource"

This is the state of the metrics API, which I installed following the AWS metrics documentation:

kubectl get --raw /apis/metrics.k8s.io/v1beta1
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"metrics.k8s.io/v1beta1","resources":[{"name":"nodes","singularName":"","namespaced":false,"kind":"NodeMetrics","verbs":["get","list"]},{"name":"pods","singularName":"","namespaced":true,"kind":"PodMetrics","verbs":["get","list"]}]}

Running a Flux sync gives this output:

HEAD of master is 1d0b9e1
Waiting for 1d0b9e1 to be applied ...
== Error ==

Error: fatal: bad revision 'flux-sync..1d0b9e1', full output:
 fatal: bad revision 'flux-sync..1d0b9e1'

We don't have a specific help message for the error above.
It would help us remedy this if you log an issue at

https://github.com/weaveworks/flux/issues

saying what you were doing when you saw this, and quoting the message
at the top.

I pushed a change to master in my cloned repository, but it never gets synchronized to the cluster.

The Flux version is 1.12.3.

Any help would be appreciated.

marcossv9 commented 5 years ago

I almost forgot: the metrics version is v0.3.3.

marcossv9 commented 5 years ago

It seems like it is trying to get data from the metrics API v1alpha1, but I'm using v1beta1.

2opremio commented 5 years ago

metrics/v1alpha1 seems odd; I think it should be metrics.k8s.io/v1alpha1 (whose error would have been ignored correctly by #2009).

I am not a metrics user, but let's try to pinpoint where metrics/v1alpha1 is coming from. Can you please paste some further information?

  1. The full logs of Flux (thanks for the excerpt BTW)
  2. The output of kubectl get --raw /apis/
  3. The output of kubectl get --raw /apis/metrics/v1alpha1
  4. The output of kubectl get --raw /apis/metrics.k8s.io/v1alpha1
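
If it helps, the checks in steps 2 to 4 can be generalised: take the group list returned by /apis/ and probe every advertised group version. The sketch below is only an illustration, not part of Flux. The function names and the SAMPLE document are made up for the example, and probe() assumes kubectl is on the PATH and can reach the cluster.

```python
import json
import subprocess
from typing import List, Tuple


def group_version_paths(api_group_list_json: str) -> List[str]:
    """Return the raw discovery path for every advertised group version."""
    groups = json.loads(api_group_list_json).get("groups", [])
    return [
        "/apis/" + version["groupVersion"]
        for group in groups
        for version in group.get("versions", [])
    ]


def probe(path: str) -> Tuple[bool, str]:
    """Run `kubectl get --raw <path>` (assumes kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "--raw", path], capture_output=True, text=True
    )
    return result.returncode == 0, (result.stdout or result.stderr).strip()


# Trimmed, illustrative APIGroupList; a real one comes from
# `kubectl get --raw /apis/`.
SAMPLE = """
{"kind": "APIGroupList", "apiVersion": "v1", "groups": [
  {"name": "apps",
   "versions": [{"groupVersion": "apps/v1", "version": "v1"}]},
  {"name": "metrics.k8s.io",
   "versions": [{"groupVersion": "metrics.k8s.io/v1beta1", "version": "v1beta1"}]}
]}
"""

paths = group_version_paths(SAMPLE)
```

Calling probe(path) for each entry in paths would then show which advertised group versions the API server cannot actually serve.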

marcossv9 commented 5 years ago

Hello @2opremio, thanks for the fast response. Here are the logs and outputs you requested:

1-
ts=2019-05-24T17:37:29.153985791Z caller=main.go:193 version=1.12.3
ts=2019-05-24T17:37:29.194787785Z caller=main.go:350 component=cluster identity=/etc/fluxd/ssh/identity
ts=2019-05-24T17:37:29.194832968Z caller=main.go:351 component=cluster identity.pub="ssh-rsa XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
ts=2019-05-24T17:37:29.194858103Z caller=main.go:352 component=cluster host=https://172.20.0.1:443 version=kubernetes-v1.12.6-eks-d69f1b
ts=2019-05-24T17:37:29.194905157Z caller=main.go:364 component=cluster kubectl=/usr/local/bin/kubectl
ts=2019-05-24T17:37:29.197765233Z caller=main.go:375 component=cluster ping=true
ts=2019-05-24T17:37:29.201955341Z caller=main.go:508 url=git@github.com:marcossv9/flux-get-started user="Weave Flux" email=support@weave.works signing-key= sync-tag=flux-sync notes-ref=flux set-author=false
ts=2019-05-24T17:37:29.20219346Z caller=main.go:565 upstream="no upstream URL given"
ts=2019-05-24T17:37:29.202303724Z caller=main.go:594 metrics-addr=:3031
ts=2019-05-24T17:37:29.211677433Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-05-24T17:37:29.213655932Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-05-24T17:37:29.21391641Z caller=loop.go:90 component=sync-loop err="git repo not ready: git repo has not been cloned yet"
ts=2019-05-24T17:37:29.228989507Z caller=main.go:586 addr=:3030
ts=2019-05-24T17:37:29.367830101Z caller=aws.go:137 component=aws info="detected cluster region" source="EC2 metadata service" region=us-east-1
ts=2019-05-24T17:37:29.36797113Z caller=aws.go:104 component=aws info="restricting ECR registry scans" regions=[us-east-1] include-ids=[] exclude-ids=[602401143452]
ts=2019-05-24T17:37:30.327198176Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=1.12.3
ts=2019-05-24T17:37:30.434601096Z caller=memcached.go:112 component=memcached err="Fetching tag from memcache: memcache: connect timeout to 172.20.18.181:11211"
ts=2019-05-24T17:37:30.434670793Z caller=warming.go:162 component=warmer canonical_name=index.docker.io/weaveworks/flux auth={map[]} err="fetching previous result from cache: memcache: connect timeout to 172.20.18.181:11211"
ts=2019-05-24T17:37:31.434907559Z caller=memcached.go:112 component=memcached err="Fetching tag from memcache: memcache: connect timeout to 172.20.18.181:11211"
ts=2019-05-24T17:37:31.434973497Z caller=warming.go:162 component=warmer canonical_name=index.docker.io/library/memcached auth={map[]} err="fetching previous result from cache: memcache: connect timeout to 172.20.18.181:11211"
ts=2019-05-24T17:37:31.515785019Z caller=warming.go:198 component=warmer info="refreshing image" image=docker.io/amazon/aws-alb-ingress-controller tag_count=4 to_update=4 of_which_refresh=0 of_which_missing=4
ts=2019-05-24T17:37:31.706408025Z caller=warming.go:206 component=warmer updated=docker.io/amazon/aws-alb-ingress-controller successful=4 attempted=4
ts=2019-05-24T17:37:31.706616953Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-05-24T17:37:31.706637351Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-05-24T17:38:29.382191464Z caller=warming.go:198 component=warmer info="refreshing image" image=docker.io/weaveworks/flux tag_count=31 to_update=31 of_which_refresh=0 of_which_missing=31
ts=2019-05-24T17:38:31.090721139Z caller=warming.go:206 component=warmer updated=docker.io/weaveworks/flux successful=31 attempted=31
ts=2019-05-24T17:38:31.091264947Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-05-24T17:38:31.091286631Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-05-24T17:38:31.156726929Z caller=warming.go:198 component=warmer info="refreshing image" image=memcached tag_count=70 to_update=70 of_which_refresh=0 of_which_missing=70
ts=2019-05-24T17:38:35.81443173Z caller=warming.go:206 component=warmer updated=memcached successful=70 attempted=70
ts=2019-05-24T17:38:35.814546064Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-05-24T17:38:35.814570519Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-05-24T17:38:45.28335996Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=2c506713e0128b050feb3086df044646a42e0e30
ts=2019-05-24T17:38:45.326982872Z caller=main.go:210 type="internal kubernetes error" kubernetes_caller=vendor/k8s.io/client-go/discovery/cached/memory/memcache.go:199 err="couldn't get resource list for metrics/v1alpha1: the server could not find the requested resource"
ts=2019-05-24T17:38:45.331883668Z caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: unable to retrieve the complete list of server APIs: metrics/v1alpha1: the server could not find the requested resource"
ts=2019-05-24T17:39:15.513402813Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=2c506713e0128b050feb3086df044646a42e0e30
ts=2019-05-24T17:39:45.746853739Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=2c506713e0128b050feb3086df044646a42e0e30
ts=2019-05-24T17:40:15.964621885Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=2c506713e0128b050feb3086df044646a42e0e30
ts=2019-05-24T17:40:46.181612072Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=2c506713e0128b050feb3086df044646a42e0e30
ts=2019-05-24T17:41:16.474890173Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=2c506713e0128b050feb3086df044646a42e0e30
ts=2019-05-24T17:41:29.602534796Z caller=warming.go:198 component=warmer info="refreshing image" image=alpine tag_count=22 to_update=22 of_which_refresh=0 of_which_missing=22
ts=2019-05-24T17:41:31.055689361Z caller=warming.go:206 component=warmer updated=alpine successful=22 attempted=22
ts=2019-05-24T17:41:31.055891166Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-05-24T17:41:31.088149416Z caller=images.go:112 component=sync-loop workload=demo:deployment/podinfo container=init repo=alpine pattern=regexp:^3.* current=alpine:3.5 info="added update to automation run" new=alpine:3.9 reason="latest 3.9 (2019-05-11 00:07:03.510395965 +0000 UTC) > current 3.5 (2019-01-30 22:20:40.179652676 +0000 UTC)"
ts=2019-05-24T17:41:31.088231991Z caller=loop.go:111 component=sync-loop jobID=08055556-afdc-5697-a554-da6339d8c8a1 state=in-progress
ts=2019-05-24T17:41:31.11443791Z caller=releaser.go:58 component=sync-loop jobID=08055556-afdc-5697-a554-da6339d8c8a1 type=release updates=1
ts=2019-05-24T17:41:31.138035993Z caller=warming.go:198 component=warmer info="refreshing image" image=stefanprodan/podinfo tag_count=144 to_update=144 of_which_refresh=0 of_which_missing=144
ts=2019-05-24T17:41:33.570118477Z caller=daemon.go:276 component=sync-loop jobID=08055556-afdc-5697-a554-da6339d8c8a1 revision=d9e5aeb5127c97830666da2a281688b3a0c5419a
ts=2019-05-24T17:41:33.570161494Z caller=daemon.go:624 component=daemon event="Commit: d9e5aeb, demo:deployment/podinfo" logupstream=false
ts=2019-05-24T17:41:33.570224863Z caller=loop.go:123 component=sync-loop jobID=08055556-afdc-5697-a554-da6339d8c8a1 state=done success=true
ts=2019-05-24T17:41:33.821970587Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=d9e5aeb5127c97830666da2a281688b3a0c5419a
ts=2019-05-24T17:41:33.845089699Z caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: unable to retrieve the complete list of server APIs: metrics/v1alpha1: the server could not find the requested resource"
ts=2019-05-24T17:41:40.336537551Z caller=warming.go:206 component=warmer updated=stefanprodan/podinfo successful=144 attempted=144
ts=2019-05-24T17:41:40.338566869Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-05-24T17:41:40.398206899Z caller=images.go:112 component=sync-loop workload=demo:deployment/podinfo container=init repo=alpine pattern=regexp:^3.* current=alpine:3.5 info="added update to automation run" new=alpine:3.9 reason="latest 3.9 (2019-05-11 00:07:03.510395965 +0000 UTC) > current 3.5 (2019-01-30 22:20:40.179652676 +0000 UTC)"
ts=2019-05-24T17:41:40.398308848Z caller=loop.go:111 component=sync-loop jobID=a65a37f6-05b6-04c5-e2bf-aeeed0a08847 state=in-progress
ts=2019-05-24T17:41:40.427179923Z caller=releaser.go:58 component=sync-loop jobID=a65a37f6-05b6-04c5-e2bf-aeeed0a08847 type=release updates=0
ts=2019-05-24T17:41:40.427210764Z caller=releaser.go:60 component=sync-loop jobID=a65a37f6-05b6-04c5-e2bf-aeeed0a08847 type=release exit="no images to update for services given"
ts=2019-05-24T17:41:40.434902072Z caller=loop.go:121 component=sync-loop jobID=a65a37f6-05b6-04c5-e2bf-aeeed0a08847 state=done success=false err="no changes made in repo"
ts=2019-05-24T17:41:40.666596764Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=d9e5aeb5127c97830666da2a281688b3a0c5419a
ts=2019-05-24T17:42:10.980940896Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=6781af8a02da2d6d8555034ba644a6d4ea599b3b
ts=2019-05-24T17:42:11.019880417Z caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: unable to retrieve the complete list of server APIs: metrics/v1alpha1: the server could not find the requested resource"
ts=2019-05-24T17:42:41.195698779Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=6781af8a02da2d6d8555034ba644a6d4ea599b3b
ts=2019-05-24T17:43:11.684093976Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=6781af8a02da2d6d8555034ba644a6d4ea599b3b
ts=2019-05-24T17:43:41.916043894Z caller=loop.go:103 component=sync-loop event=refreshed url=git@github.com:marcossv9/flux-get-started branch=master HEAD=6781af8a02da2d6d8555034ba644a6d4ea599b3b

2- {"kind":"APIGroupList","apiVersion":"v1","groups":[{"name":"apiregistration.k8s.io","versions":[{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"},Error from server (NotFound): the server could not find the requested resource{"groupVersion":"apiregistration.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"}},{"name":"extensions","versions":[{"groupVersion":"extensions/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"extensions/v1beta1","version":"v1beta1"}},{"name":"apps","versions":[{"groupVersion":"apps/v1","version":"v1"},{"groupVersion":"apps/v1beta2","version":"v1beta2"},{"groupVersion":"apps/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apps/v1","version":"v1"}},{"name":"events.k8s.io","versions":[{"groupVersion":"events.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"events.k8s.io/v1beta1","version":"v1beta1"}},{"name":"authentication.k8s.io","versions":[{"groupVersion":"authentication.k8s.io/v1","version":"v1"},{"groupVersion":"authentication.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"authentication.k8s.io/v1","version":"v1"}},{"name":"authorization.k8s.io","versions":[{"groupVersion":"authorization.k8s.io/v1","version":"v1"},{"groupVersion":"authorization.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"authorization.k8s.io/v1","version":"v1"}},{"name":"autoscaling","versions":[{"groupVersion":"autoscaling/v1","version":"v1"},{"groupVersion":"autoscaling/v2beta1","version":"v2beta1"},{"groupVersion":"autoscaling/v2beta2","version":"v2beta2"}],"preferredVersion":{"groupVersion":"autoscaling/v1","version":"v1"}},{"name":"batch","versions":[{"groupVersion":"batch/v1","version":"v1"},{"groupVersion":"batch/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"batch/v1","version":"v1"}},{"name":"certificates.k8s.io","versions":[{"groupVersion":"certificates.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"certificates.k8s.io/v1beta1","version":"v1beta1"}},{"name":"networking.k8s.io","versions":[{"groupVersion":"networking.k8s.io/v1","version":"v1"}],"preferredVersion":{"groupVersion":"networking.k8s.io/v1","version":"v1"}},{"name":"policy","versions":[{"groupVersion":"policy/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"policy/v1beta1","version":"v1beta1"}},{"name":"rbac.authorization.k8s.io","versions":[{"groupVersion":"rbac.authorization.k8s.io/v1","version":"v1"},{"groupVersion":"rbac.authorization.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"rbac.authorization.k8s.io/v1","version":"v1"}},{"name":"storage.k8s.io","versions":[{"groupVersion":"storage.k8s.io/v1","version":"v1"},{"groupVersion":"storage.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"storage.k8s.io/v1","version":"v1"}},{"name":"admissionregistration.k8s.io","versions":[{"groupVersion":"admissionregistration.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"admissionregistration.k8s.io/v1beta1","version":"v1beta1"}},{"name":"apiextensions.k8s.io","versions":[{"groupVersion":"apiextensions.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apiextensions.k8s.io/v1beta1","version":"v1beta1"}},{"name":"scheduling.k8s.io","versions":[{"groupVersion":"scheduling.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"scheduling.k8s.io/v1beta1","version":"v1beta1"}},{"name":"coordination.k8s.io","versions":[{"groupVersion":"coordination.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"coordination.k8s.io/v1beta1","version":"v1beta1"}},{"name":"crd.k8s.amazonaws.com","versions":[{"groupVersion":"crd.k8s.amazonaws.com/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"crd.k8s.amazonaws.com/v1alpha1","version":"v1alpha1"}},{"name":"metrics","versions":[{"groupVersion":"metrics/v1alpha1","version":"v1alpha1"}],"preferredVersion":{"groupVersion":"metrics/v1alpha1","version":"v1alpha1"}},{"name":"metrics.k8s.io","versions":[{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}}]}

3- Error from server (NotFound): the server could not find the requested resource

4- Error from server (NotFound): the server could not find the requested resource

Please let me know if you need something else.

marcossv9 commented 5 years ago

Adding some more info: I think Flux's memcache is asking for the metrics API of the old Kubernetes 1.7 version (metrics/v1alpha1), but I'm using the metrics API for Kubernetes 1.8+ (metrics.k8s.io/v1beta1).

2opremio commented 5 years ago

memcache doesn't cache resources/groupversions, just container image metadata.

2opremio commented 5 years ago

To be more precise, there is an in-memory cache for the resource definitions, but it gets invalidated every 5 minutes.
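
To illustrate why a stale cache is unlikely to be the cause, here is a rough sketch of a cache that is wiped on a fixed interval. It is not the client-go or Flux implementation. The class name, the injectable clock and the 300-second default are assumptions made for the example; only the 5-minute figure comes from the comment above.

```python
import time
from typing import Callable, Dict


class InvalidatingCache:
    """Toy cache whose entries are all dropped once per interval."""

    def __init__(
        self,
        interval_seconds: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval = interval_seconds
        self._clock = clock
        self._entries: Dict[str, object] = {}
        self._last_invalidated = clock()

    def get(self, key: str, load: Callable[[], object]) -> object:
        now = self._clock()
        # Drop everything once the interval has elapsed, so the next
        # lookup goes back to the API server.
        if now - self._last_invalidated >= self._interval:
            self._entries.clear()
            self._last_invalidated = now
        if key not in self._entries:
            self._entries[key] = load()
        return self._entries[key]
```

With a cache like this, a bad resource list would be re-fetched within five minutes, so an error that persists across sync attempts points at the API server's answer rather than at old cached data.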

2opremio commented 5 years ago

Here's the problem (from kubectl get --raw /apis/):

    {
      "name": "metrics",
      "versions": [
        {
          "groupVersion": "metrics/v1alpha1",
          "version": "v1alpha1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "metrics/v1alpha1",
        "version": "v1alpha1"
      }
    },
    {
      "name": "metrics.k8s.io",
      "versions": [
        {
          "groupVersion": "metrics.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "metrics.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    }

We do ignore errors in metrics.k8s.io (see #2009), but not in plain metrics (which I didn't know was a thing). That group seems to be misconfigured in your cluster: it is advertised by /apis/, yet kubectl get --raw /apis/metrics/v1alpha1 fails.
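
Roughly, the behaviour amounts to the following. This is a simplified sketch and not Flux's actual code: IGNORED_GROUPS, group_of and blocking_failures are hypothetical names, and the real check lives in Go on top of client-go discovery errors.

```python
from typing import Dict, Set

# Groups whose discovery failures are tolerated (only this one, per #2009).
IGNORED_GROUPS: Set[str] = {"metrics.k8s.io"}


def group_of(group_version: str) -> str:
    """Return the group part of 'group/version' ('' if there is no group)."""
    if "/" not in group_version:
        return ""
    return group_version.split("/", 1)[0]


def blocking_failures(
    failed: Dict[str, str], ignored_groups: Set[str] = IGNORED_GROUPS
) -> Dict[str, str]:
    """Keep only the discovery failures that should abort the sync."""
    return {
        gv: err for gv, err in failed.items() if group_of(gv) not in ignored_groups
    }


failed = {
    "metrics/v1alpha1": "the server could not find the requested resource",
}
remaining = blocking_failures(failed)
```

Here remaining still contains metrics/v1alpha1, so the sync is aborted. A failure in metrics.k8s.io would have been filtered out, which is why only the plain metrics group trips the sync loop.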

2opremio commented 5 years ago

So, this is another incarnation of https://github.com/weaveworks/flux/issues/1991, sigh

marcossv9 commented 5 years ago

Thanks @2opremio, I will look into that next Monday!

marcossv9 commented 5 years ago

@2opremio will this quick fix be available in the next release?

2opremio commented 5 years ago

Yep!

marcossv9 commented 5 years ago

Yep!

Awesome! Thanks!