Registry scanner: handle "unknown blob"

slamdev commented 4 years ago

Currently, Flux stops automation for an application when any tag of the application's image is broken. In my case, this is what the logs show:

ts=2020-04-06T14:33:01.561422612Z caller=repocachemanager.go:226 component=warmer canonical_name=eu.gcr.io/my/app auth="{map[eu.gcr.io:<registry creds for _json_key@eu.gcr.io, from some-namespace:secret/image-puller>]}" err="unknown blob" ref=eu.gcr.io/my/app:f9804f8db206f884162adc5151d65ec926731e41

I have no idea how a broken tag ended up in the registry.

A useful feature would be to skip broken tags during scanning and keep working with the tags that are valid. If for some reason the current behaviour is desired, then at least provide a Prometheus metric so that an alert can be configured for this case:

# HELP flux_registry_sync_failed_tag Number of failed tags for image
# TYPE flux_registry_sync_failed_tag counter
flux_registry_sync_failed_tag{image="eu.gcr.io/my/app"} 1
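
A minimal sketch of the requested behaviour, written in Python for brevity (Flux itself is written in Go) and not taken from Flux's code. It assumes a hypothetical `fetch_metadata(image, tag)` callable that raises a hypothetical `UnknownBlobError` for a broken tag; broken tags are skipped and counted per image, mirroring the proposed `flux_registry_sync_failed_tag` counter.

```python
from collections import Counter
from typing import Callable, Dict, List


class UnknownBlobError(Exception):
    """Hypothetical error for a tag whose manifest references a missing blob."""


def scan_tags(
    image: str,
    tags: List[str],
    fetch_metadata: Callable[[str, str], dict],
    failed_tags: Counter,
) -> Dict[str, dict]:
    """Fetch metadata for each tag, skipping broken tags instead of aborting.

    Returns metadata for the valid tags only. Every skipped tag increments
    failed_tags[image], standing in for a per-image Prometheus counter.
    """
    valid: Dict[str, dict] = {}
    for tag in tags:
        try:
            valid[tag] = fetch_metadata(image, tag)
        except UnknownBlobError:
            # Record the broken tag so an alert could fire, then keep scanning.
            failed_tags[image] += 1
    return valid
```

The valid tags can then be ordered as usual; a real implementation would expose the count through a Prometheus client library rather than a plain `Counter`.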

tarioch commented 4 years ago

As far as I can tell, a similar issue occurs when image metadata is missing, e.g.:

ts=2020-04-17T21:05:04.811668584Z caller=images.go:95 component=sync-loop workload=default:helmrelease/nextcloud container=chart-image repo=nextcloud pattern=glob:* current=nextcloud:17.0.3 warning="inconsistent repository metadata: missing metadata for image tag \"13.0.3RC1-apache\"" action="skip container"

In this case I would like an option to either ignore such tags automatically or, at least, exclude that image tag from scanning so that automation can proceed.

valeriano-manassero commented 4 years ago

Same issue here:

fluxcd-66cf6db5ff-s84gd flux ts=2020-10-16T08:12:54.669714152Z caller=warming.go:198 component=warmer info="refreshing image" image=polyaxon/polyaxon-streams tag_count=179 to_update=1 of_which_refresh=0 of_which_missing=1
fluxcd-66cf6db5ff-s84gd flux ts=2020-10-16T08:12:54.943753025Z caller=repocachemanager.go:226 component=warmer canonical_name=index.docker.io/polyaxon/polyaxon-streams auth={map[]} err="unknown blob" ref=polyaxon/polyaxon-streams:1.1.9-rc7
fluxcd-66cf6db5ff-s84gd flux ts=2020-10-16T08:12:54.943846713Z caller=warming.go:206 component=warmer updated=polyaxon/polyaxon-streams successful=0 attempted=1

The configuration I'm using is:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: polyaxon
  namespace: polyaxon
  annotations:
    fluxcd.io/automated: "true"
    filter.fluxcd.io/gateway: "semver: 1.*"
    filter.fluxcd.io/api: "semver: 1.*"
    filter.fluxcd.io/streams: "semver: 1.*"
    filter.fluxcd.io/init: "semver: 1.*"
    filter.fluxcd.io/sidecar: "semver: 1.*"
    filter.fluxcd.io/agent: "semver: 1.*"
    filter.fluxcd.io/operator: "semver: 1.*"
    filter.fluxcd.io/scheduler: "semver: 1.*"
spec:
  values:
    gateway:
      image: polyaxon/polyaxon-gateway
      imageTag: 1.1.8
    api:
      image: polyaxon/polyaxon-api
      imageTag: 1.1.8
    streams:
      image: polyaxon/polyaxon-streams
      imageTag: 1.1.8
    init:
      image: polyaxon/polyaxon-init
      imageTag: 1.1.8
    sidecar:
      image: polyaxon/polyaxon-sidecar
      imageTag: 1.1.8
    agent:
      image: polyaxon/polyaxon-agent
      imageTag: 1.1.8
    operator:
      image: polyaxon/polyaxon-operator
      imageTag: 1.1.8
    scheduler:
      image: polyaxon/polyaxon-scheduler
      imageTag: 1.1.8

kingdonb commented 3 years ago

This is a well-known issue, and it is resolved in Flux v2 in a creative way (cue the "roll safe" pointing-to-head meme): you can't have an image metadata error if you don't pull any image layers!

The only way to mitigate this in Flux v1 right now is to add image tag filters or fix the image repository. By default, Flux v1 orders images by their build timestamps, and if the image list contains an image with an invalid or missing timestamp, it cannot guarantee that it has selected the latest image, so it balks.
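
To illustrate why one bad tag blocks the whole image, here is a simplified Python sketch (not Flux's code) of ordering by build timestamp: if any candidate lacks a timestamp, the newest image cannot be determined, so the function refuses rather than guessing. The `created` mapping and the `InconsistentMetadataError` type are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict, Optional


class InconsistentMetadataError(Exception):
    """Hypothetical error: some tag has no usable build timestamp."""


def latest_by_timestamp(created: Dict[str, Optional[datetime]]) -> str:
    """Return the tag with the newest build timestamp.

    `created` maps tag -> build timestamp, or None when metadata is missing.
    One missing timestamp makes the ordering unreliable, so we refuse to pick.
    """
    if not created:
        raise ValueError("no candidate tags")
    missing = sorted(tag for tag, ts in created.items() if ts is None)
    if missing:
        raise InconsistentMetadataError(
            f"missing metadata for image tag {missing[0]!r}"
        )
    return max(created, key=lambda tag: created[tag])
```

Silently dropping the `None` entries instead would keep automation running, but at the risk of skipping a tag that is actually the newest one.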

If an image filter policy is in place, missing metadata only affects the Flux daemon's ability to order the images whose metadata it pulls. Images that do not match the filter do not have their metadata pulled, so they will not interfere.
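
A minimal sketch of that ordering, assuming glob patterns such as the `glob:*` in the earlier log line and using Python's `fnmatch.fnmatchcase` as a stand-in for whatever matcher Flux actually uses: tags are filtered first, and metadata is fetched only for the tags that match. `fetch_metadata` is a hypothetical callable.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List


def metadata_for_matching_tags(
    tags: List[str],
    pattern: str,
    fetch_metadata: Callable[[str], dict],
) -> Dict[str, dict]:
    """Fetch metadata only for tags matching a glob pattern.

    Tags that fail the filter are never passed to fetch_metadata, so a broken
    tag outside the filter cannot interfere with the scan.
    """
    matching = [tag for tag in tags if fnmatchcase(tag, pattern)]
    return {tag: fetch_metadata(tag) for tag in matching}
```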

Still, even with a semver filter, where image metadata shouldn't be relevant to deciding which image is latest, this problem manifests in issues like #548 and #3417, because image metadata is so pervasive in the design of Flux v1. This was a driving factor in rewriting Flux from scratch.
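
To make the point concrete: semver ordering can be done from tag names alone. The Python sketch below is illustrative only and is not Flux v2's implementation. It picks the highest plain `MAJOR.MINOR.PATCH` tag within one major version, roughly like the `semver: 1.*` filters in the configuration above, without fetching any image metadata; prerelease and non-semver tags are simply ignored, which simplifies full semver rules.

```python
import re
from typing import List, Optional, Tuple

# Plain MAJOR.MINOR.PATCH with an optional "v" prefix; prerelease tags such as
# "1.1.9-rc7" deliberately do not match (a simplification of full semver).
SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a plain release tag, else None."""
    m = SEMVER.fullmatch(tag)
    if m is None:
        return None
    major, minor, patch = (int(part) for part in m.groups())
    return (major, minor, patch)


def latest_in_major(tags: List[str], major: int) -> Optional[str]:
    """Pick the highest release tag with the given major version.

    Only tag names are inspected; no image metadata is fetched, so a tag with
    a broken blob cannot block the decision.
    """
    best: Optional[Tuple[Tuple[int, int, int], str]] = None
    for tag in tags:
        version = parse(tag)
        if version is None or version[0] != major:
            continue
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best is not None else None
```

Comparing integer tuples avoids the classic string-ordering bug where "1.1.10" would sort before "1.1.8".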

Since Flux v2 is now at feature parity, we recommend that all users upgrade as soon as possible, so that we can validate that any remaining use cases are covered at least as well by Flux v2 before announcing GA and declaring Flux v2 "1.0".

Apologies for the length of time that has elapsed since your inquiry.

If this is still affecting you, I will be happy to reopen this issue and/or troubleshoot with you in the #flux channel on the CNCF Slack. However, in the interest of reducing the number of open issues not directly related to supporting Flux v1 in maintenance mode to something manageable, and respecting that you may have moved on already, I will go ahead and close this issue for now.

fluxcd / flux

Registry scanner: handle "unknown blob" #2980