containrrr / shepherd

Docker swarm service for automatically updating your services whenever their image is refreshed
https://hub.docker.com/r/mazzolino/shepherd
MIT License

Shepherd updates back and forth between sha version and latest #110

Open GuyKh opened 1 year ago

GuyKh commented 1 year ago

See pic: image

Very often I'm getting two updates, one from latest (for example, or a versioned image) to a version with sha and then back to non-sha one.

e.g.

[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep  2 06:23:14 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c to mazzolino/apprise-microservice:0.1

[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep  2 06:49:34 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1 to mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c

awptechnologies commented 1 year ago

May I ask what you are using for notifications?

GuyKh commented 1 year ago

May I ask what you are using for notifications?

Telegram

moschlar commented 9 months ago

@GuyKh Could you please verify whether this issue still persists with the latest shepherd version? If yes, please run shepherd with VERBOSE=true and share the corresponding log file with us.

Make sure to update your image specifier to containrrr/shepherd.

GuyKh commented 9 months ago

@moschlar The latest update occurred 2 months ago, so if your question relates to the last 2 months, the answer is definitely YES.

I can say that things have been quiet on this front since Jan 17th.

GuyKh commented 9 months ago

It's here again :)

general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 15:16:41 IST 2024 Sleeping 60m before next update
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:16:54 IST 2024 Trying to update service general_adguard-exporter with image ebrianne/adguard-exporter:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:07 IST 2024 Service general_adguard-exporter was updated!
general_shepherd.1.45n0qi5aws3p@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.45n0qi5aws3p@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   313  100     2  100   311      3    512 --:--:-- --:--:-- --:--:--   514
general_shepherd.1.45n0qi5aws3p@nuc    | okWed Jan 31 16:17:08 IST 2024 Cleaning up old docker images, leaving last 5
general_shepherd.1.45n0qi5aws3p@nuc    | no such manifest: docker.io/tiredofit/traefik-cloudflare-companion:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:12 IST 2024 Error updating service general_cf-companion! Image tiredofit/traefik-cloudflare-companion:latest does not exist or it is not available
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:18 IST 2024 Trying to update service general_cf-ddns with image oznu/cloudflare-ddns:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:33 IST 2024 Service general_cf-ddns was updated!
general_shepherd.1.45n0qi5aws3p@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.45n0qi5aws3p@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   285  100     2  100   283      5    778 --:--:-- --:--:-- --:--:--   785
general_shepherd.1.45n0qi5aws3p@nuc    | okWed Jan 31 16:17:34 IST 2024 Cleaning up old docker images, leaving last 5
general_shepherd.1.45n0qi5aws3p@nuc    | no such manifest: docker.io/library/docker:latest
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:38 IST 2024 Error updating service general_image-prune! Image docker:latest does not exist or it is not available
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:17:44 IST 2024 Trying to update service general_notify with image mazzolino/apprise-microservice:0.1
general_shepherd.1.45n0qi5aws3p@nuc    | Wed Jan 31 16:18:06 IST 2024 Service general_notify was updated!
general_shepherd.1.45n0qi5aws3p@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.45n0qi5aws3p@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   297  100     2  100   295      4    682 --:--:-- --:--:-- --:--:--   687
general_shepherd.1.45n0qi5aws3p@nuc    | okWed Jan 31 16:18:07 IST 2024 Cleaning up old docker images, leaving last 5

image

moschlar commented 9 months ago

From that log and screenshot I can't see that any service is "flapping"...

Can you share your Docker Swarm stack files?

GuyKh commented 9 months ago

Looking at the swarm files, I see I was using mazzolino/shepherd and not containrrr/shepherd, so I was seeing: Image mazzolino/shepherd:latest does not exist or it is not available

Retrying this with containrrr/shepherd

GuyKh commented 9 months ago

Well... this still occurs:

general_shepherd.1.5w81t3foj5bk@nuc    | okThu Feb  1 12:25:00 IST 2024 Cleaning up old docker images, leaving last 3
general_shepherd.1.5w81t3foj5bk@nuc    | Thu Feb  1 12:25:06 IST 2024 Trying to update service general_cf-ddns with image oznu/cloudflare-ddns:latest
general_shepherd.1.5w81t3foj5bk@nuc    | Thu Feb  1 12:25:22 IST 2024 Service general_cf-ddns was updated!
general_shepherd.1.5w81t3foj5bk@nuc    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.5w81t3foj5bk@nuc    |                                  Dload  Upload   Total   Spent    Left  Speed
100   304  100     2  100   302      6    936 --:--:-- --:--:-- --:--:--   944
general_shepherd.1.5w81t3foj5bk@nuc    | okThu Feb  1 12:25:22 IST 2024 Cleaning up old docker images, leaving last 3

image

martadinata666 commented 9 months ago

Updating to an image of the form someimagename:sometag@shanum should be correct, since images with the same tag are identified by their sha sum. That is what distinguishes last year's "latest" from today's "latest". A sample from my update log:

service nextcloud_imaginary was updated from nextcloud/aio-imaginary:latest@sha256:f7fb3f35cdbacbaa06dbcf6bbc567e39037af1251fb3600b44c8626e3bbf0b01 to nextcloud/aio-imaginary:latest@sha256:3d1cb04f90eca6dbbaaed0f773ed092a024b0eca742b73f88f8b010025d3ab9b

What I can't really tell is why it switches to a reference without the sha, like in your first post:

[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep  2 06:23:14 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c to mazzolino/apprise-microservice:0.1

From sha to non-sha. Is it because the sha is the same? I'm not sure.

GuyKh commented 9 months ago

I haven't really looked into it, but I think the issue is in how :latest is resolved: what is the logic that decides between resolving it to a specific sha and keeping it as 'latest'?

moschlar commented 9 months ago

Like I said yesterday, from your recent reports, I can not see the issue that you have been describing in your first post and the title of this issue.

Please try to reproduce this with the latest official image and show us the logging output.

mjrj97 commented 7 months ago

I'm experiencing this issue as well. Here are the messages from Apprise:

[Shepherd] Service backend_file-proxy updated on 5d4000a267bd
Wed Apr  3 02:49:03 CEST 2024 Service backend_file-proxy was updated from ghcr.io/REDACTED/file-proxy:1.0@sha256:d9a3268c22892f7272773cd9b6caabe3630c6877f8b28373f7dbb57822f6d342 to ghcr.io/REDACTED/file-proxy:1.0
Apprise | Today at 2:49 AM

[Shepherd] Service backend_file-proxy updated on 5d4000a267bd
Wed Apr  3 03:00:26 CEST 2024 Service backend_file-proxy was updated from ghcr.io/REDACTED/file-proxy:1.0 to ghcr.io/REDACTED/file-proxy:1.0@sha256:d9a3268c22892f7272773cd9b6caabe3630c6877f8b28373f7dbb57822f6d342
Apprise | Today at 3:00 AM

We're experiencing this with services using either Docker Hub or the GitHub Container Registry, so the problem is probably not the registries. Here are the logs from shepherd (verbose):

Wed Apr  3 02:48:57 CEST 2024 Trying to update service backend_file-proxy with image ghcr.io/REDACTED/file-proxy:1.0
image ghcr.io/REDACTED/file-proxy:1.0 could not be accessed on a registry to record
its digest. Each node will access ghcr.io/REDACTED/file-proxy:1.0 independently,
possibly leading to different nodes running different
versions of the image.
Wed Apr  3 02:49:03 CEST 2024 Service backend_file-proxy was updated!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   323  100     2  100   321      9   1446 --:--:-- --:--:-- --:--:--  1461

Wed Apr  3 03:00:25 CEST 2024 Trying to update service backend_file-proxy with image ghcr.io/REDACTED/file-proxy:1.0
Wed Apr  3 03:00:26 CEST 2024 Service backend_file-proxy was updated!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   323  100     2  100   321     10   1664 --:--:-- --:--:-- --:--:--  1682

The example has a defined version tag, but this is also happening to services using images with a latest tag. Here is our YAML file for the service:

  shepherd:
    image: containrrr/shepherd
    environment:
      SLEEP_TIME: '5m'
      FILTER_SERVICES: 'label=shepherd.autodeploy'
      ROLLBACK_ON_FAILURE: 'true'
      REGISTRIES_FILE: /var/run/secrets/shepherd-registries-auth
      WITH_REGISTRY_AUTH: 'true'
      APPRISE_SIDECAR_URL: 'notify:5000'
      TZ: Europe/Berlin
    secrets:
      - shepherd-registries-auth
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - notification
    deploy:
      placement:
        constraints:
          - node.role == manager

Btw, thanks for a great service. Shepherd has really improved our deployment strategy.

GuyKh commented 7 months ago

Like I said yesterday, from your recent reports, I can not see the issue that you have been describing in your first post and the title of this issue.

Please try to reproduce this with the latest official image and show us the logging output.

Just reproduced this with the latest version. Getting this:

[Shepherd] Service general_ouroboros updated on 08186ba56442
Thu Apr  4 12:29:43 IDT 2024 Service general_ouroboros was updated from pyouroboros/ouroboros:latest to pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

Logs:

general_shepherd.1.xcjzlmsdmhry@ubuntu    | Thu Apr  4 12:29:20 IDT 2024 Trying to update service general_ouroboros with image pyouroboros/ouroboros:latest
general_shepherd.1.xcjzlmsdmhry@ubuntu    | Thu Apr  4 12:29:43 IDT 2024 Service general_ouroboros was updated!
general_shepherd.1.xcjzlmsdmhry@ubuntu    |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
general_shepherd.1.xcjzlmsdmhry@ubuntu    |                                  Dload  Upload   Total   Spent    Left  Speed
100   310  100     2  100   308      5    783 --:--:-- --:--:-- --:--:--   788
general_shepherd.1.xcjzlmsdmhry@ubuntu    | okThu Apr  4 12:29:43 IDT 2024 Cleaning up old docker images, leaving last 2

djmaze commented 7 months ago

I guess this might be connected to the docker version. Can you tell us the version, @GuyKh?

It would be good to know if these commands both return sha-hashed image ids, in your cluster:

docker service inspect general_ouroboros  -f '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
docker service inspect general_ouroboros  -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'

GuyKh commented 7 months ago

Last message was:

[Shepherd] Service general_ouroboros updated on 08186ba56442
Thu Apr  4 12:29:43 IDT 2024 Service general_ouroboros was updated from pyouroboros/ouroboros:latest to pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

Here are my stats:

docker version
Client: Docker Engine - Community
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:17:56 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:17:56 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker service inspect general_ouroboros  -f '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

$ docker service inspect general_ouroboros  -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372

Looks like the most recent messages all include the sha rather than a bare latest tag. I can run these commands again when I see such a message appear, if that would help.
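
For the next time such a message appears, the two inspect commands could be wrapped in a small script. This is only a minimal sketch, assuming a POSIX shell on a swarm manager node; has_digest and check_service are hypothetical helper names and not part of shepherd:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an image reference carries a sha256 digest.
has_digest() {
  case "$1" in
    *@sha256:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints whether the previous and current spec images of
# a service are pinned to a digest. Needs the docker CLI on a manager node.
# Assumption: the service has been updated at least once, so PreviousSpec
# exists; otherwise the inspect call may report a template error.
check_service() {
  for field in PreviousSpec Spec; do
    image=$(docker service inspect "$1" \
      -f "{{.$field.TaskTemplate.ContainerSpec.Image}}")
    if has_digest "$image"; then
      echo "$field: pinned to digest ($image)"
    else
      echo "$field: no digest ($image)"
    fi
  done
}

# Usage (not run automatically): check_service general_ouroboros
```

If either line reports "no digest", that would be the state worth capturing together with the verbose shepherd log.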

djmaze commented 7 months ago

Mhh, not sure what to make of this. I have to say I have not tested shepherd with docker 26 myself yet.

kb1ibt commented 5 months ago

I was running into this issue on docker 20, and it still exists in docker 26. I also notice that the sha doesn't match between what shepherd pulls and what docker stack deploy --prune -c docker-compose.yml --resolve-image always <stack_name> pulls: when I first run shepherd it replaces every running container on my swarm, and when I then run docker stack deploy it replaces them all again.

shizunge commented 2 months ago

I am not sure this is the root cause, but I am able to create a service with an image without the digest by doing the following:

  1. Build a new image locally, but do not push it to the registry.
  2. Start the service based on the local image.
  3. After the service has started, push the image to the registry.
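
The steps above can be sketched as a shell script. This is a rough sketch under assumptions: the image and service names are hypothetical placeholders, a Dockerfile is expected in the current directory, and by default the script only prints the commands instead of running them:

```shell
#!/bin/sh
# Hypothetical placeholders: adjust IMAGE and SERVICE to your own setup.
IMAGE="${IMAGE:-registry.example.com/demo/app:1.0}"
SERVICE="${SERVICE:-demo_app}"

# Prints each command by default; set RUN=1 to actually execute it.
step() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

reproduce() {
  step docker build -t "$IMAGE" .                          # 1. build locally, no push
  step docker service create --name "$SERVICE" "$IMAGE"   # 2. start from the local image
  step docker push "$IMAGE"                                # 3. push only afterwards
  # Show the image reference the service ended up with.
  step docker service inspect "$SERVICE" -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
}

# Usage (not run automatically): RUN=1 reproduce
```

If the scenario reproduces, the final inspect should print the image reference without an @sha256 suffix.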

From this post: https://stackoverflow.com/questions/39811230/why-doesnt-my-newly-created-docker-have-a-digest

Normally, two scenarios could make an image doesn't have associated manifest:

    This image has not been pushed to or pulled from a V2 registry.
    This image has been pulled from a V1 registry.

shizunge commented 2 months ago

Based on this comment, running docker service update for an image that requires login, but without --with-registry-auth, results in no digest on the image of the service. The service will then update back and forth between the two versions.
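
To tell this back-and-forth apart from a genuine update in the notifications, the two image references can be compared with the digest stripped. A minimal sketch, assuming a POSIX shell; strip_digest and is_flap are hypothetical names, not shepherd functions:

```shell
#!/bin/sh
# Hypothetical helper: drops a trailing "@sha256:..." part from an image reference.
strip_digest() {
  printf '%s\n' "${1%%@*}"
}

# Hypothetical helper: succeeds when both references name the same image and
# tag, and exactly one of them carries a digest (the pattern in this issue).
is_flap() {
  [ "$(strip_digest "$1")" = "$(strip_digest "$2")" ] || return 1
  case "$1" in *@sha256:*) a=1 ;; *) a=0 ;; esac
  case "$2" in *@sha256:*) b=1 ;; *) b=0 ;; esac
  [ "$a" != "$b" ]
}

# Usage: is_flap "<previous image>" "<current image>" && echo "digest flapping"
```

A real update, where both references carry different digests, does not count as flapping in this sketch.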

djmaze commented 1 month ago

So based on @shizunge's comments, this sounds like a docker / usability problem rather than a shepherd bug.