GuyKh opened this issue 1 year ago (Open)
may i ask what you are using for notifications?
Telegram
@GuyKh Could you please verify whether this issue still persists with the latest shepherd version? If yes, please run shepherd with VERBOSE=true and share the corresponding log file with us.
Make sure to update your image specifier to containrrr/shepherd.
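For example (a rough sketch, not an exact prescription; the service name general_shepherd is taken from your logs and may differ in your stack), you could switch the image and enable verbose logging on the running service like this:

docker service update \
  --image containrrr/shepherd \
  --env-add VERBOSE=true \
  general_shepherd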
@moschlar The latest update occurred 2 months ago, so if your question relates to the last 2 months, then the answer is definitely YES.
I can tell that since Jan 17th, things have been quiet on this front
It's here again :)
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 15:16:41 IST 2024 Sleeping 60m before next update
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:16:54 IST 2024 Trying to update service general_adguard-exporter with image ebrianne/adguard-exporter:latest
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:17:07 IST 2024 Service general_adguard-exporter was updated!
general_shepherd.1.45n0qi5aws3p@nuc | % Total % Received % Xferd Average Speed Time Time Time Current
general_shepherd.1.45n0qi5aws3p@nuc | Dload Upload Total Spent Left Speed
100 313 100 2 100 311 3 512 --:--:-- --:--:-- --:--:-- 514
general_shepherd.1.45n0qi5aws3p@nuc | okWed Jan 31 16:17:08 IST 2024 Cleaning up old docker images, leaving last 5
general_shepherd.1.45n0qi5aws3p@nuc | no such manifest: docker.io/tiredofit/traefik-cloudflare-companion:latest
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:17:12 IST 2024 Error updating service general_cf-companion! Image tiredofit/traefik-cloudflare-companion:latest does not exist or it is not available
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:17:18 IST 2024 Trying to update service general_cf-ddns with image oznu/cloudflare-ddns:latest
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:17:33 IST 2024 Service general_cf-ddns was updated!
general_shepherd.1.45n0qi5aws3p@nuc | % Total % Received % Xferd Average Speed Time Time Time Current
general_shepherd.1.45n0qi5aws3p@nuc | Dload Upload Total Spent Left Speed
100 285 100 2 100 283 5 778 --:--:-- --:--:-- --:--:-- 785
general_shepherd.1.45n0qi5aws3p@nuc | okWed Jan 31 16:17:34 IST 2024 Cleaning up old docker images, leaving last 5
general_shepherd.1.45n0qi5aws3p@nuc | no such manifest: docker.io/library/docker:latest
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:17:38 IST 2024 Error updating service general_image-prune! Image docker:latest does not exist or it is not available
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:17:44 IST 2024 Trying to update service general_notify with image mazzolino/apprise-microservice:0.1
general_shepherd.1.45n0qi5aws3p@nuc | Wed Jan 31 16:18:06 IST 2024 Service general_notify was updated!
general_shepherd.1.45n0qi5aws3p@nuc | % Total % Received % Xferd Average Speed Time Time Time Current
general_shepherd.1.45n0qi5aws3p@nuc | Dload Upload Total Spent Left Speed
100 297 100 2 100 295 4 682 --:--:-- --:--:-- --:--:-- 687
general_shepherd.1.45n0qi5aws3p@nuc | okWed Jan 31 16:18:07 IST 2024 Cleaning up old docker images, leaving last 5
From that log and screenshot I can't see that any service is "flapping"...
Can you share your Docker Swarm stack files?
Looking at the swarm files - I was using mazzolino/shepherd and not containrrr/shepherd, so I was seeing:
Image mazzolino/shepherd:latest does not exist or it is not available
Retrying this with containrrr/shepherd
Well... this still occurs:
general_shepherd.1.5w81t3foj5bk@nuc | okThu Feb 1 12:25:00 IST 2024 Cleaning up old docker images, leaving last 3
general_shepherd.1.5w81t3foj5bk@nuc | Thu Feb 1 12:25:06 IST 2024 Trying to update service general_cf-ddns with image oznu/cloudflare-ddns:latest
general_shepherd.1.5w81t3foj5bk@nuc | Thu Feb 1 12:25:22 IST 2024 Service general_cf-ddns was updated!
general_shepherd.1.5w81t3foj5bk@nuc | % Total % Received % Xferd Average Speed Time Time Time Current
general_shepherd.1.5w81t3foj5bk@nuc | Dload Upload Total Spent Left Speed
100 304 100 2 100 302 6 936 --:--:-- --:--:-- --:--:-- 944
general_shepherd.1.5w81t3foj5bk@nuc | okThu Feb 1 12:25:22 IST 2024 Cleaning up old docker images, leaving last 3
Updating the image to someimagename:sometag@shasum should be correct, as images with the same tag are identified by their sha sum. That is what distinguishes last year's "latest" from today's "latest".
A sample from my update log:
service nextcloud_imaginary was updated from nextcloud/aio-imaginary:latest@sha256:f7fb3f35cdbacbaa06dbcf6bbc567e39037af1251fb3600b44c8626e3bbf0b01 to nextcloud/aio-imaginary:latest@sha256:3d1cb04f90eca6dbbaaed0f773ed092a024b0eca742b73f88f8b010025d3ab9b
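If you want to check this yourself, here is a rough sketch (the service and image names are just the ones from my example above and need adjusting): compare the digest pinned in the service spec with the digest the registry currently serves for the tag.

# digest currently pinned in the service spec
docker service inspect nextcloud_imaginary -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
# digest the registry currently reports for the tag (buildx is bundled with recent docker releases)
docker buildx imagetools inspect nextcloud/aio-imaginary:latest

If the two digests differ, the tag has moved and an update is expected.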
What I can't really tell is why it sometimes ends up with a non-sha reference, like in your first post:
[Shepherd] Service general_notify updated on 865954dc3903
Sat Sep 2 06:23:14 IDT 2023 Service general_notify was updated from mazzolino/apprise-microservice:0.1@sha256:3abc60085e429a51455f2e4dee656cbb96b20f4a76f2510c1a75c6e24cd0193c to mazzolino/apprise-microservice:0.1
From sha to non-sha. Is it because it's the same sha? I'm not sure.
I really haven't looked into it, but I think the issue is in the resolving of :latest - what's the logic for resolving it to a specific sha versus keeping it as latest...
Like I said yesterday, from your recent reports I cannot see the issue that you have been describing in your first post and in the title of this issue.
Please try to reproduce this with the latest official image and show us the logging output.
I'm experiencing this issue as well. Here are the messages from Apprise:
[Shepherd] Service backend_file-proxy updated on 5d4000a267bd
Wed Apr 3 02:49:03 CEST 2024 Service backend_file-proxy was updated from ghcr.io/REDACTED/file-proxy:1.0@sha256:d9a3268c22892f7272773cd9b6caabe3630c6877f8b28373f7dbb57822f6d342 to ghcr.io/REDACTED/file-proxy:1.0
[Shepherd] Service backend_file-proxy updated on 5d4000a267bd
Wed Apr 3 03:00:26 CEST 2024 Service backend_file-proxy was updated from ghcr.io/REDACTED/file-proxy:1.0 to ghcr.io/REDACTED/file-proxy:1.0@sha256:d9a3268c22892f7272773cd9b6caabe3630c6877f8b28373f7dbb57822f6d342
We're experiencing this with services using either Docker Hub or the GitHub Container Registry, so the problem is probably not the registries. Here are the logs from shepherd (verbose):
Wed Apr 3 02:48:57 CEST 2024 Trying to update service backend_file-proxy with image ghcr.io/REDACTED/file-proxy:1.0
image ghcr.io/REDACTED/file-proxy:1.0 could not be accessed on a registry to record
its digest. Each node will access ghcr.io/REDACTED/file-proxy:1.0 independently,
possibly leading to different nodes running different
versions of the image.
Wed Apr 3 02:49:03 CEST 2024 Service backend_file-proxy was updated!
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 323 100 2 100 321 9 1446 --:--:-- --:--:-- --:--:-- 1461
Wed Apr 3 03:00:25 CEST 2024 Trying to update service backend_file-proxy with image ghcr.io/REDACTED/file-proxy:1.0
Wed Apr 3 03:00:26 CEST 2024 Service backend_file-proxy was updated!
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 323 100 2 100 321 10 1664 --:--:-- --:--:-- --:--:-- 1682
The example has a defined version tag, but this is also happening to services using images with a latest tag. Here is our YAML file for the service:
shepherd:
  image: containrrr/shepherd
  environment:
    SLEEP_TIME: '5m'
    FILTER_SERVICES: 'label=shepherd.autodeploy'
    ROLLBACK_ON_FAILURE: 'true'
    REGISTRIES_FILE: /var/run/secrets/shepherd-registries-auth
    WITH_REGISTRY_AUTH: 'true'
    APPRISE_SIDECAR_URL: 'notify:5000'
    TZ: Europe/Berlin
  secrets:
    - shepherd-registries-auth
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  networks:
    - notification
  deploy:
    placement:
      constraints:
        - node.role == manager
btw. thanks for a great service. Shepherd has really improved our deployment strategy.
Like I said yesterday, from your recent reports I cannot see the issue that you have been describing in your first post and in the title of this issue.
Please try to reproduce this with the latest official image and show us the logging output.
Just reproduced this with the latest version. Getting this:
[Shepherd] Service general_ouroboros updated on 08186ba56442
Thu Apr 4 12:29:43 IDT 2024 Service general_ouroboros was updated from pyouroboros/ouroboros:latest to pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372
Logs:
general_shepherd.1.xcjzlmsdmhry@ubuntu | Thu Apr 4 12:29:20 IDT 2024 Trying to update service general_ouroboros with image pyouroboros/ouroboros:latest
general_shepherd.1.xcjzlmsdmhry@ubuntu | Thu Apr 4 12:29:43 IDT 2024 Service general_ouroboros was updated!
general_shepherd.1.xcjzlmsdmhry@ubuntu | % Total % Received % Xferd Average Speed Time Time Time Current
general_shepherd.1.xcjzlmsdmhry@ubuntu | Dload Upload Total Spent Left Speed
100 310 100 2 100 308 5 783 --:--:-- --:--:-- --:--:-- 788
general_shepherd.1.xcjzlmsdmhry@ubuntu | okThu Apr 4 12:29:43 IDT 2024 Cleaning up old docker images, leaving last 2
I guess this might be connected to the docker version. Can you tell us the version, @GuyKh?
It would be good to know if these commands both return sha-hashed image ids, in your cluster:
docker service inspect general_ouroboros -f '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
docker service inspect general_ouroboros -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
Last message was:
[Shepherd] Service general_ouroboros updated on 08186ba56442
Thu Apr 4 12:29:43 IDT 2024 Service general_ouroboros was updated from pyouroboros/ouroboros:latest to pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372
Here are my stats:
$ docker version
Client: Docker Engine - Community
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:17:56 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:17:56 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
$ docker service inspect general_ouroboros -f '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}'
pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372
$ docker service inspect general_ouroboros -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
pyouroboros/ouroboros:latest@sha256:cfa29916459fb8c578fce084ce839a0d3bee478b83a21b6b1d10c6b78bc4a372
Looks like all the latest messages include the sha and not just latest.
I can try again when I see such a message appear, if that would help.
Mhh, not sure what to make of this. I have to say I have not yet tested shepherd with docker 26 myself.
I was running into this issue on docker 20, but it still exists in docker 26. What I also notice is that the sha doesn't match between what shepherd pulls and what docker stack deploy --prune -c docker-compose.yml --resolve-image always <stack_name> pulls. When I first run shepherd it replaces every running container on my swarm, and then when I run docker stack deploy it replaces them all again.
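To make the mismatch visible, here is a rough sketch (mystack and mystack_web are placeholder names; adjust to your stack) that checks which image reference the service spec holds after each of the two paths:

# after shepherd has updated the service
docker service inspect mystack_web -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
# redeploy the stack and let docker resolve the tag itself
docker stack deploy --prune -c docker-compose.yml --resolve-image always mystack
docker service inspect mystack_web -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'

If the two outputs show different (or missing) @sha256 digests, every deploy restarts the tasks even though the tag itself did not change.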
I am not sure this is the root cause, but I am able to create a service with an image without the digest by doing the following.
From this post: https://stackoverflow.com/questions/39811230/why-doesnt-my-newly-created-docker-have-a-digest

Normally, two scenarios can leave an image without an associated manifest:
This image has not been pushed to or pulled from a V2 registry.
This image has been pulled from a V1 registry.

Based on this comment, run docker service update for an image that requires login, but without --with-registry-auth; this results in the service's image having no digest. It will then update back and forth between the two versions.
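A rough reproduction sketch based on that comment (the service and image names are placeholders; the image is assumed to live in a registry that requires login):

# update WITHOUT forwarding registry credentials: the engine cannot query the registry,
# so the service spec keeps the bare tag without a digest
docker service update --image ghcr.io/REDACTED/file-proxy:1.0 backend_file-proxy
docker service inspect backend_file-proxy -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
# -> ghcr.io/REDACTED/file-proxy:1.0 (no @sha256 suffix)

# update WITH credentials: the tag is resolved and pinned to a digest again
docker service update --with-registry-auth --image ghcr.io/REDACTED/file-proxy:1.0 backend_file-proxy
docker service inspect backend_file-proxy -f '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
# -> ghcr.io/REDACTED/file-proxy:1.0@sha256:...

Alternating between these two forms is exactly the back-and-forth ("flapping") seen in the notifications.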
So based on @shizunge's comments, this sounds like a docker / usability problem rather than a shepherd bug.
See pic:
Very often I'm getting two updates: one from latest (for example, or a versioned image) to a version with a sha, and then back to the non-sha one, e.g.