cayla opened this issue 2 months ago
It is speculation, but we wondered if https://github.com/goharbor/harbor/pull/20838/files might be the culprit.
Hm. I don't have a bead on it yet, but there may be another complexity to this issue. I am now finding evidence that retagging `foo/app:main-d1028194b3-1725449681-dirty` to `foo/app:main-d1028194b3-1725449681` replicates properly, and the problem only happens when we try to go from `foo/app:main-d1028194b3-1725449681` to `v1.2.3`.
Will update as I find more.
12:41:51 + docker buildx imagetools create -t harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914 harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914-dirty
12:41:51 #1 [internal] pushing harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914
12:41:51 #1 0.000 pushing sha256:b3b1b6674773063b64e1d8f2cace51d455198d01d1dc7a8a3e0e5865aceae5f9 to harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914
12:41:51 #1 DONE 0.4s
Our best speculation right now is that perhaps the first jump (`-dirty` -> dropping the suffix) works, but for whatever reason any subsequent retags fail to trigger replication. E.g. `main-d1028194b3-1725449681-dirty` -> `main-d1028194b3-1725449681` is ok, but `main-d1028194b3-1725449681` -> `vX.Y.Z` fails, as does `main-d1028194b3-1725449681-dirty` -> `main` (which floats the most recent image on the `main` tag).
I have reproduction steps now:
Create a `helloworld` project in Harbor and on hub.docker.com.

Dockerfile:

FROM debian
CMD echo "Helloworld #1"
docker build -t helloworld:main-123-456 .
docker tag helloworld:main-123-456 harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker push harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:main harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:release harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:v1.2.3 harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
Result: Four tags in harbor and hub. Expected behavior.
I then upgraded the harbor instance to 2.11.1 (helm 1.15.1) and repeated the test with minor name variations:
FROM debian
CMD echo "Helloworld #2"
docker build -t helloworld:main-234-567 .
docker tag helloworld:main-234-567 harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker push harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:main harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:release harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:v2.3.4 harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
Result: Four tags in harbor, but only one new / updated tag in hub. Unexpected.
Note that while `main` and `release` are there, I would expect them to have been updated to point to the new source image.
Also note the replication log: `helloworld:main-234-567` was pushed on Sep 5, 2024 at 9:48 am and nothing triggered after that.
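For anyone reproducing this, one way to confirm which digest each tag actually points to on both registries is `docker buildx imagetools inspect`; the Docker Hub namespace below is a placeholder.

```bash
# Tag in the Harbor project (replication source)
docker buildx imagetools inspect harbor.stg.k8s.addgene.org/helloworld/helloworld:main

# Same tag on Docker Hub (replication target); <namespace> is a placeholder
docker buildx imagetools inspect docker.io/<namespace>/helloworld:main
```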
Thanks @cayla for the report. I will try to reproduce it on my end and get back to you with any findings.
I'm experiencing the same issue since the upgrade from 2.11.0 to 2.11.1. It seems @cayla is right about it only happening with images that were retagged.
Would it be safe to downgrade to 2.11.0 for the time being? The migrator image did not report any changes, but I believe that only applies to the configuration, not the database. I'm running into this issue a lot, so a temporary downgrade would make life a lot easier :-)
We are also experiencing the same issue. We use the following two tools/variations for tagging the same digest multiple times, and neither triggers the event_based replication. The `-dev.1234` tag is replicated correctly; however, the other tags are not. A manual replication still functions as expected.
Happy to provide more details as required; however, I think this is a pretty clear regression.
CI Build Step:
/kaniko/executor --context $CI_PROJECT_DIR/docker --dockerfile Dockerfile --destination registry.xyz.com/lorem/ipsum:12.0.1-dev.1234 --destination registry.xyz.com/lorem/ipsum:latest
CI Release Step:
crane cp registry.xyz.com/lorem/ipsum:12.0.1-dev.1234 registry.xyz.com/lorem/ipsum:12.0.1
Previously, three separate executions were triggered for a CI release pipeline; now there's only a single one, and the two "retagged" ones are missing.
Previously:
Currently:
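As a sanity check that the `crane cp` retag itself succeeded and that only the replication is missing, the digests and the tags present on the target can be compared; the target registry host below is a placeholder for the replication destination.

```bash
# Both tags on the Harbor side should resolve to the same digest after crane cp
crane digest registry.xyz.com/lorem/ipsum:12.0.1-dev.1234
crane digest registry.xyz.com/lorem/ipsum:12.0.1

# List the tags that actually arrived on the replication target
# (target.example.com is a placeholder for the destination registry)
crane ls target.example.com/lorem/ipsum
```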
@wy65701436 could you by any chance tell if a downgrade to 2.11.0 is possible? I'm using a Docker based installation, so it should be fairly straightforward, as config hasn't changed between 2.11.0 and 2.11.1, but I can't tell whether any database migrations have been performed in that update.
We have the same issue after upgrading to Harbor 2.11. To mitigate it, we replaced the retagging in the pipelines with a Harbor API call which adds the new tag. This triggers the event-based replication just fine.
Another alternative is to trigger the replication itself via a Harbor API call, as there is no button in the UI to manually start event-based replications.
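For reference, a minimal sketch of both API-based workarounds using curl; the registry host, project/repository names, tag values, policy ID, and credentials are placeholders, and the endpoint paths should be double-checked against your Harbor version's API documentation.

```bash
# Workaround 1: add a new tag to an existing artifact via the Harbor API,
# which (unlike a client-side retag) still fires the event-based replication.
curl -u "$HARBOR_USER:$HARBOR_PASS" -X POST \
  -H "Content-Type: application/json" \
  -d '{"name": "12.0.1"}' \
  "https://registry.xyz.com/api/v2.0/projects/lorem/repositories/ipsum/artifacts/12.0.1-dev.1234/tags"

# Workaround 2: manually start a run of an existing replication policy.
# The policy ID (42) is a placeholder; it can be looked up via
# GET /api/v2.0/replication/policies.
curl -u "$HARBOR_USER:$HARBOR_PASS" -X POST \
  -H "Content-Type: application/json" \
  -d '{"policy_id": 42}' \
  "https://registry.xyz.com/api/v2.0/replication/executions"
```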
I had considered the API route as well. But, for triggering replications, an account with administrative privileges is required. Since the API only supports basic authorization that's a no-go, as I'm not storing the password of such an account anywhere.
We've got dozens and dozens of separate replications set up in our Harbor instance. I've been converting them from event-based to scheduled or on-demand (as in: a developer reports a missing image on the upstream repository). I've got loads of replications set to 2-minute schedules, which in turn causes the job service logs to rapidly fill up the server's disk space.
If only someone could confirm that downgrading to 2.11.0 is an option...
You don't need an administrator account; the "create replication" permission is enough to trigger replications via the API, if that is applicable to your use case.
I skimmed the source-code changes between releases 2.11.0 and 2.11.1 and did not find a database migration. So in your case I would take the risk of a downgrade, with a proper snapshot of the machine beforehand so you can roll back if something fails. But take my advice with a grain of salt, as I have never worked on the Harbor code base and don't know whether they validate the database version against the deployed binaries.
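For what it's worth, one way to check this from a clone of the Harbor repository is to diff the migrations directory between the two release tags; the `make/migrations/` path reflects the repository layout as I understand it and may differ.

```bash
git clone https://github.com/goharbor/harbor.git
cd harbor
# No output here suggests no schema migrations were added between the releases
git diff --stat v2.11.0 v2.11.1 -- make/migrations/
```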
@LinuxDoku wow... can't believe I've never looked at that. I thought robot accounts were project-only, useful only for pulling and pushing images. Thanks for pointing it out!
@wy65701436 any update?
I have the same problem on versions 2.11.1 and 2.12.0. A rollback to 2.11.0 is only possible from 2.11.1.
I have the same issue - no events are triggered when adding a tag to an existing image in a project.
I faced the same issue today.
> @wy65701436 could you by any chance tell if a downgrade to 2.11.0 is possible? I'm using a Docker based installation, so it should be fairly straightforward, as config hasn't changed between 2.11.0 and 2.11.1, but I can't tell whether any database migrations have been performed in that update.
I don't suggest any downgrade. I am trying to fix this problem and will provide patches for the community.
Expected behavior and actual behavior:
Expected behavior: when we push an existing image with a new tag, replication should trigger.
Actual behavior: event-based replication never occurs for re-tagged images, but manual replication still works.
We have a workflow where we add additional tags to existing images to acknowledge CI completion and to mark our release images.

For the former, on the initial build we tag images as `[normalized branch name]-[git commit hash of HEAD of that branch]-[timestamp of same HEAD commit]-dirty`, e.g. `foo/app:main-d1028194b3-1725449681-dirty`. After CI passes, we drop the `-dirty` suffix, like `foo/app:main-d1028194b3-1725449681` (generally with something like `docker buildx imagetools create -t ${image}:${TAG} ${image}:${DIRTY_TAG}` in the CI scripts).

We do something similar with our release images: we take a `foo/app:main-d1028194b3-1725449681` and retag it `v1.2.3`.
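For illustration, a rough sketch of how such tags could be produced in a CI script; the registry host, image name, and helper variables are placeholders rather than the exact pipeline described above.

```bash
# Derive the tag components from git metadata (illustrative only)
branch=$(git rev-parse --abbrev-ref HEAD | tr '/' '-')
commit=$(git rev-parse --short=10 HEAD)
stamp=$(git show -s --format=%ct HEAD)

image="harbor.example.org/foo/app"        # placeholder registry/repository
DIRTY_TAG="${branch}-${commit}-${stamp}-dirty"
TAG="${branch}-${commit}-${stamp}"

# Initial build is pushed with the -dirty suffix
docker build -t "${image}:${DIRTY_TAG}" .
docker push "${image}:${DIRTY_TAG}"

# After CI passes, drop the -dirty suffix without rebuilding
docker buildx imagetools create -t "${image}:${TAG}" "${image}:${DIRTY_TAG}"
```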
This is our replication configuration:
Since upgrading to 2.11.1 (via your Helm release https://github.com/goharbor/harbor-helm/releases/tag/v1.15.1), we have noticed that no image that is retagged ever triggers replication.
For example, here are this morning's tasks:
We retagged like so:
As you can see, there was no corresponding event_based trigger from this push. (We manually triggered replication at 10:03 and it succeeded. We have also waited much longer than 3 minutes in the past -- 30m+. It consistently never fires on its own.)

Here is the replication log from that manual trigger:
log.txt
Steps to reproduce the problem:
Versions: Please specify the versions of following systems.
Additional context: