goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

Retagged images no longer replicating after v2.11.1 upgrade #20897

Open cayla opened 2 months ago

cayla commented 2 months ago

Expected behavior and actual behavior:

Expected behavior: when we push an existing image with a new tag, replication should trigger

Actual behavior: event based replication never occurs for re-tagged images, but manual replication still works.

We have a workflow where we add additional tags to existing images, both to acknowledge CI completion and to mark our release images.

For the former, we tag images as [normalized branch name]-[git commit hash of HEAD of that branch]-[timestamp of same HEAD commit]-dirty on the initial build.

E.g. foo/app:main-d1028194b3-1725449681-dirty

After CI passes, we drop the -dirty suffix like foo/app:main-d1028194b3-1725449681

(Generally with something like docker buildx imagetools create -t ${image}:${TAG} ${image}:${DIRTY_TAG} in the CI scripts).
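
A minimal sketch of the tag scheme described above, for readers who want to reproduce it. The function names and the branch normalization rule (lowercase, anything outside `a-z0-9._-` replaced by `-`) are assumptions made for illustration; the actual rule used in the CI scripts is not stated here.

```shell
# Sketch only: helper names and the normalization rule are hypothetical.

# normalize_branch BRANCH -> tag-safe branch name (assumed rule)
normalize_branch() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed -e 's/[^a-z0-9._-]/-/g'
}

# dirty_tag BRANCH COMMIT_HASH COMMIT_TIMESTAMP -> tag used on the initial build
dirty_tag() {
  printf '%s-%s-%s-dirty\n' "$(normalize_branch "$1")" "$2" "$3"
}

# clean_tag DIRTY_TAG -> the same tag with the -dirty suffix dropped
clean_tag() {
  printf '%s\n' "${1%-dirty}"
}

# Usage (not executed here; requires docker buildx and a reachable registry):
#   DIRTY_TAG=$(dirty_tag main d1028194b3 1725449681)
#   TAG=$(clean_tag "$DIRTY_TAG")
#   docker buildx imagetools create -t "${image}:${TAG}" "${image}:${DIRTY_TAG}"
```

The retag itself only pushes a manifest under a new name, which is why it completes in a fraction of a second in the logs below.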

We do something similar with our release images. We take a foo/app:main-d1028194b3-1725449681 and retag it v1.2.3

This is our replication configuration:

Screenshot 2024-09-04 at 11 09 12 AM

Since upgrading to 2.11.1 (via the Helm chart release https://github.com/goharbor/harbor-helm/releases/tag/v1.15.1), we have noticed that no retagged image ever triggers replication.

For example, here is this morning's tasks:

Screenshot 2024-09-04 at 11 09 01 AM

We retagged like so:

10:00:28 + docker buildx imagetools create -t harbor.k8s.addgene.org/addgene-core/addgene-core_app:v12.7.0 harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-d1028194b3-1725449681
10:00:28 #1 [internal] pushing harbor.k8s.addgene.org/addgene-core/addgene-core_app:v12.7.0
10:00:28 #1 0.000 pushing sha256:cf575639056dc3ec66e25cf20d1327f5985e1c9d4a6f8d5f079ee262ee581dce to harbor.k8s.addgene.org/addgene-core/addgene-core_app:v12.7.0
10:00:28 #1 DONE 0.2s

As you can see, there was no corresponding event_based trigger from this push. (We manually triggered replication at 10:03 and it succeeded. We have waited much longer than 3 minutes in the past, 30 minutes or more, and it consistently never fires on its own.)

Here is the replication log from that manual trigger:
log.txt


cayla commented 2 months ago

It is speculation, but we wondered if https://github.com/goharbor/harbor/pull/20838/files might be the culprit.

cayla commented 2 months ago

Hm. I don't have a bead on it yet, but there may be another complexity to this issue. I am now finding evidence that foo/app:main-d1028194b3-1725449681-dirty to foo/app:main-d1028194b3-1725449681 is replicating properly and the problem only happens when we try to do foo/app:main-d1028194b3-1725449681 to v1.2.3. Will update as I find more.

log-dirty.txt

12:41:51   + docker buildx imagetools create -t harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914 harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914-dirty
12:41:51   #1 [internal] pushing harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914
12:41:51   #1 0.000 pushing sha256:b3b1b6674773063b64e1d8f2cace51d455198d01d1dc7a8a3e0e5865aceae5f9 to harbor.k8s.addgene.org/addgene-core/addgene-core_app:main-c81346bc89-1725465914
12:41:51   #1 DONE 0.4s

log-not-dirty.txt

Our best speculation right now is that perhaps the first jump ([-dirty] -> [removing the suffix]) works, but for whatever reason any subsequent retags fail to trigger replication.

E.g. main-d1028194b3-1725449681-dirty -> main-d1028194b3-1725449681 is ok but

main-d1028194b3-1725449681 -> vX.Y.Z is failing as well as main-d1028194b3-1725449681-dirty -> main (floating most recent image in main tag)

cayla commented 2 months ago

I have reproduction steps now:

  1. Set up a test instance of harbor with the 2.11.0 release (helm 1.15.0).
  2. Created a helloworld project in harbor and hub.docker.com
  3. Created replication from harbor to hub.
  4. Created a small dockerfile
FROM debian
CMD echo "Helloworld #1"
  5. Ran:
docker build -t helloworld:main-123-456 .
docker tag helloworld:main-123-456 harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker push harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:main  harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:release  harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:v1.2.3  harbor.stg.k8s.addgene.org/helloworld/helloworld:main-123-456

Result:

Four tags in harbor and hub. Expected behavior.

Screenshot 2024-09-05 at 9 41 23 AM Screenshot 2024-09-05 at 9 41 25 AM

I then upgraded the harbor instance to 2.11.1 (helm 1.15.1) and repeated the test with minor name variations:

  1. Created a small dockerfile
FROM debian
CMD echo "Helloworld #2"
  2. Ran:
docker build -t helloworld:main-234-567 .
docker tag helloworld:main-234-567 harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker push harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:main  harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:release  harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567
docker buildx imagetools create -t harbor.stg.k8s.addgene.org/helloworld/helloworld:v2.3.4  harbor.stg.k8s.addgene.org/helloworld/helloworld:main-234-567

Result: Four tags in harbor, but only one new / updated tag in hub. Unexpected.

Screenshot 2024-09-05 at 10 09 36 AM Screenshot 2024-09-05 at 10 09 49 AM

Note while main and release are there, I would expect them to be updated to point to the new source image.

Also note the replication log.

Screenshot 2024-09-05 at 10 11 13 AM

helloworld:main-234-567 was pushed at Sep 5, 2024 at 9:48 am and nothing triggered after that.
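
One way to check the result above without relying on screenshots is to compare tag and digest pairs on both sides, which catches both missing tags and stale ones such as main and release. This is only a sketch: the helper names are hypothetical, it uses `crane ls` and `crane digest` to read the registries, and it assumes replication preserves manifest digests.

```shell
# Sketch only: function names are hypothetical.

# tag_digests REPO -> one "TAG DIGEST" line per tag in REPO (needs crane and registry access)
tag_digests() {
  for tag in $(crane ls "$1"); do
    printf '%s %s\n' "$tag" "$(crane digest "$1:$tag")"
  done
}

# unreplicated SRC_FILE DST_FILE -> lines of SRC_FILE that have no identical line in DST_FILE,
# i.e. tags that are missing on the destination or point at a different digest there
unreplicated() {
  grep -Fxv -f "$2" "$1" || true
}

# Usage (not executed here; HUB_REPO is a placeholder for the destination repository):
#   tag_digests harbor.stg.k8s.addgene.org/helloworld/helloworld > src.txt
#   tag_digests "$HUB_REPO" > dst.txt
#   unreplicated src.txt dst.txt
```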

wy65701436 commented 2 months ago

Thanks @cayla for reporting this. I will try to reproduce it on my end and get back to you with any findings.

leonboot commented 2 months ago

I'm experiencing the same issue since the upgrade from 2.11.0 to 2.11.1. It seems @cayla is right about it only happening with images that were retagged.

Would it be safe to downgrade to 2.11.0 for the time being? The migrator image did not report any changes, but I believe that only applies to the configuration, not the database. I'm running into this issue a lot, so a temporary downgrade would make life a lot easier :-)

siegenthalerroger commented 2 months ago

We are also experiencing the same issue. We use the following two tools/variations for tagging the same digest multiple times, and neither triggers the event_based replication. The -dev.1234 tag is replicated correctly, but the other tags are not. A manual replication still functions as expected.

Happy to provide more details as required, however I think this is a pretty clear regression.


CI Build Step:

/kaniko/executor --context $CI_PROJECT_DIR/docker --dockerfile Dockerfile --destination registry.xyz.com/lorem/ipsum:12.0.1-dev.1234 --destination registry.xyz.com/lorem/ipsum:latest

CI Release Step:

crane cp registry.xyz.com/lorem/ipsum:12.0.1-dev.1234 registry.xyz.com/lorem/ipsum:12.0.1

Previously, three separate executions were triggered for a CI release pipeline. Now there is only a single one, and the two "retagged" ones are missing.

Previously: image

Currently: image

leonboot commented 2 months ago

@wy65701436 could you by any chance tell if a downgrade to 2.11.0 is possible? I'm using a Docker based installation, so it should be fairly straightforward, as config hasn't changed between 2.11.0 and 2.11.1, but I can't tell whether any database migrations have been performed in that update.

LinuxDoku commented 1 month ago

We have the same issue after upgrading to Harbor 2.11. To mitigate the issue we replaced the retagging in the pipelines with a Harbor API call which adds the new tag. This triggers the event based replication just fine.
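
A minimal sketch of such a call, assuming the Harbor v2.0 REST endpoint for creating a tag on an existing artifact. `HARBOR_URL`, `HARBOR_USER` and `HARBOR_PASS` are placeholders and the function names are hypothetical; the exact call used in the pipelines above is not shown in this thread.

```shell
# Sketch only: variable and function names are hypothetical.

# encode_repo REPO -> repository name with "/" URL-encoded twice, as the Harbor API
# expects for repository names that contain a slash
encode_repo() {
  printf '%s' "$1" | sed -e 's|/|%252F|g'
}

# tag_url BASE PROJECT REPO REFERENCE -> endpoint that creates a tag on an artifact
# (REFERENCE is an existing tag or digest)
tag_url() {
  printf '%s/api/v2.0/projects/%s/repositories/%s/artifacts/%s/tags\n' \
    "$1" "$2" "$(encode_repo "$3")" "$4"
}

# add_tag PROJECT REPO REFERENCE NEW_TAG -> POSTs {"name": NEW_TAG} to the endpoint
add_tag() {
  curl --fail --silent --show-error \
    -u "${HARBOR_USER}:${HARBOR_PASS}" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d "{\"name\": \"$4\"}" \
    "$(tag_url "${HARBOR_URL}" "$1" "$2" "$3")"
}

# Usage (not executed here):
#   add_tag foo app main-d1028194b3-1725449681 v1.2.3
```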

Another alternative is to trigger the replication itself by a harbor api call as there is no button in the UI to manually start event based replications.
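
A rough sketch of that second option, assuming the Harbor v2.0 replication executions endpoint, which starts an execution for a given policy ID. `HARBOR_URL`, `HARBOR_USER` and `HARBOR_PASS` are placeholders and the function names are hypothetical.

```shell
# Sketch only: variable and function names are hypothetical.

# execution_body POLICY_ID -> JSON body naming the replication policy to run
execution_body() {
  printf '{"policy_id": %d}\n' "$1"
}

# start_replication POLICY_ID -> starts one execution of that replication policy
start_replication() {
  curl --fail --silent --show-error \
    -u "${HARBOR_USER}:${HARBOR_PASS}" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d "$(execution_body "$1")" \
    "${HARBOR_URL}/api/v2.0/replication/executions"
}

# Usage (not executed here). Policy IDs can be looked up with
# GET ${HARBOR_URL}/api/v2.0/replication/policies
#   start_replication 7
```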

leonboot commented 1 month ago

I had considered the API route as well. But, for triggering replications, an account with administrative privileges is required. Since the API only supports basic authorization that's a no-go, as I'm not storing the password of such an account anywhere.

We've got dozens and dozens of separate replications set up in our Harbor instance. I've been converting them from event based to scheduled on-demand (as in: developer reports a missing image on the upstream repository). I've got loads of replications set up to scheduled 2 minute intervals, which in turn causes the job service logs to rapidly fill up the server's disk space.

If only someone could confirm that downgrading to 2.11.0 is an option...

LinuxDoku commented 1 month ago

You don't need an administrator account. The "create replication" permission is enough to trigger replications via the API, if this is applicable to your use case.

image

I skimmed the source-code changes between release 2.11.0 and 2.11.1 and I did not find a database migration. So in your case I would take the risk of a downgrade with a proper snapshot of the machine and eventual rollback when something fails. But take my advice with a grain of salt, as I have never worked on the code-base of harbor and don't know if they are validating the database-version against the deployed binaries.

leonboot commented 1 month ago

@LinuxDoku wow... can't believe I've never looked at that. I thought robot accounts were project-only, useful only for pulling and pushing images. Thanks for pointing it out!

siegenthalerroger commented 1 week ago

@wy65701436 any update?

drybalka-s commented 1 week ago

I have the same problem on versions 2.11.1 and 2.12.0. Rollback to 2.11.0 is only possible from 2.11.1.

reyvonger commented 6 days ago

I have the same issue - no events are triggered when adding a tag to an existing image in a project.

genestack-okunitsyn commented 4 days ago

We faced the same issue today.

wy65701436 commented 3 days ago

> @wy65701436 could you by any chance tell if a downgrade to 2.11.0 is possible? I'm using a Docker based installation, so it should be fairly straightforward, as config hasn't changed between 2.11.0 and 2.11.1, but I can't tell whether any database migrations have been performed in that update.

I don't recommend downgrading. I am working on a fix for this problem and will provide patches for the community.