500 error from /v2/repositories/<name>/tags : "failed to update 1 tag(s) and first error msg: updating images in DB failed ..."

Hi,

I could not find anything when searching for the specific error message detailed below.

Our CD pipeline uploads containers and deletes old tags. The actual code is at https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/promote-cleanup.yaml#L1; specifically, the call we have seen failing is the request below:

- name: List tags
  uri:
    url: "https://hub.docker.com/v2/repositories/{{ zj_image.repository }}/tags?page_size=1000"
    status_code: 200
  register: tags

Recently, on change https://review.opendev.org/c/opendev/system-config/+/806448, our CD was "promoting" a number of new containers (i.e. tagging previously uploaded and verified containers as the latest and cleaning up old tags). We saw 500 errors from this endpoint several times during this process, in three different jobs:

https://zuul.opendev.org/t/openstack/build/6909d950f40f4ab793f6b46d9c314985 @ task failure

2021-08-29 01:49:23.426087 "message": "failed to update 2 tag(s) and first error msg: updating images in DB failed namespace=opendevorg repository=python-builder tag=change_806448_3.8-bullseye: getRepoTagID in UpdateRepositoryTagImages query error namespace=opendevorg repository=python-builder tag=change_806448_3.8-bullseye: dbr: not found"
...
2021-08-29 01:49:23.426679  "url": "https://hub.docker.com/v2/repositories/opendevorg/python-builder/tags?page_size=1000",

https://zuul.opendev.org/t/openstack/build/cc453830c96a4dd98f3b3ecb4db9e026 @ task failure

2021-08-29 01:49:25.678917 "message": "failed to update 1 tag(s) and first error msg: updating images in DB failed repository=uwsgi-base tag=change_806448_3.7-bullseye namespace=opendevorg: getRepoTagID in UpdateRepositoryTagImages query error namespace=opendevorg repository=uwsgi-base tag=change_806448_3.7-bullseye: dbr: not found"
...
2021-08-29 01:49:25.679521 "url": "https://hub.docker.com/v2/repositories/opendevorg/uwsgi-base/tags?page_size=1000",

https://zuul.opendev.org/t/openstack/build/a18b34c5019c4831a9ac363cf710ee87 @ task failure

2021-08-29 01:49:23.495635 "message": "failed to update 1 tag(s) and first error msg: updating images in DB failed tag=change_806448_3.9 namespace=opendevorg repository=python-base: getRepoTagID in UpdateRepositoryTagImages query error namespace=opendevorg repository=python-base tag=change_806448_3.9: dbr: not found"
...
2021-08-29 01:49:23.496220 "url": "https://hub.docker.com/v2/repositories/opendevorg/python-base/tags?page_size=1000",

Note all these were happening in parallel on different hosts. Timestamps are in UTC.

The full collection of attempts is at https://zuul.opendev.org/t/openstack/buildset/e9aa11951b2a4da49b45e7da3df7a811. We have converted our python-base-3.X image into separate python-base-3.X-<buster|bullseye> images, and you can see each job "promoting" its image. We ran this same process for all of these images and only these three returned the 500 error, so it looks like a race condition.

I believe we may have seen this before, although I haven't chased down logs to find examples. We can back off and retry on 500 errors, but I'm sure there is a better solution. Pretty much all of the client-side logs are available in the links above, but please let us know if there is anything else we can do to help diagnose this issue.
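
For reference, the client-side retry mentioned above could look roughly like the sketch below. This is only an illustration of a possible workaround, not what the role currently does: the `retries` and `delay` values are arbitrary assumptions, and the delay is a fixed pause between attempts rather than a true exponential backoff.

```yaml
# Sketch only: retry the tag listing when Docker Hub does not return 200.
# The retry count and delay are assumed values, not tuned ones.
- name: List tags
  uri:
    url: "https://hub.docker.com/v2/repositories/{{ zj_image.repository }}/tags?page_size=1000"
    status_code: 200
  register: tags
  # A non-200 response (such as the 500 above) marks the task as failed,
  # so the task is re-run until it succeeds or the retries are exhausted.
  until: tags is succeeded
  retries: 5
  delay: 10
```

This would only hide the transient failure on our side; it does not address whatever is causing the 500 on the server.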
