goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

Harbor's registry writes a blob whose checksum does not match what the system believes its checksum is #20143

Open doctorpangloss opened 7 months ago

doctorpangloss commented 7 months ago

Related to #20133

$ kubectl exec -it harbor-registry-59799688f7-9vq8q -- sha256sum /storage/docker/registry/v2/blobs/sha256/ee/ee16e8d2117a30f83fe374f2c07067494c109eb6e2efdefd62d63ab26e7ac145/data
Defaulted container "registry" out of: registry, registryctl
7bd78c245bdd8c5656693193077e3365fccc4d265aeeb697726bf381e60fde59  /storage/docker/registry/v2/blobs/sha256/ee/ee16e8d2117a30f83fe374f2c07067494c109eb6e2efdefd62d63ab26e7ac145/data

This layer is from

COPY --link --from=build C:/Python311 C:/Python311

It is largish (2.6GB). I'm not sure under what circumstances this would occur. There are no issues with the persistent volume or the underlying storage.

This happens repeatedly when the image is built.

This is a Windows image.

I feel like I am missing something, because I can't see how registry could be so widely used and get so far with this kind of issue.

Expected behavior and actual behavior: When registry interacts with a blob, such as after writing it, it should sha256sum the file to ensure it was written correctly.
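
As an illustration of the check described above, here is a minimal Python sketch that walks a blob store and reports every blob whose content does not hash to the digest in its directory name. It assumes the `sha256/<first two hex chars>/<full digest>/data` layout seen in the paths in this report; `find_mismatched_blobs` and `sha256_of_file` are hypothetical helper names, not part of Harbor or distribution.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file through SHA-256 so multi-gigabyte layers are never held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_blobs(blobs_root: Path) -> List[Tuple[str, str]]:
    """Return (expected, actual) hex digests for blobs whose content does not match their path.

    Assumed layout: <blobs_root>/sha256/<first two hex chars>/<full digest>/data
    """
    mismatches = []
    for data_file in sorted(blobs_root.glob("sha256/*/*/data")):
        expected = data_file.parent.name  # the directory name is the claimed digest
        actual = sha256_of_file(data_file)
        if actual != expected:
            mismatches.append((expected, actual))
    return mismatches
```

Run against the path from this report, the call would be `find_mismatched_blobs(Path("/storage/docker/registry/v2/blobs"))`. Hashing every blob is slow on a large store, so this is better suited to a one-off audit than to a per-request check.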

Steps to reproduce the problem:

  1. In the first stage, install a lot of Python packages that create thousands of files.
  2. Copy that Python directory with its site packages to a final stage.
  3. Push the image to Harbor
  4. Observe this issue with layers.

Versions: Please specify the versions of following systems.

Additional context:

There is nothing notable in the logs. It's all just successful pushes.

MinerYang commented 7 months ago

Hi @doctorpangloss ,

Could you get the content size of the affected blob layer?

wc -c <  /storage/docker/registry/v2/blobs/sha256/ee/ee16e8d2117a30f83fe374f2c07067494c109eb6e2efdefd62d63ab26e7ac145/data

Could you also show the manifest of this image?

cat  /storage/docker/registry/v2/blobs/<xx>/<xx-manifest-digest>

It would also help to have the harbor-registry logs from both pushing and pulling this specific image.

doctorpangloss commented 7 months ago

Thank you for further investigating the issue.

$ wc -c <  /storage/docker/registry/v2/blobs/sha256/ee/ee16e8d2117a30f83fe374f2c07067494c109eb6e2efdefd62d63ab26e7ac145/data
3641339259
$ cat /storage/docker/registry/v2/blobs/sha256/62/627b0ceb463ffe633dafb89029b270147b656a76d55cd5b0a72092df2cae28a2/data
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 11348,
    "digest": "sha256:b3fdb2fee5c1acc78cb0f297870cad6adc833c847ea7ce65f72b7cd5b54d8840"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1388598786,
      "digest": "sha256:7c76e5cf7755ce357ffb737715b0da6799a50ea468cc252c094f4d915d426b3f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 568860197,
      "digest": "sha256:a61557bf66429be9509f579104808d2853f8f7aefbd49ef26f5f2a90266c46f5"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 21424280,
      "digest": "sha256:5bc010802431ab0ee2b8ef0d775b412b3c56a8eac2428088b0c949817219c295"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 73406695,
      "digest": "sha256:87017d1dc4c5662506aee9340e592e517444e7d9e7485b741fd2c825ebf7bbcf"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 3641333543,
      "digest": "sha256:ceea7cc146971131181173af7bb5432197c486f43e47e14dc24089f95f23f7fa"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 6392387805,
      "digest": "sha256:6a016da1c63b584aea891c77bf01dc8d248da63c5c7350211374969798167da0"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 5597422203,
      "digest": "sha256:be3f48ccfaa562e13a3d19bb4bcea93a575f9e05846b3c0f12dd4ae40ea3fa03"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 4193631836,
      "digest": "sha256:c861bab5f65eaa8e18d08597efcd7901ad18266dfdb053ea39a823cb234d2744"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 7521664388,
      "digest": "sha256:450d94ee1f10724478d0117f89a06b0bb1ede75568f78fb33b790347948bb2e6"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 62127545,
      "digest": "sha256:e4e2d538c233ec1992783294e0bcadd2a510c62f9dcec790e9bac815c34eaad5"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 189818861,
      "digest": "sha256:6bec8920d36011535acb6e334cc03c8165002f8a3e214c4298082abc7c8c9663"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 683934131,
      "digest": "sha256:270fa103a76a3e8407332db541eab8ac9946a62cdfefd13cc765ca9e29919c0f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 2947552011,
      "digest": "sha256:516646a609227c2e6e4e05cb8fd844e0c6b3145b962efc6b439603ad06913541"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 558733195,
      "digest": "sha256:d73132d27ff250a67d40ed517666e6995c47b6e41bcd753201ae398cb8dd91a8"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 558997790,
      "digest": "sha256:88835ae46eb9809639e615dc1feb6cd92763ddb578229e44cd8b9e607250e2b4"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 552724987,
      "digest": "sha256:83ec24083d455ce8ba98d1228499000bf5d489c98dcd2f8bf3021c20aaf6b601"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 2952371783,
      "digest": "sha256:660ffa7fd6ea9d28302a7f0a98ecf937bcc3849276f87ecf21f75507ffc0a22e"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 780788150,
      "digest": "sha256:3b5f5cc626fa39fd8eaa7f342ab0016e4a7567f846b4a1ef3382f88dd936a616"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 782108407,
      "digest": "sha256:7b6710162dd691f5958da2b1ee94d1d72c9e5608b0fd12dd28941d28c2654228"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 417311,
      "digest": "sha256:2634f7688c381b03a5fe14c998279cde213edd60130c7f2bee2d5fe6da1aed85"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1368,
      "digest": "sha256:152a138552e49987085eeb5e6414af0781bf310ea27e5bd44d082e18b1cc1ba1"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 47774571,
      "digest": "sha256:986501079208002feb43a04c127e225435ffe40d4086f5630d9a12286c4446d3"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 43769789,
      "digest": "sha256:e9993a0af675071811d2a98e7007f89ca8ee0426b958af400e38be14a10994eb"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1856,
      "digest": "sha256:6ee7c2921a2c4134d3b90574334d037dd9eb508aef563adecbffee685a470b7a"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 2679,
      "digest": "sha256:b0119e8423326103aef1d24daa7c3ce07eb567dda0cda5f99e171fad4acc819c"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 2202,
      "digest": "sha256:b891de3b6a71c2470e422ac2570026db36f1e265b8897418b0275a060d7bcc18"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 337073,
      "digest": "sha256:cfe6539fbcedc3a905e2e0c891c1206c4947b4b14f08df65055866bb782be00f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1556,
      "digest": "sha256:eea6a71e201d02262532740776b4ba8a99d03a564659224278a69aa8abdf6481"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 177482365,
      "digest": "sha256:daeca83d80187859b81c21d2a0e9ee17ae45f79db096aec020b16f3039da136f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 352796701,
      "digest": "sha256:8a9791962e95f33e683a11fd1438b8e6a13a24c1734abc30040da5dcaf3547b3"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 712071897,
      "digest": "sha256:0f3dc3a10bb747600bfe8269eee7c85fd44536c7826edfdf06fc1d9e3f739c65"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 178359805,
      "digest": "sha256:aa83849f3863a9ce16357323deb12c305c9ebb40e39340b5fa8a81f05bdade00"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 5619685,
      "digest": "sha256:7b6d7f4990df9fb7fdea58e6a95a874d5e72250be20f45201d9ea8b5d5c8f239"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 356075907,
      "digest": "sha256:781a95c1bfdc2835049b5c8054a4b1341b38c6a197c9963d8a3a7176695aaa85"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1470,
      "digest": "sha256:9af0cd1263bfb25877fd238e875cc4b12b56d7bf37099da933a457d8c221b67e"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1314,
      "digest": "sha256:b1b66ff1bed073e4c8c8e1c3034104454c40d86a08efb57626bae7d945252a73"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1340,
      "digest": "sha256:3ef235c23399ac69e0c24121d8b8d93a74b3c13ac724d29d314cecef94328d2f"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 339111,
      "digest": "sha256:c807663cd98abfa55e5b29eb0758bc16057ed80a00ee86caa2acacffbc65394d"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 605862,
      "digest": "sha256:b61ebed909295dca5d1597ab9c0e465e82ecc68dedceddc9468b5649d2323660"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1293,
      "digest": "sha256:96008518d4134a7ea8358f9df2a92c6c496dcc999faa46cff40e3500f7bbb4cc"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1313,
      "digest": "sha256:ca82bb53477dd0f76ae02489eac98493cdac736295585fb0988aab94bdfc8ac5"
    }
  ]
}
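
The blob measured above is 3641339259 bytes, and no layer in this manifest lists that digest or that size (the closest, `sha256:ceea7c…`, lists 3641333543). A cheap first pass before hashing anything is to compare each layer's `size` field with the file on disk. The following Python sketch does that, assuming the same blob layout as the paths above; `blob_path` and `size_mismatches` are hypothetical helper names.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def blob_path(blobs_root: Path, digest: str) -> Path:
    """Map 'sha256:<hex>' to <blobs_root>/sha256/<first two hex chars>/<hex>/data (assumed layout)."""
    algorithm, hex_digest = digest.split(":", 1)
    return blobs_root / algorithm / hex_digest[:2] / hex_digest / "data"


def size_mismatches(manifest: dict, blobs_root: Path) -> List[Tuple[str, int, Optional[int]]]:
    """Return (digest, manifest size, on-disk size) for layers that disagree.

    The on-disk size is None when the blob file is missing.
    """
    problems = []
    for layer in manifest["layers"]:
        path = blob_path(blobs_root, layer["digest"])
        on_disk = path.stat().st_size if path.is_file() else None
        if on_disk != layer["size"]:
            problems.append((layer["digest"], layer["size"], on_disk))
    return problems
```

A matching size does not prove the content is intact, so layers that pass this check would still need a full SHA-256 comparison.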

registry_logs.txt: these logs are from an isolated registry instance that handled only pushes and pulls of this image.

I am also going to try Docker 20.x to build the image, because those images, which also contain large content, routinely build successfully.

doctorpangloss commented 7 months ago

After pushing many times, it appears to be random which layer in the final image ends up with an incorrect checksum.

This is pointing to serious bugs in harbor. By using ECR, I've eliminated a lot of other possibilities.

doctorpangloss commented 7 months ago

The manifest doesn't have a matching layer, probably because it has already been replaced by another push attempt. I don't think it would be very helpful to show that there is a matching layer and that there are no flaws in the manifest. Assume for now that such a thing exists.

MinerYang commented 7 months ago

I agree that this is likely either a bug in upstream distribution, or the image was corrupted on the Docker side when it was uploaded.

Could you try building a similar image that is not based on a Windows OS image? Or a Windows-based image without large content?

teimyBr commented 7 months ago

https://github.com/redis/redis/issues/13156

We are also facing this issue with the docker.io proxy in our Harbor, with the redis image.

MinerYang commented 7 months ago

Also, could you build a distribution registry and try pushing to and pulling from it for verification? https://github.com/distribution/distribution/blob/v2.8.3/BUILDING.md

teimyBr commented 7 months ago

How to reproduce the issue:

  1. Pull proxy/docker.io/library/redis:7.2.4-alpine through a docker.io proxy in Harbor.
  2. Run this image in the containerd runtime.

We also deleted this redis:7.2.4-alpine image and ran a GC, then let it be pulled anew from docker.io. The issue is still there.

It seems this SHA (the multi-arch SHA) has the issue through a Harbor docker.io proxy:

redis:7.2.4-alpine@sha256:641c365890fc79f182fb198c80ee807d040b9cfdb19cceb7f10c55a268d212b8

The amd64 SHA works fine:

redis:7.2.4-alpine@sha256:3487aa5cf06dceb38202b06bba45b6e6d8a92288848698a6518eee5f63a293a3
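
For context on the two digests: a multi-arch tag normally resolves to an image index whose `manifests` entries each carry a `platform` object and the digest of a per-platform manifest, so the multi-arch digest and the amd64 digest identify different documents. Below is a minimal Python sketch of picking the per-platform digest out of such an index. `select_platform_digest` is a hypothetical helper, and whether the digests quoted above are related this way is an assumption, not something confirmed in this thread.

```python
from typing import Optional


def select_platform_digest(index: dict, architecture: str, os_name: str = "linux") -> Optional[str]:
    """Return the digest of the first index entry matching the platform, or None if absent."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform") or {}
        if platform.get("architecture") == architecture and platform.get("os") == os_name:
            return entry["digest"]
    return None
```

Comparing the digest this returns for `amd64` with the digest the runtime actually pulled would show whether the failure depends on going through the index.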

MinerYang commented 7 months ago

@doctorpangloss is your project also set up as proxy cache?

doctorpangloss commented 7 months ago

> @doctorpangloss is your project also set up as proxy cache?

No.

doctorpangloss commented 7 months ago

When pushed and pulled from ECR, everything works.

doctorpangloss commented 7 months ago

Is there something in harbor that causes it to not write the layer exactly as it was uploaded?

If so, how do I disable this?

doctorpangloss commented 7 months ago

Thanks again for investigating this with me.

> Could you try building a similar image that is not based on a Windows OS image? Or a Windows-based image without large content?

@MinerYang this issue reproduces with a Linux version of the image with no build stages and no large files.

Is there anything that causes harbor to touch the contents of what it is writing to the registry?

An Ingress configuration issue is possible, but I routinely push other images. Something essential about this is that it is installing Python packages, which have many files and create many links.

FROM nvcr.io/nvidia/pytorch:24.01-py3 as builder
ARG PIP_DISABLE_PIP_VERSION_CHECK=1
ARG PIP_NO_CACHE_DIR=1
RUN pip install wheel && \
    pip install --no-build-isolation git+https://github.com/hiddenswitch/ComfyUI.git

WORKDIR /workspace

RUN comfyui --quick-test-for-ci --cpu --cwd /workspace
EXPOSE 8188
CMD ["comfyui", "--listen", "--cwd", "/workspace"]

Some Python packages have been removed from this Dockerfile. I pushed this just fine a few weeks ago. Between the working and non-working versions, I upgraded to Harbor 2.10, so it's most likely a new bug. The urgency is gone because I am using ECR for now, but I think this is a serious new bug.

doctorpangloss commented 7 months ago

> Also, could you build a distribution registry and try pushing to and pulling from it for verification? https://github.com/distribution/distribution/blob/v2.8.3/BUILDING.md

Are you saying I should push to a vanilla Docker registry? I will be honest, it's going to work fine. Would it be helpful to see if it resolves the issue by patching it into Harbor instead? Is its image a drop-in replacement for the image of the harbor-registry deployment? It would seem so.

MinerYang commented 7 months ago

Hi @doctorpangloss, thanks for providing all this feedback. However, I tried the exact same Dockerfile, using docker buildx to build and push multi-arch images to a Harbor v2.10.0 instance. Pulling succeeds with both digest and tag. BTW, I am just using the docker compose installation.

[Screenshot 2024-03-22 at 17 21 12]

MinerYang commented 7 months ago

> redis/redis#13156
>
> We are also facing this issue with the docker.io proxy in our Harbor, with the redis image.

Hi @teimyBr,

Thanks for connecting with us. We would appreciate it if you could file a separate issue for your proxy-cache problem, including logs, Harbor version, etc.

teimyBr commented 7 months ago

> Hi @teimyBr,
>
> Thanks for connecting with us. We would appreciate it if you could file a separate issue for your proxy-cache problem, including logs, Harbor version, etc.

I think the issue was in the redis image; the new SHA from redis fixed the issue.

doctorpangloss commented 7 months ago

> Hi @doctorpangloss, thanks for providing all this feedback. However, I tried the exact same Dockerfile, using docker buildx to build and push multi-arch images to a Harbor v2.10.0 instance. Pulling succeeds with both digest and tag. BTW, I am just using the docker compose installation.

Then something is corrupting the layers on write. Very mysterious. I will try on a fresh Harbor installation.

MinerYang commented 7 months ago

https://github.com/goharbor/harbor/issues/20133#issuecomment-2019519579

doctorpangloss commented 6 months ago

I ran out of bandwidth for this issue. I kind of need to know if harbor just writes the thing it receives. In other words, when a layer is pushed, is that content written directly, or is it read, modified, unpacked, etc. before it is written? For example, by some kind of layer scanning process? Then I would like to disable that process.

MinerYang commented 6 months ago

Hi @doctorpangloss, Harbor does not have any process that rewrites uploaded blob content; it proxies the upload to upstream distribution directly. That is, for content written to the filesystem, we have the same behavior as distribution/distribution.
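
To make "write exactly what was received, then verify" concrete, here is a minimal Python sketch of a verify-on-write blob store. It is a hypothetical illustration only, not Harbor's or distribution's actual code: `store_blob` and `DigestMismatch` are invented names, and the directory layout is assumed from the paths earlier in this thread.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import BinaryIO


class DigestMismatch(Exception):
    """Raised when the received bytes do not hash to the digest the client claimed."""


def store_blob(stream: BinaryIO, expected_digest: str, blobs_root: Path,
               chunk_size: int = 1024 * 1024) -> Path:
    """Write the stream unmodified, hashing in flight, and publish it only if the digest matches."""
    algorithm, expected_hex = expected_digest.split(":", 1)
    if algorithm != "sha256":
        raise ValueError("this sketch only handles sha256 digests")
    target_dir = blobs_root / algorithm / expected_hex[:2] / expected_hex
    target_dir.mkdir(parents=True, exist_ok=True)

    hasher = hashlib.sha256()
    fd, tmp_name = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in iter(lambda: stream.read(chunk_size), b""):
                hasher.update(chunk)  # hash the same bytes that are written
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())
        if hasher.hexdigest() != expected_hex:
            raise DigestMismatch(f"expected {expected_hex}, got {hasher.hexdigest()}")
        final_path = target_dir / "data"
        os.replace(tmp_name, final_path)  # atomic publish on the same filesystem
        return final_path
    finally:
        # The temp file still exists only if the publish step was not reached.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
```

Note the limit of this design: hashing in flight proves what was received, not what the storage layer persisted. Catching the kind of mismatch reported here would additionally require re-reading the file after the write and hashing it again.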

github-actions[bot] commented 4 months ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

doctorpangloss commented 4 months ago

> Hi @doctorpangloss, Harbor does not have any process that rewrites uploaded blob content; it proxies the upload to upstream distribution directly. That is, for content written to the filesystem, we have the same behavior as distribution/distribution.

But does Harbor patch distribution/distribution? It seems like my issue is really a distribution bug, and it does look really buggy. Is there an alternative? Aren't there pre-existing systems that support transactions for database-like and file-like operations together? What is AWS using internally for ECR?

github-actions[bot] commented 2 months ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

doctorpangloss commented 1 month ago

This may be related: https://github.com/microsoft/Windows-Containers/issues/519

I saw this occur once in ECR!