
[image-builder] Some private registries do not work and fail with unauthorized access and bad gateway #7264

Closed: yjwong closed this issue 1 year ago

yjwong commented 2 years ago

UPDATE

Refer to the comments on this issue for the latest findings. The original title and description were incorrect, which we realised while investigating the issue.

Bug description

When building an image as part of creating a new workspace, the build fails with the following:

#5 pushing layers
#5 8.286 error: unexpected status: 502 Bad Gateway
#5 8.286 retrying in 1s
#5 9.398 error: unexpected status: 502 Bad Gateway
#5 9.398 retrying in 1s
#5 17.46 error: unexpected status: 502 Bad Gateway
#5 17.46 retrying in 2s
#5 22.67 error: unexpected status: 502 Bad Gateway
#5 22.67 retrying in 2s
#5 27.66 error: unexpected status: 502 Bad Gateway
#5 27.66 retrying in 4s
#5 32.93 error: unexpected status: 502 Bad Gateway
#5 32.93 retrying in 4s
#5 pushing layers 39.8s done
#5 39.85 error: unexpected status: 502 Bad Gateway
#5 ERROR: unexpected status: 502 Bad Gateway
------
 > exporting to image:
#5 9.398 retrying in 1s
#5 17.46 error: unexpected status: 502 Bad Gateway
#5 17.46 retrying in 2s
#5 22.67 error: unexpected status: 502 Bad Gateway
#5 22.67 retrying in 2s
#5 27.66 error: unexpected status: 502 Bad Gateway
#5 27.66 retrying in 4s
#5 32.93 error: unexpected status: 502 Bad Gateway
#5 32.93 retrying in 4s
#5 39.85 error: unexpected status: 502 Bad Gateway
------
error: failed to solve: unexpected status: 502 Bad Gateway
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","command":"build","error":"exit status 1","level":"error","message":"build failed","serviceContext":{"service":"bob","version":""},"severity":"ERROR","time":"2021-12-16T08:39:00Z"}
exit
exit

Error: headless task failed: exit status 1

Upon investigation, the build fails because of this:

2021/12/16 08:38:58 [DEBUG] POST https://external.private.registry/v2/gitpod/workspace-images/blobs/uploads/?mount=sha256:a8a7c4e7d4be11b2eb998377b9c8de9cb0f23b3f54a46cf1d5833f245257bcd7&from=base (status: 401): retrying in 1s (3 left)
2021/12/16 08:38:59 [DEBUG] POST https://external.private.registry/v2/gitpod/workspace-images/blobs/uploads/?mount=sha256:a8a7c4e7d4be11b2eb998377b9c8de9cb0f23b3f54a46cf1d5833f245257bcd7&from=base (status: 401): retrying in 2s (2 left)
2021/12/16 08:39:00 http: proxy error: POST https://external.private.registry/v2/gitpod/workspace-images/blobs/uploads/?mount=sha256:8e44a974834fc9d577f867a6961829d0d3b5fd7fc30bda3babfba2dd26465155&from=base giving up after 4 attempt(s)
2021/12/16 08:39:00 http: proxy error: context canceled

The upload of other layers works fine, but when a cross-repository blob mount is requested, the private registry returns a 401.

This appears to be because the from parameter is incorrect: it should name the other repository the client has access to, which in this case should have been gitpod/base-images instead of base.
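
For comparison, a correct cross-repository mount request per the Docker Registry HTTP API v2, reusing the digest from the log above, would name the full source repository in from:

POST https://external.private.registry/v2/gitpod/workspace-images/blobs/uploads/?mount=sha256:a8a7c4e7d4be11b2eb998377b9c8de9cb0f23b3f54a46cf1d5833f245257bcd7&from=gitpod/base-images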

Steps to reproduce

Configure image builder with the following configuration:

components:
  imageBuilder:
    registry:
      name: external.private.registry
      secretName: registry-secret

The external.private.registry in this case refers to a GitLab container registry.

Start a workspace on an empty Git repository.

Create a custom .gitpod.yml with the following:

image:
  file: .gitpod.Dockerfile

Sample simple .gitpod.Dockerfile:

FROM gitpod/workspace-full:latest

RUN sudo apt-get update -y && \
    sudo apt-get upgrade -y && \
    sudo rm -rf /var/lib/apt/lists/*

Then restart the workspace.

Workspace affected

No response

Expected behavior

There should be no issues pushing to the private registry.

Example repository

I can provide access to the Kubernetes cluster hosting Gitpod. I am available on Discord under the same username.

Anything else?

This is a self-hosted install based on main.2034 (commit b2969de300c4235b8d640d501246d1553fac92b9).

corneliusludmann commented 2 years ago

Hey team @gitpod-io/engineering-workspace! Could you have a look at this bug report? Let me know if you need help from the self-hosted team to reproduce this.

csweichel commented 2 years ago

Thanks for the detailed write-up. As you indicated, that's most likely due to an omission in the "bob proxy".

princerachit commented 2 years ago

On testing with core-dev, I see the following logs in the image-builder pod.

Image builder logs

gitpod /workspace/gitpod (prs/x-repo) $ kubectl logs -f imagebuild-fa8b9a77-6ee8-47ba-a30e-b1a7252d949e
{"level":"info","message":"connected to parent socket","ring":2,"serviceContext":{"service":"workspacekit","version":"commit-59a5f028c507144b9cac1477e9c4de31bc6d8849"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"level":"info","message":"signaling to child process","ring":1,"serviceContext":{"service":"workspacekit","version":"commit-59a5f028c507144b9cac1477e9c4de31bc6d8849"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"level":"info","message":"awaiting seccomp fd","ring":1,"serviceContext":{"service":"workspacekit","version":"commit-59a5f028c507144b9cac1477e9c4de31bc6d8849"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
sent tapfd=5 for tap0
received tapfd=5
{"level":"debug", "msg": "nsexec started"}
{"level":"debug", "msg": "join mnt namespace: 5"}
{"level":"debug", "msg": "chroot: 3"}
{"level":"debug", "msg": "chcwd: 4"}
{"level":"debug", "msg": "join net namespace: 6"}
{"command":"proxy","level":"info","message":"starting bob proxy on :8080","serviceContext":{"service":"bob","version":""},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","error":"rpc error: code = NotFound desc = no token available","level":"error","message":"cannot get token for Gitpod API","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"ERROR","time":"2022-03-15T11:15:13Z"}
{"level":"error","message":"auto-port exposure won't work","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"ERROR","time":"2022-03-15T11:15:13Z"}
apifd event
api_handler: got request: {"execute":"add_hostfwd","arguments":{"proto":"tcp","host_addr":"0.0.0.0","host_port":23000,"guest_addr":"10.0.2.100","guest_port":23000}}

apifd event
api_handler: got request: {"execute":"add_hostfwd","arguments":{"proto":"tcp","host_addr":"0.0.0.0","host_port":22999,"guest_addr":"10.0.2.100","guest_port":22999}}

apifd event
api_handler: got request: {"execute":"add_hostfwd","arguments":{"proto":"tcp","host_addr":"0.0.0.0","host_port":23001,"guest_addr":"10.0.2.100","guest_port":23001}}

{"level":"info","location":"/workspace/.gitpod.yml","message":"gitpod config watcher: starting...","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"error":"open /workspace/.gitpod/content.json: no such file or directory","level":"info","message":"no content init descriptor found - not trying to run it","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","error":"not connected to Gitpod server","level":"error","message":"error tracking supervisor_readiness","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"ERROR","time":"2022-03-15T11:15:13Z"}
{"level":"info","message":"supervisor: workspace content available","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","source":"from-other","time":"2022-03-15T11:15:13Z"}
{"level":"info","message":"supervisor: workspace content available","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"level":"info","location":"/workspace/.gitpod.yml","message":"gitpod config watcher: started","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"command":"{\nsudo -E /app/bob build\n}; exit","level":"info","message":"starting a task terminal...","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","error":"not connected to Gitpod server","level":"error","message":"error tracking supervisor_readiness","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"ERROR","time":"2022-03-15T11:15:13Z"}
{"@type":"type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent","error":"stat /workspace/atheoscommune.github.io: no such file or directory","level":"error","message":"default workdir provider: cannot resolve the workspace root","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"ERROR","time":"2022-03-15T11:15:13Z"}
{"alias":"c09f83bd-0c0e-437a-8c3e-78c53204b646","cmd":"/bin/bash","level":"info","message":"started new terminal","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"command":"{\nsudo -E /app/bob build\n}; exit","level":"info","message":"task terminal has been started","pid":47,"serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","terminal":"c09f83bd-0c0e-437a-8c3e-78c53204b646","time":"2022-03-15T11:15:13Z"}
{"level":"info","message":"Writing build output to /workspace/.gitpod/prebuild-log-0","pid":47,"serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:13Z"}
{"alias":"c09f83bd-0c0e-437a-8c3e-78c53204b646","level":"info","message":"closing terminal","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:14Z"}
{"command":"{\nsudo -E /app/bob build\n}; exit","level":"info","message":"task terminal has been closed","pid":47,"serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","terminal":"c09f83bd-0c0e-437a-8c3e-78c53204b646","time":"2022-03-15T11:15:14Z"}
{"level":"info","message":"received SIGTERM (or shutdown) - tearing down","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:14Z"}
{"level":"info","message":"shutting down API endpoint","serviceContext":{"service":"supervisor","version":"commit-c8a60b96a3627b36dd580af7259f29a011eed650"},"severity":"INFO","time":"2022-03-15T11:15:14Z"}
{"level":"info","message":"done","ring":1,"serviceContext":{"service":"workspacekit","version":"commit-59a5f028c507144b9cac1477e9c4de31bc6d8849"},"severity":"INFO","time":"2022-03-15T11:15:14Z"}
{"level":"info","message":"done","ring":0,"serviceContext":{"service":"workspacekit","version":"commit-59a5f028c507144b9cac1477e9c4de31bc6d8849"},"severity":"INFO","time":"2022-03-15T11:15:14Z"}

The .gitpod.Dockerfile content:

FROM gitpod/workspace-full:latest

RUN touch /tmp/myfile

The error in the UI:

[screenshot of the image build error]

princerachit commented 2 years ago

@MrSimonEmms helped me try to reproduce this problem using an Azure registry. I could not reproduce it there.

He also pointed out that there seems to be an integration problem between the GitLab image registry and Gitpod, so this issue is specific to GitLab rather than to all external private registries.

I will sync with Simon further on this, as he has a working theory on why the GitLab integration fails.

csweichel commented 2 years ago

This is most likely a misunderstanding or omission in the distribution spec as implemented/rewritten by the bob proxy.

mrsimonemms commented 2 years ago

My working theory is that GitLab deviates slightly from the Docker API v2 spec, or uses a feature that we have not tested or got working.

The GitLab registry works slightly differently from in-cluster/Azure and is more like GCP, where the URL in the auth is different from what we pass into the Installer config:

containerRegistry:
  inCluster: false
  external:
    URL: <url-to-push-images-to>
    certificate:
      kind: secret
      name: <secret-name>

In-Cluster/Azure

The authentication is on gitpod.registry.com and then you push images to gitpod.registry.com/workspace-images. This means that the auth URL in your .dockerconfigjson secret matches the URL you configure:

{
    "auths": {
        "gitpod.registry.com": {
            "auth": "dXNlcm5hbWU6cGFzc3dvcmQK" // Base64 of "username:password"
        }
    }
}

GCP

This uses a different URL in the auth: the config URL specifies gcr.io/<project>, while the auth entry is keyed by gcr.io alone:

{
    "auths": {
        "gcr.io": {
            "auth": "dXNlcm5hbWU6cGFzc3dvcmQK" // Base64 of "username:password"
        }
    }
}

A GitLab auth token should look like this and specify the URL registry.gitlab.com/<owner>/<project> in the config:

{
    "auths": {
        "registry.gitlab.com": {
            "auth": "dXNlcm5hbWU6cGFzc3dvcmQK" // Base64 of "username:password"
        }
    }
}
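
In practice, that means the pull secret referenced from the Installer config would be created against the GitLab auth host rather than the full push path; a sketch using kubectl (the secret name matches the one in the bug report, credentials are placeholders):

kubectl create secret docker-registry registry-secret \
  --docker-server=registry.gitlab.com \
  --docker-username=<username> \
  --docker-password=<personal-access-token>

This yields exactly the auths entry shown above, keyed by the auth host rather than by the repository path.
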
princerachit commented 2 years ago

I have moved this back to scheduled due to an offsite at Gitpod. We are mostly going to be occupied with planning and meetings this week.

princerachit commented 2 years ago

So far I have confirmed @MrSimonEmms' hypothesis that this is not working because of the URL. I tried GitLab and Docker Hub; both failed with exactly the same error.

Initially I suspected that this change might have caused the issue; however, even after commenting out the line, it still did not work.

I think the combination of URL rewriting and auth is failing. One possibility is that the proxy sets incorrect headers while trying to push the image.

The image URL, however, looks correct. I confirmed this by comparing the logs of the image build pod with skopeo's logs. The URL format matches in both cases:

Skopeo Logs

DEBU[0004] HEAD https://registry-1.docker.io/v2/psinha36/nginx/blobs/sha256:6c0ee9353e13944dca360479cb7eecfa65c6726948c1b85db3f8b57b68631a3b
DEBU[0004] HEAD https://registry-1.docker.io/v2/psinha36/nginx/blobs/sha256:ae13dd57832654086618a81dbc128846aa092489260c326ee95429b63c3cf213
DEBU[0004] HEAD https://registry-1.docker.io/v2/psinha36/nginx/blobs/sha256:dca7733b187e4e05ef6a71f40eb02380dde472b7e3da6dcffcafcfded823352b
DEBU[0004] HEAD https://registry-1.docker.io/v2/psinha36/nginx/blobs/sha256:be0c016df0be98964bf62fc97d820463c5228ed3ceef321cb4bedc5b86eb7660
DEBU[0004] HEAD https://registry-1.docker.io/v2/psinha36/nginx/blobs/sha256:9eaf108767c796d28e8400fe30b87d5624b985847173bb20587ae85bc7179e3a
DEBU[0004] HEAD https://registry-1.docker.io/v2/psinha36/nginx/blobs/sha256:352e5a6cac2644c979e06a33493d883694ad0716bab021561da45e2f4afd84cd

Image build pod Logs

2022/03/28 11:22:59 [DEBUG] HEAD https://registry-1.docker.io/v2/psinha36/base-images/blobs/sha256:5b8be2fd806ec98e59aa8720759facf267b98a8ed5a36d1c0323bfe897725b86
{"host":"registry-1.docker.io","level":"info","message":"authorizing registry access","serviceContext":{"service":"bob","version":""},"severity":"INFO","time":"2022-03-28T11:22:59Z","user":"psinha36"}
2022/03/28 11:22:59 [DEBUG] HEAD https://registry-1.docker.io/v2/psinha36/base-images/blobs/sha256:ae13dd57832654086618a81dbc128846aa092489260c326ee95429b63c3cf213 (status: 401): retrying in 1s (3 left)
{"host":"registry-1.docker.io","level":"info","message":"authorizing registry access","serviceContext":{"service":"bob","version":""},"severity":"INFO","time":"2022-03-28T11:22:59Z","user":"psinha36"}
2022/03/28 11:22:59 [DEBUG] HEAD https://registry-1.docker.io/v2/psinha36/base-images/blobs/sha256:950c3791111a456d92bc047c696a3d408aa55a0361a0ae5f5088ef7d20edec65 (status: 401): retrying in 1s (3 left)
{"host":"registry-1.docker.io","level":"info","message":"authorizing registry access","serviceContext":{"service":"bob","version":""},"severity":"INFO","time":"2022-03-28T11:22:59Z","user":"psinha36"}
2022/03/28 11:22:59 [DEBUG] HEAD https://registry-1.docker.io/v2/psinha36/base-images/blobs/sha256:5b8be2fd806ec98e59aa8720759facf267b98a8ed5a36d1c0323bfe897725b86 (status: 401): retrying in 1s (3 left)
princerachit commented 2 years ago

I pulled out the Bearer token being added on this line and tried a curl: https://github.com/gitpod-io/gitpod/blob/5876933b2f71e0a78371a457cec84a156119d77d/components/image-builder-bob/pkg/proxy/proxy.go#L179

The request failed with a 401:

Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:psinha36/base-images:pull",error="invalid_token"

The Authorization header seems to have an issue.

princerachit commented 2 years ago

I suspect that the issue is that we are not passing the appropriate scope when trying to authenticate against Docker Hub. I am going to add a scope to the docker authorizer and test.
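
For reference, containerd's remotes/docker package lets the caller attach a token scope to the request context, which its authorizer merges into the token request. A minimal sketch of the idea (the repository name comes from the test above; repository:<name>:<actions> is the scope form defined by the Docker token auth spec):

package main

import (
	"context"

	"github.com/containerd/containerd/remotes/docker"
)

func main() {
	// Attach a pull+push token scope for the target repository to the
	// context; the docker authorizer picks it up when requesting a token
	// from the registry's auth endpoint.
	ctx := docker.WithScope(context.Background(), "repository:psinha36/base-images:pull,push")
	_ = ctx
}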

princerachit commented 2 years ago

Update: I did add the scope registry:psinha36/base-images:pull,push but the push is still failing with a 401 unauthorized error.

princerachit commented 2 years ago

I was able to narrow down the source of the error for Docker Hub: it happens because Docker Hub returns a token that does not carry complete information, but rather just a jti. From the docs, the JTI is returned to avoid displaying the token and to prevent token replay. The client is supposed to reuse the token based on the JTI returned; however, our client cannot do that, because we create a new docker.Authorizer for every request and do not cache the token.

This issue is particularly challenging because of the underlying containerd library we are using. Different container registries can behave differently in terms of token generation. While almost all registries support JWT tokens (based on what I have seen with GitLab, Docker Hub, and Azure), we can never be 100% sure this will remain the case. Therefore, if we try to extract and decode the bearer token, it might fail in some cases.

I am looking into whether we can create a single authorizer which can auto-refresh and cache the generated token.
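
A minimal sketch of that idea, assuming we key authorizers by registry host (the type and names below are hypothetical, not the actual bob proxy code): reuse a single docker.Authorizer per host so containerd's internal token handler can cache and replay the token it fetched, instead of re-authenticating on every request.

package proxy

import (
	"sync"

	"github.com/containerd/containerd/remotes/docker"
)

// authorizerCache hands out one docker.Authorizer per registry host instead
// of constructing a fresh one per request. containerd's authorizer caches
// the tokens it fetches, so reusing the same instance lets a Docker Hub
// token (identified by its jti) be replayed across requests.
type authorizerCache struct {
	mu          sync.Mutex
	authorizers map[string]docker.Authorizer
	creds       func(host string) (user, password string, err error)
}

func (c *authorizerCache) get(host string) docker.Authorizer {
	c.mu.Lock()
	defer c.mu.Unlock()
	if a, ok := c.authorizers[host]; ok {
		return a
	}
	if c.authorizers == nil {
		c.authorizers = make(map[string]docker.Authorizer)
	}
	a := docker.NewDockerAuthorizer(docker.WithAuthCreds(c.creds))
	c.authorizers[host] = a
	return a
}

The proxy would then call get(host).Authorize(ctx, req) before forwarding each request and feed any 401 response back via AddResponses, which is how containerd's authorizer learns the WWW-Authenticate challenge and obtains (and subsequently reuses) a token.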

I have added a debug configuration for VS Code in the https://github.com/gitpod-io/gitpod/tree/prs/x-repo-new branch.

princerachit commented 2 years ago

Next steps we can take to resolve this issue:

This needs to be tested with multiple registries to determine whether our understanding of the issue is correct, and how things behave with the suggested change.

kylos101 commented 2 years ago

@princerachit how does GAR behave?

princerachit commented 2 years ago

@princerachit how does GAR behave?

I haven't tested with GAR yet and don't plan to test it anytime soon. I will start with either Docker Hub or GitLab and try solving one at a time.

princerachit commented 2 years ago

It looks like some change that went in might have broken the Azure integration too. I tried updating my Gitpod installation using the installer from a commit before the last changes made to image-builder-mk3, and it failed.

[screenshot of the failed image build]

Update: the Azure integration still works.

mrsimonemms commented 2 years ago

how does GAR behave?

Seemingly, GAR works fine, but it requires the same configuration as GCP's container registry.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Fentonfi commented 1 year ago

Hi all, is there any resolution for the above issue?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.