Open pjohnsonrxb opened 10 months ago
We get the same problem in our ARM64 self-hosted workflows, although our k8s cluster is not behind a VPN
We're seeing the same issue, both with and without buildx, and can't pinpoint an exact cause. We're on AWS behind a VPC/transit gateway, but with no VPN. Platform: amd64.
Try specifying the builder explicitly:
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  id: builder
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    # ...
    builder: ${{ steps.builder.outputs.name }}
Same issue here, solved with a retry step :(
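For anyone wanting to try the retry workaround, here is a minimal sketch of one way to do it using only built-in workflow features. The comment above does not say how the retry was implemented, so this is an assumption rather than that poster's setup: the step id `build` and the step names are hypothetical, and the elided inputs (`# ...`) must be the same in both steps.

```yaml
# Sketch only: the step id "build" and the step names are placeholders.
- name: Build and push
  id: build
  uses: docker/build-push-action@v6
  continue-on-error: true   # let the job continue if the first attempt fails
  with:
    # ...
    builder: ${{ steps.builder.outputs.name }}

# Runs only when the first attempt failed; "outcome" is the step result
# before continue-on-error is applied.
- name: Build and push (retry)
  if: steps.build.outcome == 'failure'
  uses: docker/build-push-action@v6
  with:
    # ...
    builder: ${{ steps.builder.outputs.name }}
```

This only masks the intermittent cancellation; it does not address whatever is closing the connection to buildkitd.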
Same problem with a self-hosted Windows build runner sending context to a Linux buildkitd on the same LAN.
Description
Issue: Self-Hosted Runners on GHA Workflows with Kubernetes Driver
Background
We have configured our GitHub Actions (GHA) workflows to use self-hosted runners. Our typical workflow involves:
buildx
buildx
Problem
We are encountering an issue when using the Kubernetes (k8s) driver for our builds. Our self-hosted runners are deployed on our k8s cluster. We're experiencing a specific error as shown in the screenshot below:
Kubernetes Container Logs:
time="2023-11-30T22:28:11Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled"
Hypothesis
We suspect that the issue might be related to our runners being behind a VPN. It seems buildx may not be adequately handling network latency associated with a VPN connection.
Observations
References
For additional context, see this related issue.
Seeking insights or suggestions to resolve this intermittent failure with our self-hosted runners in GHA workflows.
Expected Behavior
When using self-hosted runners in GitHub Actions workflows with the Kubernetes (k8s) driver for buildx, we expect the following:
Stable Connection to Build Services: The runners should maintain a stable connection to Docker's build services, regardless of being behind a VPN. Network latency typically associated with VPN connections should not disrupt the build process.
Consistent Build Process: Each action initiated by the workflow should complete successfully without intermittent failures. The build, push, and cache processes via buildx should be executed reliably.
Error-Free Operation: The buildx command, especially when interacting with Kubernetes, should execute without returning errors like /moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled.
Consistency with GitHub Hosted Runners: The performance and reliability of builds using self-hosted runners should be comparable to those observed with GitHub's hosted runners.
The expectation is that the self-hosted runners on our Kubernetes cluster should work as efficiently and reliably as GitHub's hosted runners, ensuring a smooth CI/CD pipeline.
Actual Behavior
When using self-hosted runners in GitHub Actions workflows with the Kubernetes (k8s) driver for buildx, we are encountering the following issues:
Unstable Connection to Build Services: The runners, especially when operating behind a VPN, are experiencing unstable connections to Docker's build services. This is evident from frequent connection cancellations and errors during the build process.
Inconsistent Build Process: The actions initiated by the workflow are not completing consistently. Approximately 20% of the actions (1 in 5) fail intermittently, showcasing a lack of reliability in the build, push, and cache processes via buildx.
Frequent Errors: We are frequently encountering errors such as /moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled. These errors suggest issues with the interaction between buildx and Kubernetes.
Disparity with GitHub Hosted Runners: Unlike the smooth operation observed with GitHub's hosted runners, our self-hosted runners exhibit inconsistent and error-prone behavior, leading to a disrupted CI/CD pipeline.
In summary, our self-hosted runners on the Kubernetes cluster are not performing as efficiently or reliably as expected, particularly in comparison to GitHub's hosted runners.
Repository URL
No response
Workflow run URL
No response
YAML workflow
Workflow logs
No response
BuildKit logs
No response
Additional info
Also, it is important to note that the job is only ever canceled during the build-and-push step. We use actions for other things, and those never cancel without an apparent reason.