Closed adawalli closed 10 months ago
Hey @adawalli, it was great talking with you. Let me know if the solution worked and if we can close the ticket.
There are many of us experiencing this issue.
Hey @nuzayets, please open a Zendesk ticket so we can look into it and help out.
Any chance this workaround will be made public? Like @nuzayets said, there are multiple customers experiencing this.
@guyarb - we have not seen the issue again after implementing your fix. Also would like to see this publicly documented (and fixed if possible)
@adawalli What was the fix?
Was on vacation, sorry for late response.
Disabling service monitoring did the trick for us. This was acceptable in this one cluster, where we are running GitLab jobs; however, I am hoping Datadog will ship a proper fix.
datadog:
  serviceMonitoring:
    enabled: false
Thank you!
Any progress on a fix for this without disabling the monitoring?
Thanks for bumping it @sigwinch28
Actually, there is no need to disable USM entirely; disabling HTTPS monitoring alone is enough:
agents:
  containers:
    systemProbe:
      env:
        - name: DD_SYSTEM_PROBE_NETWORK_ENABLE_HTTPS_MONITORING
          value: "false"
Regarding a fix, we're still working on it.
Hey folks, earlier today we released a new agent version, 7.50.0 (and 6.50.0), which contains a fix for the problem above: we're now ignoring buildkit processes in our HTTPS hooking mechanism. I encourage you to upgrade and re-enable the feature.
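For anyone auditing several clusters, here is a minimal sketch of how one might check whether a reported agent version already includes the fix. It assumes the version string looks like the "Agent 7.46.0 - Commit: ..." line quoted in this issue, and that the fix applies to 6.50.0 and later on the 6.x line and 7.50.0 and later on the 7.x line. The function names are hypothetical and not part of any Datadog tooling.

```python
import re

# Minimum fixed version per major line, taken from the release comment above.
FIXED_IN = {6: (6, 50, 0), 7: (7, 50, 0)}


def parse_agent_version(text):
    """Extract (major, minor, patch) from a string such as 'Agent 7.46.0 - Commit: ...'.

    Only the first x.y.z triple in the string is used.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_buildkit_fix(text):
    """Return True if the agent version is at or above the fixed release."""
    version = parse_agent_version(text)
    minimum = FIXED_IN.get(version[0])
    if minimum is None:
        # Assumption: major lines newer than 7 carry the fix, older than 6 do not.
        return version[0] > max(FIXED_IN)
    return version >= minimum
```

Passing the environment line from this issue (Agent 7.46.0) returns False, meaning the workaround is still needed until the agent is upgraded.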
Agent Environment
Agent 7.46.0 - Commit: b2f5e36 - Serialization version: v5.0.85 - Go version: go1.19.10
Describe what happened: The Datadog agent has been causing buildkit failures in Kubernetes, as discussed in https://github.com/moby/buildkit/issues/3812
Something in the Datadog agent is monitoring the overlayfs layers during a build, which causes buildkit to fail when it cannot unmount those layers. With the Datadog Helm chart installed in a nearly vanilla configuration, we can reproduce the failure nearly 100% of the time.
With Datadog completely disabled, builds pass 100% of the time. So far, we have also seen builds pass 100% of the time with Universal Service Monitoring turned off, but more testing is required to confirm this.
Sample Error message
Describe what you expected: Datadog should not cause DIND buildkit builds to fail.
Additional environment details (Operating System, Cloud provider, etc):