Open Watercycle opened 1 year ago
Hi @Watercycle, thank you for writing down such a detailed bug report 🙏 I'm adding it to the appropriate team's inbox 👋
FYI https://github.com/gitpod-io/gitpod/issues/12365 We mitigated it, but we cannot fix it 100% :sob:
It may be possible to control the number of retries with environment variables.
@utam0k @Furisto aside from https://github.com/gitpod-io/gitpod/issues/12365, how else might we be able to help? I'm going to add to breakdown for now, so that it's socialized during refinement next week.
edit: the only thing I can think of is moving to kata to simplify the runtime, but I know that's far out.
Likely caused by issues with seccomp notify, which are very hard to debug. Apart from @utam0k's suggestion to increase the number of retries, or switching to kata where we would not need seccomp anymore, I do not see another good solution at the moment.
@kylos101 @Furisto I wonder if runc-facade's retry mechanism doesn't work. Unfortunately, I didn't find these error messages when I reproduced this error on the preview-env. Realizing that, I created the preview env with this branch.
https://github.com/gitpod-io/gitpod/blob/6a852abc01b1ff2d45e032ddf6805390219c50b9/components/docker-up/runc-facade/main.go#L98-L101
So how about making sure whether or not it works fine as a first step?
Sounds reasonable
Perfect! @utam0k please update the issue description accordingly? :pray:
I have added a Next Actions section to the description.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bug description
When running docker-compose up (i.e. Docker), about 5-10% of the time random services will fail to start with errors along the lines of the following:
Next Actions
Steps to reproduce
Start a workspace using the gitpod/workspace-full image.
Create a docker-compose.yml file with the file contents below.
docker-compose.yml
Errors Example Screenshot
While this isn't what a typical compose file looks like, this has helped consistently mimic what members of my team frequently report since we use GitPod to quickly spin up our platform on feature branches. It's quite obnoxious having to restart the platform when core startup services fail.
Workspace affected
No response
Expected behavior
There should be no "operation not permitted" errors. The Docker services should successfully start and enter into a healthy state. Running the "Steps to reproduce" locally on Ubuntu 22, I'm unable to reproduce these startup failures. It only ever happens in GitPod, which is why I'm inclined to file the issue here instead of the docker/compose repo.
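To check a captured startup log against this expectation, something like the following Go sketch could count lines containing "operation not permitted" per service. It assumes each log line has the form `service | message`. The sample input is placeholder text written for this example, not real Docker output, and `failedServices` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// failedServices scans log text line by line and counts, per service, the
// lines that contain "operation not permitted". The service name is taken
// to be the text before the first "|" (an assumed line format); lines
// without a "|" are counted under "(unknown)".
func failedServices(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "operation not permitted") {
			continue
		}
		name := "(unknown)"
		if i := strings.Index(line, "|"); i >= 0 {
			name = strings.TrimSpace(line[:i])
		}
		counts[name]++
	}
	return counts
}

func main() {
	// Placeholder input for illustration only.
	sample := "svc-a | starting\n" +
		"svc-b | placeholder error: operation not permitted\n" +
		"svc-a | ready\n"

	failures := failedServices(sample)
	if len(failures) == 0 {
		fmt.Println("no \"operation not permitted\" errors found")
		return
	}
	for name, n := range failures {
		fmt.Printf("%s: %d matching line(s)\n", name, n)
	}
}
```

Running this over the output of many repeated startups would give a rough measure of how often the failure occurs.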
Example repository
To be clear, this seems to impact all workspaces. Here's a snapshot with the compose file above using the Haskell sample workspace: https://gitpod.io#snapshot/7b230be0-0aa0-4242-bcff-d9a2229afddd
Anything else?
Worth noting:
If the nginx image in the example compose file is switched with alpine, these errors still happen. Albeit, seemingly less frequently.
These errors also happen when running docker-compose up as the root user in a sudo -i session.