gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0
12.99k stars 1.25k forks source link

[prebuild][selfhosted] prebuilds freeze and does not fail nor finish. #8942

Closed fredrikhgrelland closed 2 years ago

fredrikhgrelland commented 2 years ago

Bug description

I am running a docker build as an init-task in a self-hosted gitpod on EKS. This works fine every time when opening a new workspace. I have given the pods 16gb ram and 4 cpus.

When starting a prebuild with with the magic-link https://gitpod.example.com/#prebuild/https://github/myrepo one of the following things will happen:

For both cases i can observe:

Dependencies Resolved

================================================================================ Package Arch Version Repository Size

Installing: gcc x86_64 7.3.1-13.amzn2 amzn2-core 22 M python3 x86_64 3.7.10-1.amzn2.0.1 amzn2-core 72 k python3-devel x86_64 3.7.10-1.amzn2.0.1 amzn2-core 243 k Installing for dependencies: binutils x86_64 2.29.1-31.amzn2 amzn2-core 5.8 M cpp x86_64 7.3.1-13.amzn2 amzn2-core 9.2 M dwz x86_64 0.11-3.amzn2.0.3 amzn2-core 98 k glibc-devel x86_64 2.26-57.amzn2 amzn2-core 994 k glibc-headers x86_64 2.26-57.amzn2 amzn2-core 514 k groff-base x86_64 1.22.2-8.amzn2.0.2 amzn2-core 948 k kernel-headers x86_64 4.14.268-205.500.amzn2 amzn2-core 1.2 M libatomic x86_64 7.3.1-13.amzn2 amzn2-core 46 k libcilkrts x86_64 7.3.1-13.amzn2 amzn2-core 85 k libgomp x86_64 7.3.1-13.amzn2 amzn2-core 204 k libitm x86_64 7.3.1-13.amzn2 amzn2-core 84 k libmpc x86_64 1.0.1-3.amzn2.0.2 amzn2-core 52 k libmpx x86_64 7.3.1-13.amzn2 amzn2-core 51 k libquadmath x86_64 7.3.1-13.amzn2 amzn2-core 189 k libsanitizer x86_64 7.3.1-13.amzn2 amzn2-core 641 k libtirpc x86_64 0.2.4-0.16.amzn2 amzn2-core 88 k mpfr x86_64 3.1.1-4.amzn2.0.2 amzn2-core 208 k perl x86_64 4:5.16.3-299.amzn2.0.1 amzn2-core 8.0 M perl-Carp noarch 1.26-244.amzn2 amzn2-core 19 k perl-Encode x86_64 2.51-7.amzn2.0.2 amzn2-core 1.5 M perl-Exporter noarch 5.68-3.amzn2 amzn2-core 29 k perl-File-Path noarch 2.09-2.amzn2 amzn2-core 27 k perl-File-Temp noarch 0.23.01-3.amzn2 amzn2-core 56 k perl-Filter x86_64 1.49-3.amzn2.0.2 amzn2-core 76 k perl-Getopt-Long noarch 2.40-3.amzn2 amzn2-core 56 k perl-HTTP-Tiny noarch 0.033-3.amzn2 amzn2-core 38 k perl-PathTools x86_64 3.40-5.amzn2.0.2 amzn2-core 83 k perl-Pod-Escapes noarch 1:1.04-299.amzn2.0.1 amzn2-core 52 k perl-Pod-Perldoc noarch 3.20-4.amzn2 amzn2-core 87 k perl-Pod-Simple noarch 1:3.28-4.amzn2 amzn2-core 216 k perl-Pod-Usage noarch 1.63-3.amzn2 amzn2-core 27 k perl-Scalar-List-Utils x86_64 1.27-248.amzn2.0.2 amzn2-core 36 k perl-Socket x86_64 2.010-4.amzn2.0.2 amzn2-core 49 k perl-Storable x86_64 2.45-3.amzn2.0.2 amzn2-core 78 k perl-Text-ParseWords noarch 3.29-4.amzn2 amzn2-core 14 k perl-Time-HiRes x86_64 4:1.9725-3.amzn2.0.2 amzn2-core 45 k perl-Time-Local noarch 1.2300-2.amzn2 amzn2-core 24 k perl-constant noarch 1.27-2.amzn2.0.1 amzn2-core 19 k perl-libs x86_64 4:5.16.3-299.amzn2.0.1 amzn2-core 685 k perl-macros x86_64 4:5.16.3-299.amzn2.0.1 amzn2-core 44 k perl-parent noarch 1:0.225-244.amzn2.0.1 amzn2-core 12 k perl-podlators noarch 2.5.1-3.amzn2.0.1 amzn2-core 112 k perl-srpm-macros noarch 1-8.amzn2.0.1 amzn2-core 4.7 k perl-threads x86_64 1.87-4.amzn2.0.2 amzn2-core 50 k perl-threads-shared x86_64 1.43-6.amzn2.0.2 amzn2-core 39 k pkgconfig x86_64 1:0.27.1-4.amzn2.0.2 amzn2-core 54 k python-rpm-macros noarch 3-60.amzn2.0.1 amzn2-core 14 k python-srpm-macros noarch 3-60.amzn2.0.1 amzn2-core 18 k python3-libs x86_64 3.7.10-1.amzn2.0.1 amzn2-core 9.2 M python3-pip noarch 20.2.2-1.amzn2.0.3 amzn2-core 2.0 M python3-rpm-macros noarch 3-60.amzn2.0.1 amzn2-core 12 k python3-setuptools noarch 49.1.3-1.amzn2.0.2 amzn2-core 1.1 M system-rpm-config noarch 9.1.0-76.amzn2.0.10 amzn2-core 80 k zip x86_64 3.0-11.amzn2.0.2 amzn2-core 263 k

Transaction Summary

Install 3 Packages (+54 Dependent packages)

Total download size: 67 M Installed size: 213 M Downloading packages:

Total 14 MB/s | 67 MB 00:04
Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : mpfr-3.1.1-4.amzn2.0.2.x86_64 1/57 Installing : libmpc-1.0.1-3.amzn2.0.2.x86_64 2/57 Installing : cpp-7.3.1-13.amzn2.x86_64 3/57 Installing : libmpx-7.3.1-13.amzn2.x86_64 4/57 Installing : groff-base-1.22.2-8.amzn2.0.2.x86_64 5/57 Installing : perl-HTTP-Tiny-0.033-3.amzn2.noarch 6/57 Installing : 1:perl-parent-0.225-244.amzn2.0.1.noarch 7/57 Installing : perl-Pod-Perldoc-3.20-4.amzn2.noarch 8/57 Installing : perl-podlators-2.5.1-3.amzn2.0.1.noarch 9/57 Installing : 1:perl-Pod-Escapes-1.04-299.amzn2.0.1.noarch 10/57 Installing : perl-Text-ParseWords-3.29-4.amzn2.noarch 11/57 Installing : perl-Encode-2.51-7.amzn2.0.2.x86_64 12/57 Installing : perl-Pod-Usage-1.63-3.amzn2.noarch 13/57 Installing : 4:perl-macros-5.16.3-299.amzn2.0.1.x86_64 14/57 Installing : 4:perl-libs-5.16.3-299.amzn2.0.1.x86_64 15/57 Installing : perl-Exporter-5.68-3.amzn2.noarch 16/57 Installing : perl-Time-Local-1.2300-2.amzn2.noarch 17/57 Installing : perl-constant-1.27-2.amzn2.0.1.noarch 18/57 Installing : perl-Scalar-List-Utils-1.27-248.amzn2.0.2.x86_64 19/57 Installing : perl-PathTools-3.40-5.amzn2.0.2.x86_64 20/57 Installing : 4:perl-Time-HiRes-1.9725-3.amzn2.0.2.x86_64 21/57 Installing : perl-Socket-2.010-4.amzn2.0.2.x86_64 22/57 Installing : perl-Storable-2.45-3.amzn2.0.2.x86_64 23/57 Installing : perl-Carp-1.26-244.amzn2.noarch 24/57 Installing : perl-File-Temp-0.23.01-3.amzn2.noarch 25/57 Installing : perl-File-Path-2.09-2.amzn2.noarch 26/57 Installing : 1:perl-Pod-Simple-3.28-4.amzn2.noarch 27/57 Installing : perl-Getopt-Long-2.40-3.amzn2.noarch 28/57 Installing : perl-threads-shared-1.43-6.amzn2.0.2.x86_64 29/57 Installing : perl-threads-1.87-4.amzn2.0.2.x86_64 30/57 Installing : perl-Filter-1.49-3.amzn2.0.2.x86_64 31/57 Installing : 4:perl-5.16.3-299.amzn2.0.1.x86_64 32/57 Installing : zip-3.0-11.amzn2.0.2.x86_64 33/57 Installing : dwz-0.11-3.amzn2.0.3.x86_64 34/57 Installing : libatomic-7.3.1-13.amzn2.x86_64 35/57 Installing : 1:pkgconfig-0.27.1-4.amzn2.0.2.x86_64 36/57 Installing : libcilkrts-7.3.1-13.amzn2.x86_64 37/57 Installing : kernel-headers-4.14.268-205.500.amzn2.x86_64 38/57 Installing : glibc-headers-2.26-57.amzn2.x86_64 39/57 Installing : glibc-devel-2.26-57.amzn2.x86_64 40/57 Installing : perl-srpm-macros-1-8.amzn2.0.1.noarch 41/57 Installing : system-rpm-config-9.1.0-76.amzn2.0.10.noarch 42/57 Installing : python-srpm-macros-3-60.amzn2.0.1.noarch 43/57 Installing : python-rpm-macros-3-60.amzn2.0.1.noarch 44/57 Installing : python3-rpm-macros-3-60.amzn2.0.1.noarch 45/57 Installing : libitm-7.3.1-13.amzn2.x86_64 46/57 Installing : libtirpc-0.2.4-0.16.amzn2.x86_64 47/57 Installing : python3-setuptools-49.1.3-1.amzn2.0.2.noarch 48/57 Installing : python3-libs-3.7.10-1.amzn2.0.1.x86_64 49/57 Installing : python3-3.7.10-1.amzn2.0.1.x86_64 50/57 Installing : python3-pip-20.2.2-1.amzn2.0.3.noarch 51/57 Installing : libgomp-7.3.1-13.amzn2.x86_64 52/57 Installing : binutils-2.29.1-31.amzn2.x86_64 53/57 Installing : libquadmath-7.3.1-13.amzn2.x86_64 54/57

- the last few log-lines of the pod logs are:

022-03-23T19:13:10.403583732Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-03-23T19:13:10.403375610Z] Layer sha256:ef0c7ae227b2ca9d656b7472fdf9625d0b4fb0d710446b3910689bc0b7e05926 cleaned up \r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:10Z"} 2022-03-23T19:13:11.424728801Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-03-23T19:13:11.424525877Z] Layer sha256:ef0c7ae227b2ca9d656b7472fdf9625d0b4fb0d710446b3910689bc0b7e05926 cleaned up \r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:11Z"} 2022-03-23T19:13:12.397423269Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-03-23T19:13:12.397243466Z] Layer sha256:ef0c7ae227b2ca9d656b7472fdf9625d0b4fb0d710446b3910689bc0b7e05926 cleaned up \r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:12Z"} 2022-03-23T19:13:13.401560619Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-03-23T19:13:13.401387801Z] Layer sha256:ef0c7ae227b2ca9d656b7472fdf9625d0b4fb0d710446b3910689bc0b7e05926 cleaned up \r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:13Z"} 2022-03-23T19:13:19.100783313Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"time=\"2022-03-23T19:13:19.100289175Z\" level=info msg=\"starting signal loop\" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/3175a226e13b96aca3666c1ff5da1fa89f0ff50eb375ab280dae8398876dc41f pid=774\r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:19Z"} 2022-03-23T19:13:19.273519835Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[31mERRO\u001b[0m[0000] failed to disable IPv6 forwarding for container's interface all: open /proc/sys/net/ipv6/conf/all/disable_ipv6: read-only file system \r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:19Z"} 2022-03-23T19:13:19.277003322Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[33mWARN\u001b[0m[2022-03-23T19:13:19.276839148Z] Failed to disable IPv6 on all interfaces on network namespace \"/var/run/docker/netns/0d57e6164179\": reexec to set IPv6 failed: exit status 4 \r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:19Z"} 2022-03-23T19:13:19.388745822Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"time=\"2022-03-23T19:13:19.388494215Z\" level=error msg=\"add cg to OOM monitor\" error=\"cgroups: memory cgroup not supported on this system\"\r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:13:19Z"} 2022-03-23T19:14:31.393949730Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-03-23T19:14:31.393663633Z] ignoring event \u001b[36mcontainer\u001b[0m=3175a226e13b96aca3666c1ff5da1fa89f0ff50eb375ab280dae8398876dc41f \u001b[36mmodule\u001b[0m=libcontainerd \u001b[36mnamespace\u001b[0m=moby \u001b[36mtopic\u001b[0m=/tasks/delete \u001b[36mtype\u001b[0m=\"*events.TaskDelete\"","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:14:31Z"} 2022-03-23T19:14:31.393993839Z {"label":"24db7a43-c5e3-4254-8796-045f8bcfd7fc","level":"info","message":"\r\n\u001b[36mINFO\u001b[0m[2022-03-23T19:14:31.393658438Z] shim disconnected \u001b[36mid\u001b[0m=3175a226e13b96aca3666c1ff5da1fa89f0ff50eb375ab280dae8398876dc41f\r\n\u001b[31mERRO\u001b[0m[2022-03-23T19:14:31.393791565Z] copy shim log \u001b[31merror\u001b[0m=\"read /proc/self/fd/16: file already closed\"\r\n","serviceContext":{"service":"supervisor","version":"commit-c94004de2e4ad99f5aa48d3e2f63c9afde1fadb8"},"severity":"INFO","terminalOutput":true,"time":"2022-03-23T19:14:31Z"}



### Steps to reproduce

Running a prebuild building a docker image with significant load?

### Workspace affected

_No response_

### Expected behavior

Prebuilds should fail or succed. Now stuck in limbo.

### Example repository

_No response_

### Anything else?

_No response_
axonasif commented 2 years ago

Hey @fredrikhgrelland, is repo public? It would help to reproduce. (maybe one of your dockerfile commands requires user input?) Prebuilds can not run longer than 1h, not sure if that's the case.

fredrikhgrelland commented 2 years ago

@axonasif I made a reproducible repo public here: https://github.com/fredrikhgrelland/gitpod-public-sandbox.

Case:

Pothulapati commented 2 years ago

Hey @fredrikhgrelland, Thanks for the detailed issue, repro cases. Definitely helps make debugging easier. :)

So, I have a self-hosted instance running at tarun.gitpod-self-hosted.com and for me the prebuild seems to work fine and ran to completion just like gitpod-io, and you can find the logs here. I'm still finding similar error logs in the prebuild container though i.e

{"label":"4ea4451c-88f9-4d6b-aae4-b5efbff7f576","level":"info","message":"\u001b[31mERRO\u001b[0m[0000] failed to disable IPv6 forwarding for container's interface all: open /proc/sys/net/ipv6/conf/all/disable_ipv6: read-only file system \r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T06:27:23Z"}
{"label":"4ea4451c-88f9-4d6b-aae4-b5efbff7f576","level":"info","message":"\u001b[33mWARN\u001b[0m[2022-04-06T06:27:23.988344674Z] Failed to disable IPv6 on all interfaces on network namespace \"/var/run/docker/netns/05084a1153e9\": reexec to set IPv6 failed: exit status 4 \r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T06:27:23Z"}
{"label":"4ea4451c-88f9-4d6b-aae4-b5efbff7f576","level":"info","message":"time=\"2022-04-06T06:27:24.082745291Z\" level=error msg=\"add cg to OOM monitor\" error=\"open /sys/fs/cgroup/memory/cgroup.event_control: read-only file system\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T06:27:24Z"}
{"label":"4ea4451c-88f9-4d6b-aae4-b5efbff7f576","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-04-06T06:27:24.107962721Z] ignoring event                                \u001b[36mcontainer\u001b[0m=1956b3e5dbfb6d29e421c554a331e8dc341958b7b976e97ed53aefbd61560ffd \u001b[36mmodule\u001b[0m=libcontainerd \u001b[36mnamespace\u001b[0m=moby \u001b[36mtopic\u001b[0m=/tasks/delete \u001b[36mtype\u001b[0m=\"*events.TaskDelete\"","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T06:27:24Z"}
{"label":"4ea4451c-88f9-4d6b-aae4-b5efbff7f576","level":"info","message":"\r\n\u001b[36mINFO\u001b[0m[2022-04-06T06:27:24.108178124Z] shim disconnected                             \u001b[36mid\u001b[0m=1956b3e5dbfb6d29e421c554a331e8dc341958b7b976e97ed53aefbd61560ffd\r\n\u001b[31mERRO\u001b[0m[2022-04-06T06:27:24.108283005Z] copy shim log                                 \u001b[31merror\u001b[0m=\"read /proc/self/fd/16: file already closed\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T06:27:24Z"}
{"label":"4ea4451c-88f9-4d6b-aae4-b5efbff7f576","level":"info","message":"time=\"2022-04-06T06:27:25.896634250Z\" level=info msg=\"starting signal loop\" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/5d5aa37079ecb996943be96a83de875668f514a5957eee1233f6293e2da219b8 pid=1840\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T06:27:25Z"}

Debugging more on this, and will update on this once I find more!

but the prebuild should still run to completion (just like in my case) and I'm wondering why that is not the case in your environment 🤔

fredrikhgrelland commented 2 years ago

Yeah, it is bit of a head-scratcher. I have replicated this on a brand new cluster using the latest version of gitpod and I am still seeing it freeze-fail. It is not failing, but freezing up, and that is part of the problem.

Are you seeing these kind of logs as well: {"label":"9f1d28bc-8ffb-4f45-b286-7abbf0469cc0","level":"info","message":"time=\"2022-04-06T10:03:48.407286565Z\" level=error msg=\"add cg to OOM monitor\" error=\"open /sys/fs/cgroup/memory/cgroup.event_control: read-only file system\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T10:03:48Z"} It stops with this log-line: {"label":"9f1d28bc-8ffb-4f45-b286-7abbf0469cc0","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-04-06T10:05:00.319048376Z] ignoring event \u001b[36mcontainer\u001b[0m=c52bcf11adce366c81109ebec0dd73b3a3cb7a02fab41f38a57888bb889509a9 \u001b[36mmodule\u001b[0m=libcontainerd \u001b[36mnamespace\u001b[0m=moby \u001b[36mtopic\u001b[0m=/tasks/delete \u001b[36mtype\u001b[0m=\"*events.TaskDelete\"\r\n\u001b[31mERRO\u001b[0m[2022-04-06T10:05:00.319114925Z] copy shim log \u001b[31merror\u001b[0m=\"read /proc/self/fd/16: file already closed\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T10:05:00Z"}

fredrikhgrelland commented 2 years ago

I have run some more tests and I have maybe found something by looking at the prebuild-pod logs in real-time. The prebuild continues to run fine after these logs:

2022-04-06T11:22:39.097122703Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"time=\"2022-04-06T11:22:39.096690965Z\" level=info msg=\"starting signal loop\" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/cf2ce27e01a7732f763e28fb3af68072da6e3b763f45cdd7177beb437ae5a89d pid=966\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:22:39Z"}
2022-04-06T11:22:39.251712049Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[31mERRO\u001b[0m[0000] failed to disable IPv6 forwarding for container's interface all: open /proc/sys/net/ipv6/conf/all/disable_ipv6: read-only file system \r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:22:39Z"}
2022-04-06T11:22:39.254483194Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[33mWARN\u001b[0m[2022-04-06T11:22:39.254360210Z] Failed to disable IPv6 on all interfaces on network namespace \"/var/run/docker/netns/04aa21c0f7c2\": reexec to set IPv6 failed: exit status 4 \r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:22:39Z"}
2022-04-06T11:22:39.384084749Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"time=\"2022-04-06T11:22:39.383842402Z\" level=error msg=\"add cg to OOM monitor\" error=\"open /sys/fs/cgroup/memory/cgroup.event_control: read-only file system\"\r\n","serviceContext":

While logs are streaming running for ~25 seconds:

{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:22:39Z"}
2022-04-06T11:23:21.397757620Z {"alias":"1dcdd82d-03d0-4184-928d-8c1496251d31","level":"info","message":"new terminal client","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","time":"2022-04-06T11:23:21Z"}
2022-04-06T11:23:23.412642462Z {"alias":"1dcdd82d-03d0-4184-928d-8c1496251d31","level":"info","message":"new terminal client","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","time":"2022-04-06T11:23:23Z"}

Log-streaming freeze and this pops up in the logs:

2022-04-06T11:23:49.278962037Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-04-06T11:23:49.278657618Z] ignoring event                                \u001b[36mcontainer\u001b[0m=cf2ce27e01a7732f763e28fb3af68072da6e3b763f45cdd7177beb437ae5a89d \u001b[36mmodule\u001b[0m=libcontainerd \u001b[36mnamespace\u001b[0m=moby \u001b[36mtopic\u001b[0m=/tasks/delete \u001b[36mtype\u001b[0m=\"*events.TaskDelete\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:49Z"}
2022-04-06T11:23:49.279022911Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[36mINFO\u001b[0m[2022-04-06T11:23:49.278741941Z] shim disconnected                             \u001b[36mid\u001b[0m=cf2ce27e01a7732f763e28fb3af68072da6e3b763f45cdd7177beb437ae5a89d\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:49Z"}
2022-04-06T11:23:49.279031115Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[31mERRO\u001b[0m[2022-04-06T11:23:49.278834472Z] copy shim log                                 \u001b[31merror\u001b[0m=\"read /proc/self/fd/16: file already closed\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:49Z"}
2022-04-06T11:23:57.144430694Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"time=\"2022-04-06T11:23:57.144019839Z\" level=info msg=\"starting signal loop\" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/ce8dc69ef9faacc392296cd07bed61be5977a0db254d4a8494d61136a979b0b9 pid=1453\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:57Z"}
2022-04-06T11:23:57.295293487Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[31mERRO\u001b[0m[0000] failed to disable IPv6 forwarding for container's interface all: open /proc/sys/net/ipv6/conf/all/disable_ipv6: read-only file system \r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:57Z"}
2022-04-06T11:23:57.298205229Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"\u001b[33mWARN\u001b[0m[2022-04-06T11:23:57.298060250Z] Failed to disable IPv6 on all interfaces on network namespace \"/var/run/docker/netns/546576152c18\": reexec to set IPv6 failed: exit status 4 \r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:57Z"}
2022-04-06T11:23:57.431655965Z {"label":"3971cec7-cdc4-4fcb-8d85-2307a744cc94","level":"info","message":"time=\"2022-04-06T11:23:57.431393664Z\" level=error msg=\"add cg to OOM monitor\" error=\"open /sys/fs/cgroup/memory/cgroup.event_control: read-only file system\"\r\n","serviceContext":{"service":"supervisor","version":"commit-abd108b30f9e5d8dfd1b1558f19c2f86cb0830d5"},"severity":"INFO","terminalOutput":true,"time":"2022-04-06T11:23:57Z"}

The prebuild will now hang for ~1 hour and then continue to a image-build stage until starting the workspace, the prebuild clearly not done. To speed this up, I might kill the pod and repeat.

NOTE: Normally it gets in a freeze-failing state far earlier and before the web-app starts printing logs.

fredrikhgrelland commented 2 years ago

I have some more observations that might help further:

The webapp log-streaming as well as attaching to the prebuild pod tailing from /workspace/.gitpod/prebuild-log-1 are equal. While observing running processes inside of the docker-build of the prebuildpod using ps -axf and watching /workspace/.gitpod/prebuild-log-1 I can clearly see that the docker-build is continuing well after the logs have stopped streaming.

corneliusludmann commented 2 years ago
  • prebuild starts and says "Connecting to workspace logs..." forever
  • prebuild starts, shows log output and the log stops at a random process.

That is a UX issue. I see it from time to time as well: Prebuilds are running but the logs are not shown in the UI. Reloading the window helps for me usually. I think it's issue https://github.com/gitpod-io/gitpod/issues/8684, correct me when I'm wrong, @geropl.

The prebuild will now hang for ~1 hour and then continue to a image-build stage until starting the workspace, the prebuild clearly not done.

For prebuilds, we have a timeout of 1 hour. How long does the build usually take?

geropl commented 2 years ago

think it's issue https://github.com/gitpod-io/gitpod/issues/8684, correct me when I'm wrong, @geropl.

Exactly. #8684 does overlap with other issues already merged (but not deployed yet), so: we're working on this part.

@fredrikhgrelland Thanks for your very detailed analysis! :pray: As far as I understand, especially after reading this comment of yours, this is a supervisor/workspace problem:

Log-streaming freeze and this pops up in the logs:

ignoring event shim disconnected copy shim log

@kylos101 This sounds like a runtime issue. Can someone from workspace have a look at this? @sagor999 maybe? :thinking:

kylos101 commented 2 years ago

Hi @fredrikhgrelland :wave: ,

Can you tell if you're hitting this line?

If yes, may I ask you to confirm from a regular workspace, what version of docker-compose version that you have?

I ask because about a month ago we updated the version of docker inworkspace-full, but I'm not sure what's being used with your installation. These issues were fixed in docker-compose 2.2.3, but, if you're using 2.2.2, you might be bumping into these issues. [1][2]

fredrikhgrelland commented 2 years ago

Hi @kylos101,

Can you tell if you're hitting this line?

Unless you are running the command steps during a prebuild, I am not. See this line

If yes, may I ask you to confirm from a regular workspace, what version of docker-compose version that you have?

I ask because about a month ago we updated the version of docker inworkspace-full, but I'm not sure what's being used with your installation. These issues were fixed in docker-compose 2.2.3, but, if you're using 2.2.2, you might be bumping into these issues. [1][2]

gitpod /workspace/gitpod-public-sandbox/.airflow (isolate-docker-compose) $ docker-compose version
Docker Compose version v2.2.3
fredrikhgrelland commented 2 years ago

@geropl

Is there any more steps I can take in order to debug a potential supervisor/workspace-issue? Before easter I was chasing a containerd isse that looked like it fit the bill, but am currently a bit stuck. https://github.com/containerd/containerd/issues/6402

geropl commented 2 years ago

@fredrikhgrelland What version of containerd are you on? :thinking:

fredrikhgrelland commented 2 years ago

@geropl I am running containerd github.com/containerd/containerd v1.5.8 1e5ef943eb76627a6d3b6de8cd1ef6537f393a71 This comes from the ami provided in the eks-guide https://github.com/gitpod-io/gitpod-eks-guide. My plan was to update the build with a newer binary, but I am currently unsuccessful to build an ami with the code here: https://github.com/gitpod-io/gitpod-eks-guide/tree/main/ami because of missing files. I have tried to pillage files from the running amis, but gitpod seems to fail getting up and running on a on a cluster built on my ami from source.

kylos101 commented 2 years ago

Hey @fredrikhgrelland , are you still having trouble with prebuilds freezing?

CC: @metcalfc

kylos101 commented 2 years ago

Hi @fredrikhgrelland , I am going to close this for now, but, if it's still a problem, let us know and we'll happily reopen. :pray:

CC: @metcalfc :eyes: