coder / code-server

VS Code in the browser
https://coder.com
MIT License

Make timeout configurable #6936

Open samcofer opened 1 month ago

samcofer commented 1 month ago

Edit: Summary of timeouts to make configurable:

Original issue:

Is there an existing issue for this?

OS/Web Information

4.14.1 5c199629305a0b935b4388b7db549f77eae82b5a with Code 1.79.2

Steps to Reproduce

  1. Install the code-server version above, and launch it using this command: /usr/lib/rstudio-server/bin/code-server/bin/code-server --verbose --host=0.0.0.0 --port=40400 --disable-getting-started-override --disable-update-check --disable-telemetry

Expected

code-server should start as expected on port 40400.

Actual

The following messages are logged:

[2024-07-24T15:47:42.792Z] debug parent:3106566 spawned child process 3107105
[2024-07-24T15:47:52.794Z] error timed out
[2024-07-24T15:47:52.795Z] debug parent:3106566 disposing {}

Included a strace of the startup: 108342_2024-08-05-14_18_30_24Jul24Codeserver.log

Logs

No response

Screenshot/Video

No response

Does this bug reproduce in native VS Code?

This cannot be tested in native VS Code

Does this bug reproduce in GitHub Codespaces?

This cannot be tested in GitHub Codespaces

Are you accessing code-server over a secure context?

Notes

This was previously working, and the only time-related change that I've been able to identify is a restriction on outbound traffic for most URLs. The only clue I have at the moment is that the way outbound traffic has been blocked doesn't immediately show as connection blocked/no route to host. curl commands to blocked addresses get stuck at Trying and essentially never time out. I've used code-server in a completely offline context, but I'm wondering if there's some unexpected bug or error happening because some outbound connectivity test being executed by a child process just never returns.

code-asher commented 1 month ago

Does this happen with the latest code-server as well? (4.91.1)

I think this is probably unrelated to the network. I believe that timeout is for the IPC handshake with the child process after it has been spawned, and no network calls happen there. So either the spawn itself has failed or Node has somehow failed to set up the IPC between the child and parent processes, but honestly I have no idea how that would happen.

samcofer commented 1 month ago

This environment is disconnected from the internet, so getting new versions of software in there is understandably tough. I can take a crack at using the latest version and see if that helps with this scenario. I can also try to devise a way to determine whether the child process spawn is failing, perhaps because of ulimits, AppArmor, or something similar. If I can ask, what is the child process attempting to do here? That might help me narrow down the weirdness.

code-asher commented 1 month ago

Updating probably will not help anyway; I do not believe the spawning code has changed in a very long time. But it is probably worth double-checking just in case.

The child process is the "real" code-server process. The flow is:

  1. code-server checks for a CODE_SERVER_PARENT_PID environment variable
  2. Sees it does not exist, so it spawns the child (by forking) and sets CODE_SERVER_PARENT_PID
  3. The parent waits for the child to send a message and times out if it does not get a message in 10 seconds
  4. The child sees CODE_SERVER_PARENT_PID, so it sends a message to the parent
  5. The parent gets the message and sends one back
  6. The child gets the parent's message and moves on to actually start up code-server (set up the web server, import VS Code, etc)
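
The parent side of the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in src/node/wrapper.ts: the `MessageSource` interface, the `waitForHandshake` name, and the `timeoutMs` parameter are assumptions made for the example. Only the 10 second default and the "timed out" error text come from this thread.

```typescript
// Hypothetical minimal view of a forked child: anything that can deliver one
// message. It is modeled on the "message" event a forked Node child emits.
interface MessageSource {
  once(event: "message", listener: (message: unknown) => void): unknown
}

// Wait for the child's first message and reject if it does not arrive in time.
// The 10 second default mirrors the timeout described above; exposing it as a
// parameter is the kind of change this issue asks for.
function waitForHandshake(child: MessageSource, timeoutMs: number = 10000): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), timeoutMs)
    child.once("message", (message) => {
      clearTimeout(timer) // The child answered, so cancel the pending timeout.
      resolve(message)
    })
  })
}
```

With a longer `timeoutMs`, a slow child (old hardware, emulation) would still complete the handshake instead of being treated as a failed spawn.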

The reason for this architecture is so that code-server can restart itself without having to spawn an entirely new process (so you can send it a USR2 signal for example to restart the inner process). Actually the original reason was to restart itself after an update but code-server no longer auto-updates so it is just the signal thing now.

code-asher commented 1 month ago

I wonder if the fork call fails and we are not properly catching the error. :thinking:

UnCor3 commented 3 weeks ago

@code-asher I believe that the issue is related to this

https://github.com/coder/code-server/blob/de65bfc9477f61bc22d0b1a23085d1f18bb25202/src/node/wrapper.ts#L9

I ran into this issue in ISH Shell (an iOS Unix-like terminal app) and fixed it by increasing the timeout to 100 seconds.

"Sees it does not exist, so it spawns the child (by forking) and sets CODE_SERVER_PARENT_PID"

As you stated here, this step generally takes more than 10 seconds on relatively old hardware or in ISH Shell (tried on multiple devices: ip6s, ip8+, ip11, ipse2ndgen).

To fix this, I suggest introducing a new flag, for example --timeout.
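
One way such a flag's value could be handled is sketched below. Nothing here exists in code-server: `--timeout` is only the suggestion above, and the `resolveTimeoutMs` helper and the choice of seconds as the unit are assumptions for illustration.

```typescript
// Default taken from the 10 second handshake timeout discussed in this issue.
const DEFAULT_TIMEOUT_SECONDS = 10

// Hypothetical helper: turn the raw value of a --timeout flag (in seconds)
// into milliseconds, falling back to the default when the flag is absent.
function resolveTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_TIMEOUT_SECONDS * 1000
  }
  const seconds = Number(raw)
  // Reject NaN, Infinity, zero, and negative values rather than starting
  // with a timeout that can never be met.
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`invalid timeout: ${raw}`)
  }
  return Math.round(seconds * 1000)
}
```

For example, `resolveTimeoutMs("100")` gives 100000 milliseconds, matching the 100 seconds that worked in ISH Shell, while `resolveTimeoutMs(undefined)` keeps the current 10 second behavior.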

It wasn't the only issue I've had when running this in ISH Shell

https://github.com/coder/code-server/blob/de65bfc9477f61bc22d0b1a23085d1f18bb25202/src/node/cli.ts#L667

Please do let me know if you want me to submit an issue for the issues listed above

My OS/Web Information

Web Browser: Chrome
Local OS: Windows 11
Remote OS: Alpine v3.18 (ISH Shell App)
Remote Architecture: x86 (tried with multiple devices ip6s, ip8+, ip11, ipse2ndgen)
Node.js Version: 20.8.1-r0 (https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86/nodejs-current-20.8.1-r0.apk)

* indicates that the issue is most likely not related to code-server

code-asher commented 3 weeks ago

Interesting! I had not considered the possibility it was actually taking > 10 seconds because that seemed so long already.

Please do let me know if you want me to submit an issue for the issues listed above

I will rename this issue so we can use it for --timeout. We should definitely open a new issue for the config file being overwritten. Interesting that it does not throw EEXIST. Maybe it will work if we add { flag: "wx" } (https://nodejs.org/api/fs.html#file-system-flags).
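
For reference, a minimal sketch of what the "wx" flag does in Node. The `writeDefaultConfig` helper is hypothetical and is not the code at the cli.ts link above. Whether the flag behaves the same way under ISH Shell is not established in this thread.

```typescript
import * as fs from "fs"

// Hypothetical helper: create a default config file only if none exists.
// With flag "wx", Node opens the file for writing but fails with EEXIST when
// the path already exists, so an existing config is not overwritten.
function writeDefaultConfig(configPath: string, contents: string): boolean {
  try {
    fs.writeFileSync(configPath, contents, { flag: "wx" })
    return true // The file was created.
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === "EEXIST") {
      return false // The file already existed and was left untouched.
    }
    throw error // Anything else (permissions, missing directory) is a real failure.
  }
}
```

Returning a boolean instead of swallowing the error lets the caller log whether a new config was written or an existing one was kept.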

code-asher commented 3 weeks ago

Ah as for the zlib issues, feel free to open an issue for that as well but I have no idea what we can do there.

I am not sure exactly what needs to change in the iOS docs to use Node v20 (is it the repositories step?) but happy to take a PR for that or just let me know what the right steps are and I can make the change.

UnCor3 commented 3 weeks ago

@code-asher I will submit a PR for the updated iOS docs, but I am currently trying to figure out how to run this outside of ISH Shell on a jailbroken device so that people can run 64-bit. As for the errors, I believe they are all caused by ISH Shell's x86 emulation, because I later tried on a VM running Alpine 3.18 x86 and it worked fine.