coder / code-server

VS Code in the browser
https://coder.com
MIT License

500 VS Code failed to load. timed out go home #2937

Closed (rafipiccolo closed this issue 2 years ago)

rafipiccolo commented 3 years ago

After one day the server is always down, and yet the health check says all is good... :)

curl xxx/healthz
{
  "status": "alive",
  "lastHeartbeat": 1616257100338
}

I get this message on the screen: "500 VS Code failed to load. timed out go home". The error says it timed out, but the failure appears very quickly (in under a second). I have no errors in the Chrome console except the 500 error on /.

There is nothing useful in the logs; here they are:

[2021-03-19T11:42:19.841Z] info  code-server 3.9.1 e0203f2a36c9b7036fefa50eec6cf8fa36c5c015
[2021-03-19T11:42:19.842Z] info  Using user-data-dir ~/.local/share/code-server
[2021-03-19T11:42:19.859Z] info  Using config file ~/.config/code-server/config.yaml
[2021-03-19T11:42:19.859Z] info  HTTP server listening on http://0.0.0.0:8080 
[2021-03-19T11:42:19.859Z] info    - Authentication is disabled 
[2021-03-19T11:42:19.859Z] info    - Not serving HTTPS 

OS/Web Information

My docker-compose.yml:

    vscode:
        image: codercom/code-server:latest
        restart: always
        command: --auth none
        user: "0:0"
        volumes:
            - ./xxx:/home/coder/xxx
            - ./xxx2:/home/coder/xxx2
        healthcheck:
            test: ['CMD', 'curl', '-fs', 'http://localhost:8080/healthz']
        labels:
            - traefik.enable=true
            - traefik.http.routers.xxx.rule=Host(`vscode.xxx.com`)
            - traefik.http.routers.xxx.tls.certresolver=le
            - traefik.http.routers.xxx.entrypoints=websecure
            - "traefik.http.middlewares.xxx.basicauth.users=xxx:xxx"
            - traefik.http.routers.xxx.middlewares=securityheaders,admin

Steps to Reproduce

  1. Start the server.
  2. Let it run while doing intensive disk/CPU tasks all day.
  3. The next morning the project is always down, and I get no feedback about a possible error.

Expected

It keeps working.

Actual

500 VS Code failed to load. timed out go home

Screenshot

[Screenshot: FireShot Capture 029]

Notes

This issue can be reproduced in VS Code: No

oxy commented 3 years ago

I'm not entirely sure what's going on here. I don't run code-server in Kubernetes, but I've run code-server overnight, or with intensive tasks in the background, on my laptops with no issues.

Do you think it could be Traefik?

rafipiccolo commented 3 years ago

Thanks for your reply. This Traefik instance serves around 60 other projects simultaneously on the same machine for development purposes without problems. There is no downtime. All is fine.

The only problem is this 500 error every morning. Every night I run a big 6-hour script that does all sorts of maintenance. Maybe VS Code runs out of memory during this process and can't recover? Can I see debug logs? How can I help? It's annoying to restart it every day :) How come the /healthz route says it's OK? It's a standalone Docker container (no Kubernetes).

rafipiccolo commented 3 years ago

I guess I will change the health check to:

test: ['CMD', 'curl', '-fs', 'http://localhost:8080/']

Since / returns a 500 status code when VS Code is dead, it is a better health detector :p
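A minimal sketch of a stricter probe, assuming `python3` is available wherever it runs (this is not confirmed for the `codercom/code-server` image; it could also run on the host against the published port). It treats the service as healthy only if both `/healthz` and `/` answer with a 2xx or 3xx status, since `/healthz` reported "alive" while `/` returned 500. The script and the names `probe` and `BASE_URL` are hypothetical. Note that `urlopen` follows redirects, so with authentication enabled `/` may redirect to a login page that returns 200; the check on `/` is most meaningful with `--auth none`, as in this compose file.

```python
import sys
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # port taken from the compose file above


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only if `url` answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError:
        # 4xx/5xx responses, such as the 500 on "/", raise HTTPError.
        return False
    except OSError:
        # Connection refused, DNS failure, socket timeout (URLError is an OSError).
        return False


def main() -> int:
    # /healthz alone is not enough: it said "alive" while "/" returned 500.
    healthy = probe(BASE_URL + "/healthz") and probe(BASE_URL + "/")
    return 0 if healthy else 1


if __name__ == "__main__":
    sys.exit(main())
```

It could be wired in with something like `test: ['CMD', 'python3', '/healthcheck.py']`, assuming the script is mounted at that hypothetical path. The curl one-liner above already covers `/` on its own; the sketch just makes the intent of checking both routes explicit.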

I'm still open to suggestions for debugging.

Or someone can close the issue.

code-asher commented 3 years ago

VS Code is spawned as a child process when its page is accessed, and if the handshake doesn't complete within 10 seconds you get this "timed out" message. So this may be related to resources, although I'm not familiar enough with how spawns and Node's built-in IPC work to say for sure.

Maybe it's taking longer than 10 seconds, except you mentioned it fails in less than one second, so that's quite strange.
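The mechanism described above (spawn a child, wait for a handshake, give up after 10 seconds) can be sketched in Python. This is not code-server's implementation, which uses Node's child process fork and IPC; here a plain subprocess prints a `ready` line on stdout as a stand-in for the handshake, and `spawn_with_handshake` is a hypothetical name. The sketch also separates "exited early" from "timed out": if the child dies before the handshake, the parent sees EOF as soon as that happens. If a real implementation reported both cases with the same message, that could look like a "timeout" arriving in under a second; this is only a possibility, not something confirmed here.

```python
import queue
import subprocess
import sys
import threading

HANDSHAKE_TIMEOUT = 10.0  # seconds, the limit mentioned in the comment above


def spawn_with_handshake(child_code: str, timeout: float = HANDSHAKE_TIMEOUT) -> subprocess.Popen:
    """Start a Python child and wait for it to print a "ready" line.

    Raises TimeoutError if nothing arrives within `timeout` seconds and
    RuntimeError if the child closes stdout (usually by exiting) first.
    """
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE, text=True)
    lines: "queue.Queue[str]" = queue.Queue()
    # readline() blocks, so read on a helper thread and wait on the queue.
    threading.Thread(target=lambda: lines.put(proc.stdout.readline()),
                     daemon=True).start()
    try:
        message = lines.get(timeout=timeout)
    except queue.Empty:
        proc.kill()
        proc.wait()
        raise TimeoutError(f"no handshake within {timeout} s") from None
    if message == "":
        # EOF before any handshake: the child closed stdout, normally because
        # it exited; this is noticed immediately, not after the full timeout.
        code = proc.wait()
        raise RuntimeError(f"child exited before handshake with code {code}")
    if message.strip() != "ready":
        proc.kill()
        proc.wait()
        raise RuntimeError(f"unexpected handshake: {message.strip()!r}")
    return proc
```

For example, `spawn_with_handshake("import sys; sys.exit(0)")` raises `RuntimeError` mentioning exit code 0 almost at once, similar in spirit to the "VS Code exited unexpectedly with code 0" lines in the Termux trace later in this thread, while a child that never prints raises `TimeoutError` only after `timeout` seconds.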

Running code-server with --log trace might reveal more information; at the very least it should show whether messages are being sent and received between code-server and VS Code.

We should improve this error message. It's awfully vague. Perhaps there are some more checks we can add to see if the spawn really worked or not as well. It also sounds like VS Code is dying at some point, maybe there's something related to that which causes the next spawn to fail.

Adding a VS Code check to the health endpoint sounds like it could be useful. Right now it only checks if code-server itself is up but not VS Code.
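One way the suggested check could look, as a hypothetical sketch: the server keeps some state about its VS Code child (`VSCodeState` and `healthz` are invented names) and `/healthz` reports it alongside the existing `status` and `lastHeartbeat` fields. The extra `vscode` and `vscodeError` fields and the 503 status code are assumptions, not code-server's actual API. Because VS Code is only spawned when its page is accessed, "not started yet" is treated as healthy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class VSCodeState:
    """Hypothetical bookkeeping code-server could keep about its VS Code child."""
    spawned: bool = False             # has a spawn ever been attempted?
    running: bool = False             # did the last spawn hand-shake and is it still up?
    last_error: Optional[str] = None  # e.g. "timed out" or "exited with code 0"


def healthz(last_heartbeat_ms: int, vscode: VSCodeState) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for a /healthz that also covers VS Code."""
    # code-server itself answered, so it is alive; the fields below add VS Code.
    body: dict = {"status": "alive", "lastHeartbeat": last_heartbeat_ms}
    if not vscode.spawned:
        # VS Code is spawned lazily on first page load, so "not started" is fine.
        body["vscode"] = "not-started"
        return 200, body
    if vscode.running:
        body["vscode"] = "running"
        return 200, body
    body["vscode"] = "failed"
    body["vscodeError"] = vscode.last_error or "exited"
    return 503, body  # non-2xx, so `curl -f` in a Docker healthcheck fails
```

With this shape, the `curl -fs http://localhost:8080/healthz` healthcheck from the compose file would start failing once the last spawn failed, instead of reporting "alive".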

ghost commented 3 years ago

Hello, I'm currently using code-server in Termux and getting the same 500 error. Here is my trace:

code-server --log trace
[2021-04-05T00:25:53.435Z] trace child:16721 got message {"message":{"type":"handshake","args":{"_":[],"bind-addr":"127.0.0.1:8080","auth":"password","password":"wontshowmypwdlol","config":"/data/data/com.termux/files/home/.config/code-server/config.yaml","log":"trace","user-data-dir":"/data/data/com.termux/files/home/.local/share/code-server","extensions-dir":"/data/data/com.termux/files/home/.local/share/code-server/extensions","verbose":true,"host":"127.0.0.1","port":8080,"proxy-domain":[],"usingEnvPassword":false,"usingEnvHashedPassword":false}}}
[2021-04-05T00:25:53.453Z] info code-server 3.9.2 109d2ce3247869eaeab67aa7e5423503ec9eb859
[2021-04-05T00:25:53.454Z] info Using user-data-dir ~/.local/share/code-server
[2021-04-05T00:25:53.455Z] trace Using extensions-dir ~/.local/share/code-server/extensions
[2021-04-05T00:25:53.557Z] info Using config file ~/.config/code-server/config.yaml
[2021-04-05T00:25:53.558Z] info HTTP server listening on http://127.0.0.1:8080
[2021-04-05T00:25:53.558Z] info - Authentication is enabled
[2021-04-05T00:25:53.558Z] info - Using password from ~/.config/code-server/config.yaml
[2021-04-05T00:25:53.559Z] info - Not serving HTTPS
[2021-04-05T00:26:10.654Z] trace heartbeat
[2021-04-05T00:26:10.741Z] debug forking vs code...
[2021-04-05T00:26:13.381Z] error VS Code exited unexpectedly with code 0
[2021-04-05T00:26:16.420Z] debug forking vs code...
[2021-04-05T00:26:18.833Z] error VS Code exited unexpectedly with code 0

Ideas?

jsjoeio commented 3 years ago

Hmm... not sure what that could be. @leKamikaze1 do you mind opening up a separate bug report issue and providing reproduction steps? Then we can look further into your issue.

ghost commented 3 years ago

Sure

binacs commented 2 years ago

/cc

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no activity occurs in the next 5 days.

ChouBaoDxs commented 1 year ago

I had the same problem because I had limited the Docker container's CPU to 100m. It worked when I set the CPU limit to 1000m.
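A rough back-of-the-envelope model of why a 100m limit could trip the 10-second handshake timeout mentioned earlier in the thread: `100m` is Kubernetes-style notation for 0.1 CPU, so CPU-bound start-up work takes roughly ten times longer in wall-clock time than with a full core. The 1.5 CPU-seconds of start-up work below is a made-up figure for illustration, not a measurement, and the model ignores I/O and scheduler details. It may fit this last report; it does not obviously explain the sub-second failure in the original report.

```python
HANDSHAKE_TIMEOUT_S = 10.0   # from the earlier maintainer comment
ASSUMED_STARTUP_CPU_S = 1.5  # hypothetical CPU-seconds of start-up work, not measured


def parse_cpu_quantity(quantity: str) -> float:
    """Convert a Kubernetes-style CPU quantity to cores: "100m" -> 0.1, "2" -> 2.0."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def startup_wall_time(cpu_seconds: float, cpu_limit: float) -> float:
    """Crude model: CPU-bound work stretched by the CPU quota (ignores I/O)."""
    return cpu_seconds / cpu_limit


for limit in ("100m", "1000m"):
    seconds = startup_wall_time(ASSUMED_STARTUP_CPU_S, parse_cpu_quantity(limit))
    verdict = "exceeds" if seconds > HANDSHAKE_TIMEOUT_S else "is within"
    print(f"{limit}: ~{seconds:.1f} s, which {verdict} the {HANDSHAKE_TIMEOUT_S:.0f} s timeout")
```

Under these assumptions, 100m stretches 1.5 CPU-seconds to about 15 s, past the handshake limit, while 1000m finishes in about 1.5 s.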