Closed by edigaryev 1 year ago
Great work @edigaryev - the worker has now been able to re-register. I did a quick test and everything seems to be working so far, but there is a recurring message around a 400 error:
orchard@mac % sudo launchctl load -w /Library/LaunchDaemons/org.cirruslabs.orchard.worker.plist
orchardi@mac % tail -f /tmp/orchard-worker.log
{"level":"info","ts":1687469693.550679,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469698.553354,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469703.5502238,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"warn","ts":1687469707.148829,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469708.546686,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469713.624796,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469718.548105,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469745.880018,"msg":"registered worker mac-M2GVQ20L75"}
{"level":"info","ts":1687469745.9966872,"msg":"syncing on-disk VMs..."}
{"level":"warn","ts":1687469746.322099,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469746.995243,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469747.868917,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469749.248595,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469751.3179488,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469754.977362,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469755.83937,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469756.161123,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469760.6640599,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"warn","ts":1687469761.893548,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469765.731574,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469770.6702971,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"warn","ts":1687469775.540767,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469775.749318,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469780.666985,"msg":"syncing 1 local VMs against 0 remote VMs..."}
I've already deleted all VMs and restarted orchard. Any idea what could be causing this behaviour?
Also having issues with vnc and ssh:
forwarding 127.0.0.1:64247 -> ventura-xcode-new:5900...
no credentials specified or found, trying default admin:admin credentials...opening vnc://admin@127.0.0.1:64247...
failed to forward port: websocket.Dial wss://orchard.example.internal:443/v1/vms/ventura-xcode/port-forward?port=5900&wait=60: bad status
^C2023/06/22 22:31:15 context canceled
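For reference, the failing request is a WebSocket dial against a controller URL. The Go sketch below only reconstructs a URL of the same shape; the path layout and the `port`/`wait` parameters are inferred from the error line above, and `portForwardURL` is a hypothetical helper rather than an Orchard function.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// portForwardURL builds a WebSocket URL shaped like the one in the error
// message above. The path and query parameters are an assumption based on
// that single log line, not on documented Orchard behaviour.
func portForwardURL(controller, vmName string, port, waitSeconds int) string {
	query := url.Values{}
	query.Set("port", strconv.Itoa(port))
	query.Set("wait", strconv.Itoa(waitSeconds))

	u := url.URL{
		Scheme:   "wss",
		Host:     controller,
		Path:     "/v1/vms/" + vmName + "/port-forward",
		RawQuery: query.Encode(), // keys are emitted in sorted order
	}
	return u.String()
}

func main() {
	fmt.Println(portForwardURL("orchard.example.internal:443", "ventura-xcode", 5900, 60))
}
```

Whatever sits in front of the controller has to pass the WebSocket upgrade for this path through unchanged, otherwise the dial ends in a "bad status" like the one above.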
@ruimarinho can you check if the following ingress configuration works for you:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchard-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orchard
                port:
                  number: 6120
  ingressClassName: nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchard-ingress-grpc
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
spec:
  rules:
    - http:
        paths:
          - path: /Controller
            pathType: Prefix
            backend:
              service:
                name: orchard
                port:
                  number: 6120
  ingressClassName: nginx
It will almost certainly need to be adapted for your environment, but the main idea is that without the nginx.ingress.kubernetes.io/backend-protocol: "GRPCS" treatment for the /Controller path, gRPC (which we use for port-forwarding) wouldn't work.
I've tried this on a local Kubernetes cluster and port-forwarding/SSH seem to work just fine.
@edigaryev I've tested your suggestion but I'm getting a 504 timeout:
2023/06/27 12:30:38 [error] 2282#2282: *83097928 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.10.10.100, server: orchard.example.internal, request: "POST /Controller/Watch HTTP/2.0", upstream: "grpcs://10.10.10.100:443", host: "orchard.example.internal:443"
I'm using 443 for the PORT environment variable, but I've also tested forwarding /Controller to 6120, just in case the gRPC server was listening on a different port (not that the code suggests this...), and then I got a connection refused.
Theoretically, it's being forwarded correctly because nginx is complaining about a grpcs:// upstream - now I just need to figure out why it is timing out. The ingress is behind an AWS NLB.
If you have any suspicion, let me know, otherwise I'll keep digging. Thanks!
@edigaryev after testing with a few more settings (grpc_connect_timeout, grpc_read_timeout, grpc_send_timeout), the best outcome I've come across is getting a 499 status code instead of a 504 (gateway timeout). It seems like occasionally I was able to get a 502 too:
ingress-nginx-controller-6c48cbfb6f-2czfc controller 2023/06/28 11:51:03 [error] 1253#1253: *1194409 no connection data found for keepalive http2 connection while sending request to upstream, client: 10.10.10.100, server: orchard.example.internal, request: "POST /Controller/Watch HTTP/2.0", upstream: "grpcs://10.10.10.91:443", host: "orchard.example.internal:443"
After some investigation, it seems like nginx has an issue multiplexing HTTP/1.1 and gRPC, although I'm not entirely sure it's related to that here.
My suggestion would be to add a flag -- even a test build -- to run the gRPC server on a different port to see if that helps. There is nothing in the controller logs related to POST /Controller/Watch.
Any other ideas you may have?
Below is the nginx configuration block generated for /Controller:
In #86, Orchard started creating certificate-less contexts for Controllers that use PKI-compatible certificates.
However, I overlooked the fact that we also need to add certificate-less support to the bootstrap tokens.
Resolves https://github.com/cirruslabs/orchard/issues/86.