Bregor closed this issue 7 years ago.
We are also seeing this in our long-running cluster, which happens to be:
- k8s 1.4.7 on GKE
- Workflow v2.9.1
- Helm 2.1.3
Yes, the problem appears in 2.9. 2.8 was not affected.
Thanks @Bregor, that's helpful. So there are two builder releases in between: https://github.com/deis/builder/releases/tag/v2.6.0 and https://github.com/deis/builder/releases/tag/v2.6.1
None of those changes looks like an obvious cause of this behavior, so I wonder if the issue is something subtler, such as the Go toolchain bump when we updated the docker-go-dev image in this project (https://github.com/deis/builder/commit/3fb031a34f3b21a936e5996696d1642a0b20788b).
No leads yet; going to build/deploy the image locally with added debug logging to try to pinpoint it.
From what we're finding, the error is here.
Perhaps this Go crypto/ssh issue is relevant, which would make sense given the liveness check performed against the server, which is just a basic TCP probe.
EDIT: Then again, that fix was merged in June 2015, and Go 1.5 was released in August 2015, so the fix would always have been present. Still, there could be something here that leads to a bigger fish.
I think I'm onto something here. In server_test.go, if I add this unit test, which runs the server with no SSH config to intentionally make it panic, and run make test:
func TestReceiveWithNoConfig(t *testing.T) {
	const testingServerAddr = "127.0.0.1:2246"
	c := NewCircuit()
	pushLock := NewInMemoryRepositoryLock(0)
	// NOTE(bacongobbler): notice that we're not initializing the server with any SSH config.
	runServer(nil, c, pushLock, testingServerAddr, time.Duration(0), t)
	// Give the server time to initialize.
	time.Sleep(200 * time.Millisecond)
	assert.Equal(t, c.State(), ClosedState, "circuit state")
	// Connect to the server; with no server config the handshake should fail, not panic.
	_, err := ssh.Dial("tcp", testingServerAddr, clientConfig())
	if err != nil {
		t.Fatalf("Failed to connect client to local server: %s", err)
	}
}
I get the following panic:
---> Listening on 127.0.0.1:2246
---> Accepting new connections.
---> Accepted connection.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5e3747]
goroutine 1091 [running]:
panic(0x8a3860, 0xc42000c0a0)
/usr/local/go/src/runtime/panic.go:500 +0x1ae
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.NewServerConn(0xaf3020, 0xc420484018, 0x0, 0x0, 0x0, 0x1, 0xc420280778, 0x4057da)
So, wrong incantation, but at least we're in the right place to nail down a reproducible test case against the OP's logs. From the panic stack trace, the crash looks to happen when initializing a basic handshake without any valid config from the client. I think the DSAPrivateKey callout is a wild goose chase/default path.
For what it's worth, I've been running a local vagrant cluster on k8s v1.5.1 and Workflow v2.10.0 for over 19 hours without any restarts:
><> kd get po | grep builder
deis-builder-3170955059-k084s 1/1 Running 0 19h
Still trying to nail it down through a unit test though.
@bacongobbler try to build something. Especially something failing to build/start.
Building works fine, failing or not. Tried that, but thanks for the suggestion!
From @vdice's observations it seems to be something passive, because neither of us are actively killing the pod when it reboots.
Correct, thank you @bacongobbler; I should have mentioned that before. My GKE cluster (k8s 1.4.7) has the latest Workflow chart installed and is completely stock. Without even interacting with Deis/Workflow (I haven't even registered), the builder pod restarts occur, usually multiple times an hour. Here's the current count:
NAME READY STATUS RESTARTS AGE
deis-builder-2492834593-bh17g 1/1 Running 6 4h
We are having the same issue on cluster running on AWS (kube-aws):
- Deis Workflow 2.9.1
- Kube 1.4.6
- Helm 2.1.0
Maybe this could be useful:
Events:
deis 2017-01-16 12:42:50 +0300 MSK 2017-01-12 09:25:20 +0300 MSK 30 deis-builder-762421820-k5ht5 Pod spec.containers{deis-builder} Warning Unhealthy {kubelet 10.83.8.200} Liveness probe failed: Get http://192.168.0.12:8092/healthz: EOF
deis 2017-01-16 12:42:51 +0300 MSK 2017-01-12 08:01:16 +0300 MSK 142 deis-builder-762421820-k5ht5 Pod spec.containers{deis-builder} Normal Pulled {kubelet 10.83.8.200} Container image "quay.io/deis/builder:v2.6.1" already present on machine
deis 2017-01-16 12:42:52 +0300 MSK 2017-01-16 12:42:52 +0300 MSK 1 deis-builder-762421820-k5ht5 Pod spec.containers{deis-builder} Normal Created {kubelet 10.83.8.200} Created container with docker id 2a3c05004c81; Security:[seccomp=unconfined]
deis 2017-01-16 12:42:53 +0300 MSK 2017-01-16 12:42:53 +0300 MSK 1 deis-builder-762421820-k5ht5 Pod spec.containers{deis-builder} Normal Started {kubelet 10.83.8.200} Started container with docker id 2a3c05004c81
Pods:
deis deis-builder-762421820-k5ht5 0/1 Error 140 4d 192.168.0.12 10.83.8.200
deis deis-builder-762421820-k5ht5 0/1 Running 141 4d 192.168.0.12 10.83.8.200
deis deis-builder-762421820-k5ht5 1/1 Running 141 4d 192.168.0.12 10.83.8.200
Controller log:
INFO [preview]: build preview-ebaf46d created
INFO [preview]: gazay deployed d94cc6b
INFO [preview]: This deployments overall timeout is 120s - batch timout is 120s and there are 1 batches to deploy with a total of 2 pods
192.168.0.12 "POST /v2/hooks/build/ HTTP/1.1" 200 25 "deis-builder"
192.168.0.12 "GET /v2/hooks/key/04:9b:f5:de:de:27:10:5a:b8:8d:ab:79:cd:17:e8:57 HTTP/1.1" 404 33 "deis-builder"
192.168.0.12 "GET /v2/hooks/key/27:c6:ad:f9:a6:4d:22:3f:18:b0:3b:df:81:1c:57:45 HTTP/1.1" 404 33 "deis-builder"
192.168.0.12 "GET /v2/hooks/key/dd:3b:b8:2e:85:04:06:e9:ab:ff:a8:0a:c0:04:6e:d6 HTTP/1.1" 404 33 "deis-builder"
192.168.0.12 "GET /v2/hooks/key/71:3a:b0:18:e2:6c:41:18:4e:56:1e:fd:d2:49:97:66 HTTP/1.1" 404 33 "deis-builder"
192.168.0.12 "GET /v2/hooks/key/b0:90:e5:dd:bf:1c:91:c0:83:dc:09:3a:53:7c:5d:07 HTTP/1.1" 404 33 "deis-builder"
192.168.0.12 "GET /v2/hooks/key/22:c8:a9:c3:01:a0:17:31:a5:43:f2:70:4a:1c:55:f6 HTTP/1.1" 404 33 "deis-builder"
Could these 404s or the deployment's overall timeout be the cause of the builder's failing liveness check?
Some additional info:
2017-01-17T11:11:58.616956833Z ERROR [ebay-mag-2]: (app::deploy): There was a problem while deploying v115 of ebay-mag-2-web. Additional information:
2017-01-17T11:11:58.617114079Z Readiness probe failed: Get http://192.168.48.12:5000/health: dial tcp 192.168.48.12:5000: getsockopt: connection refused
2017-01-17T11:11:58.617142643Z ERROR [ebay-mag-2]: (app::deploy): (app::deploy): There was a problem while deploying v115 of ebay-mag-2-web. Additional information:
2017-01-17T11:11:58.617157129Z Readiness probe failed: Get http://192.168.48.12:5000/health: dial tcp 192.168.48.12:5000: getsockopt: connection refused
2017-01-17T11:11:58.697508351Z ERROR:root:(app::deploy): (app::deploy): There was a problem while deploying v115 of ebay-mag-2-web. Additional information:
2017-01-17T11:11:58.697593251Z Readiness probe failed: Get http://192.168.48.12:5000/health: dial tcp 192.168.48.12:5000: getsockopt: connection refused
2017-01-17T11:11:58.697608908Z Traceback (most recent call last):
2017-01-17T11:11:58.697621238Z File "/app/scheduler/__init__.py", line 270, in deploy
2017-01-17T11:11:58.697633205Z namespace, name, image, entrypoint, command, **kwargs
2017-01-17T11:11:58.697644665Z File "/app/scheduler/resources/deployment.py", line 138, in update
2017-01-17T11:11:58.697656298Z self.wait_until_ready(namespace, name, **kwargs)
2017-01-17T11:11:58.697667225Z File "/app/scheduler/resources/deployment.py", line 353, in wait_until_ready
2017-01-17T11:11:58.697684935Z self.pod._handle_not_ready_pods(namespace, labels)
2017-01-17T11:11:58.697765951Z File "/app/scheduler/resources/pod.py", line 720, in _handle_not_ready_pods
2017-01-17T11:11:58.697781435Z raise KubeException(message)
2017-01-17T11:11:58.697795505Z scheduler.exceptions.KubeException: Readiness probe failed: Get http://192.168.48.12:5000/health: dial tcp 192.168.48.12:5000: getsockopt: connection refused
2017-01-17T11:11:58.697808171Z
2017-01-17T11:11:58.697818315Z The above exception was the direct cause of the following exception:
2017-01-17T11:11:58.697828868Z
2017-01-17T11:11:58.697841211Z Traceback (most recent call last):
2017-01-17T11:11:58.697860075Z File "/app/api/models/app.py", line 493, in deploy
2017-01-17T11:11:58.697880775Z async_run(tasks)
2017-01-17T11:11:58.697901071Z File "/app/api/utils.py", line 169, in async_run
2017-01-17T11:11:58.697927378Z raise error
2017-01-17T11:11:58.697939798Z File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
2017-01-17T11:11:58.697951311Z result = coro.throw(exc)
2017-01-17T11:11:58.697961688Z File "/app/api/utils.py", line 182, in async_task
2017-01-17T11:11:58.697972558Z yield from loop.run_in_executor(None, params)
2017-01-17T11:11:58.697983155Z File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
2017-01-17T11:11:58.697993875Z yield self # This tells Task to wait for completion.
2017-01-17T11:11:58.698005745Z File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
2017-01-17T11:11:58.698026335Z future.result()
2017-01-17T11:11:58.698046131Z File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
2017-01-17T11:11:58.698064605Z raise self._exception
2017-01-17T11:11:58.698075908Z File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
2017-01-17T11:11:58.698086948Z result = self.fn(*self.args, **self.kwargs)
2017-01-17T11:11:58.698097535Z File "/app/scheduler/__init__.py", line 276, in deploy
2017-01-17T11:11:58.698108305Z ) from e
2017-01-17T11:11:58.698118831Z scheduler.exceptions.KubeException: There was a problem while deploying v115 of ebay-mag-2-web. Additional information:
2017-01-17T11:11:58.698129888Z Readiness probe failed: Get http://192.168.48.12:5000/health: dial tcp 192.168.48.12:5000: getsockopt: connection refused
2017-01-17T11:11:58.698141051Z
2017-01-17T11:11:58.698150915Z The above exception was the direct cause of the following exception:
2017-01-17T11:11:58.698161851Z
2017-01-17T11:11:58.698172055Z Traceback (most recent call last):
2017-01-17T11:11:58.698182398Z File "/app/api/models/app.py", line 502, in deploy
2017-01-17T11:11:58.698195955Z self.deploy(release.previous(), force_deploy=True, rollback_on_failure=False)
2017-01-17T11:11:58.698235185Z File "/app/api/models/app.py", line 512, in deploy
2017-01-17T11:11:58.698248875Z raise ServiceUnavailable(err) from e
Controller fails from time to time. Builder's liveness probe is controller-related.
Just after builder restart:
$ git push deis master
2017-01-17T14:20:03.432674195Z Accepted connection.
2017-01-17T14:20:03.944641435Z Accepted connection.
2017-01-17T14:20:04.363952863Z ---> ---> ---> ---> panic: runtime error: slice bounds out of range
2017-01-17T14:20:04.364017975Z
2017-01-17T14:20:04.364028347Z goroutine 259 [running]:
2017-01-17T14:20:04.364035784Z panic(0x1170b00, 0xc4200180c0)
2017-01-17T14:20:04.364044873Z /usr/local/go/src/runtime/panic.go:500 +0x1a1
2017-01-17T14:20:04.364053685Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dsaPrivateKey).Sign(0xc420358108, 0x1bbf2e0, 0xc4200155c0, 0xc42012da40, 0x14, 0x20, 0x0, 0x780, 0xc42012da40)
2017-01-17T14:20:04.364062346Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/keys.go:379 +0x3bf
2017-01-17T14:20:04.364070577Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.signAndMarshal(0x1bc8860, 0xc420358108, 0x1bbf2e0, 0xc4200155c0, 0xc42012da40, 0x14, 0x20, 0x11030a0, 0x10caae3, 0x11a4180, ...)
2017-01-17T14:20:04.364077775Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/server.go:157 +0x6d
2017-01-17T14:20:04.364086132Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dhGroup).Server(0xc420106c20, 0x7f0b30777cd8, 0xc420366ea0, 0x1bbf2e0, 0xc4200155c0, 0xc4205b8720, 0x1bc8860, 0xc420358108, 0x40def4, 0x12f8db4, ...)
2017-01-17T14:20:04.364093352Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/kex.go:172 +0x4fb
2017-01-17T14:20:04.364100217Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).server(0xc420612c30, 0x1bc8820, 0xc420106c20, 0xc42016a600, 0xc4205b8720, 0x1, 0x0, 0xe3)
2017-01-17T14:20:04.364107165Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:366 +0x1db
2017-01-17T14:20:04.364115152Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).enterKeyExchange(0xc420612c30, 0xc4202d0000, 0xd9, 0xd9, 0x0, 0x0)
2017-01-17T14:20:04.364122009Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:337 +0x378
2017-01-17T14:20:04.364128869Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).readOnePacket(0xc420612c30, 0x0, 0x0, 0x0, 0x0, 0x45c4c1)
2017-01-17T14:20:04.364135778Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:166 +0xb4
2017-01-17T14:20:04.364142772Z github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).readLoop(0xc420612c30)
2017-01-17T14:20:04.364149315Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:133 +0x2f
2017-01-17T14:20:04.364187652Z created by github.com/deis/builder/vendor/golang.org/x/crypto/ssh.newServerTransport
2017-01-17T14:20:04.364196565Z /go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:108 +0x216
Controller logs are empty at the moment, so this particular behaviour is the builder's alone.
Controller issue is here: https://github.com/deis/controller/issues/1204
Another interesting detail: over the weekend, I checked out the v2.5.5 tag of builder, which is the version released with Workflow v2.8.0; these versions are believed to be the last ones not hitting this issue. With the builder deployment updated to use this image, there are indeed instances of this issue, though seemingly rarer:
deis-builder-1590797075-a5w9f 1/1 Running 33 3d
Error:
$ kd logs -p deis-builder-1590797075-a5w9f
2017/01/17 12:20:21 Running in debug mode
2017/01/17 12:20:21 Starting health check server on port 8092
2017/01/17 12:20:21 Starting deleted app cleaner
2017/01/17 12:20:21 Starting SSH server on 0.0.0.0:2223
Parsed host key /var/run/secrets/deis/builder/ssh/ssh-host-rsa-key.
Parsed host key /var/run/secrets/deis/builder/ssh/ssh-host-dsa-key.
Parsed host key /var/run/secrets/deis/builder/ssh/ssh-host-ecdsa-key.
Listening on 0.0.0.0:2223
Accepting new connections.
Accepted connection.
[DEBUG] [DEBUG] [DEBUG] ---> ---> ---> [ERROR] Failed handshake: ssh: invalid packet length, packet too large
Accepted connection.
---> panic: runtime error: slice bounds out of range
goroutine 12604 [running]:
panic(0x1170100, 0xc4200160c0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dsaPrivateKey).Sign(0xc420026a98, 0x1bbe4e0, 0xc4200137a0, 0xc420130b60, 0x14, 0x20, 0x0, 0x780, 0xc420130b60)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/keys.go:379 +0x3bf
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.signAndMarshal(0x1bc7aa0, 0xc420026a98, 0x1bbe4e0, 0xc4200137a0, 0xc420130b60, 0x14, 0x20, 0x1102640, 0x10ca083, 0x11a3800, ...)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/server.go:157 +0x6d
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dhGroup).Server(0xc4200ecc40, 0x7f4d821ee108, 0xc4204659e0, 0x1bbe4e0, 0xc4200137a0, 0xc420122420, 0x1bc7aa0, 0xc420026a98, 0x40def4, 0x12f84a8, ...)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/kex.go:172 +0x4fb
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).server(0xc4201174a0, 0x1bc7a60, 0xc4200ecc40, 0xc420118700, 0xc420122420, 0x1, 0x0, 0xe3)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:366 +0x1db
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).enterKeyExchange(0xc4201174a0, 0xc4200dc1c0, 0xd9, 0xd9, 0x0, 0x0)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:337 +0x378
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).readOnePacket(0xc4201174a0, 0x0, 0x0, 0xc42048c5a0, 0xc42001f7a8, 0x95b892)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:166 +0xb4
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).readLoop(0xc4201174a0)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:133 +0x2f
created by github.com/deis/builder/vendor/golang.org/x/crypto/ssh.newServerTransport
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:108 +0x216
+1 on this issue. Also witnessing the same behavior on k8s 1.4.7 with builder image quay.io/deis/builder:v2.6.1.
@gabrtv pointed me to this commit, made only 15 hours ago: https://github.com/golang/crypto/commit/b82246307bd525fde15c1df976318003716bca68
I'll see if compiling builder with go-tip fixes this. If so, we'll have to find a way to apply this patch to Go 1.7.4 in deis/docker-go-dev.
EDIT: golang.org/x/crypto is maintained separately from the Go stdlib. My bad.
If anyone's willing to try a fix, #464 looks promising.
re-opening as #464 didn't resolve this.
I believe we encounter the same issue:
Environment:
Fail ratio: 8 times in the last 24 hours, ~20 times a day on average over the last 7 days.
Log:
Accepted connection.
---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> ---> panic: runtime error: slice bounds out of range
goroutine 2651 [running]:
panic(0x1170b00, 0xc4200180c0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dsaPrivateKey).Sign(0xc42015c350, 0x1bbf2e0, 0xc420015740, 0xc420393300, 0x14, 0x20, 0x0, 0x780, 0xc420393300)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/keys.go:379 +0x3bf
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.signAndMarshal(0x1bc8860, 0xc42015c350, 0x1bbf2e0, 0xc420015740, 0xc420393300, 0x14, 0x20, 0xc42044d8eb, 0x12e44b7, 0x8, ...)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/server.go:157 +0x6d
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dhGroup).Server(0xc4200f8c20, 0x7f18f3edd968, 0xc4202146c0, 0x1bbf2e0, 0xc420015740, 0xc4203d9080, 0x1bc8860, 0xc42015c350, 0x40def4, 0x12f8db4, ...)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/kex.go:172 +0x4fb
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).server(0xc42068a000, 0x1bc8820, 0xc4200f8c20, 0xc4201edb80, 0xc4203d9080, 0x1, 0x0, 0xe3)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:366 +0x1db
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).enterKeyExchange(0xc42068a000, 0xc4204a01c0, 0xd9, 0xd9, 0x0, 0x0)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:337 +0x378
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).readOnePacket(0xc42068a000, 0x0, 0x0, 0xc4203d8240, 0x7096dd, 0x180001)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:166 +0xb4
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).readLoop(0xc42068a000)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:133 +0x2f
created by github.com/deis/builder/vendor/golang.org/x/crypto/ssh.newServerTransport
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:108 +0x216
I also got same on Workflow-2.11.0 / K8s v1.5.2+coreos.1 / Azure. Reproduced randomly.
Accepted connection.
Starting ssh authentication
Channel type: session
Received request of type auth-agent-req@openssh.com
Key='LC_ALL', Value=''
receiving git repo name: eclipse-che.git, operation: git-receive-pack, fingerprint: 54:0d:c3:3d:43:46:d8:1d:db:0e:ff:f5:02:1c:36:10, user: monaka
creating repo directory /home/git/eclipse-che.git
writing pre-receive hook under /home/git/eclipse-che.git
git-shell -c git-receive-pack 'eclipse-che.git'
Waiting for git-receive to run.
Waiting for deploy.
Accepted connection.
---> ---> ---> ---> ---> ---> ---> ---> ---> ---> panic: runtime error: slice bounds out of range
goroutine 5062 [running]:
panic(0x11ac4a0, 0xc4200180c0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dsaPrivateKey).Sign(0xc420316470, 0x1c22840, 0xc4200156e0, 0xc420708e80, 0x14, 0x20, 0x0, 0x7c0, 0xc420708e80)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/keys.go:457 +0x3bf
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.signAndMarshal(0x1c2c000, 0xc420316470, 0x1c22840, 0xc4200156e0, 0xc420708e80, 0x14, 0x20, 0xc4205622eb, 0x132695d, 0x8, ...)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/server.go:159 +0x6d
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*dhGroup).Server(0xc4200d1720, 0x7f1064176b80, 0xc42037c200, 0x1c22840, 0xc4200156e0, 0xc4201704e0, 0x1c2c000, 0xc420316470, 0x40df34, 0x133b4ad, ...)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/kex.go:188 +0x511
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).server(0xc420540a00, 0x1c2bfc0, 0xc4200d1720, 0xc42009c380, 0xc4201704e0, 0x1, 0x0, 0x2)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:537 +0x1db
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).enterKeyExchange(0xc420540a00, 0xc4204c41c0, 0xd9, 0xd9, 0xc4203e3e01, 0xc4206e77a0)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:502 +0x37d
github.com/deis/builder/vendor/golang.org/x/crypto/ssh.(*handshakeTransport).kexLoop(0xc420540a00)
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:271 +0x1e9
created by github.com/deis/builder/vendor/golang.org/x/crypto/ssh.newServerTransport
/go/src/github.com/deis/builder/vendor/golang.org/x/crypto/ssh/handshake.go:121 +0x227
Alas, it doesn't appear that any golang.org/x/crypto bumps have completely ameliorated this issue; bumping to next milestone.
Can reproduce as well; a huge problem for us at the moment 😞
Closed by https://github.com/deis/builder/pull/493
Included in the Builder v2.9.0 release. You can update an existing install/Helm release via
helm upgrade <release> workflow/workflow --set builder.docker_tag=v2.9.0 (-f values.yaml)
Note you may want to add --version <particular Workflow version> if necessary.
Or just patch the builder deployment via
kubectl -n deis set image deployment deis-builder deis-builder=quay.io/deis/builder:v2.9.0
I think you also need to upgrade to dockerbuilder v2.7.1 or things will break.
The helm upgrade assumes v2.12.0, which has the latest version of dockerbuilder, so yes.
Btw, I just did some debugging in golang.org/x/crypto/ssh, and it appears the above-mentioned bug occurs with 2048-bit DSA keys, while the common key size for DSA is 1024 bits. The error seems to be that the code hardcodes a subgroup size of 160 bits (20 bytes), but 2048-bit DSA uses either 224-bit or 256-bit subgroups (my test key generated with openssl had a subgroup size of 256 bits / 32 bytes).
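That hardcoded assumption can be sketched as follows. This is not the vendored code itself, just the same fixed-width packing pattern: both signature halves are copied into 20-byte slots of a 40-byte buffer, so a 32-byte value from a 256-bit subgroup produces a negative slice index and the slice-bounds panic seen in the stack traces above (`marshalDSASig` and its recover wrapper are illustrative):

```go
package main

import "fmt"

// marshalDSASig mimics the fixed-width serialization pattern: two
// signature halves packed into 20-byte slots of a 40-byte buffer,
// which only works for a 160-bit subgroup.
func marshalDSASig(rb, sb []byte) (sig []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("marshal panicked: %v", r)
		}
	}()
	sig = make([]byte, 40)
	copy(sig[20-len(rb):20], rb) // negative index when len(rb) > 20
	copy(sig[40-len(sb):], sb)
	return sig, nil
}

func main() {
	// 1024/160 DSA: r and s are at most 20 bytes each -- fine.
	sig, err := marshalDSASig(make([]byte, 20), make([]byte, 20))
	fmt.Println(len(sig), err) // 40 <nil>

	// 2048/256 DSA: r and s are 32 bytes -- slice bounds out of range.
	_, err = marshalDSASig(make([]byte, 32), make([]byte, 32))
	fmt.Println(err != nil) // true
}
```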
I'll open an upstream issue and reference it back here.
Update: Looks like @vdice already reported the problem in golang/go#19424.
Based on the above, it's safe to assume that the crashes happened whenever someone with a DSA key of 2048 bits or more connected to the builder (which could also be bots/scanners rather than actual users).
Update: It appears the root of the problem is the genPrivateKey template function of sprig, which is used in the deis builder Helm templates to generate the host keys. It uses dsa.L2048N256, producing a 2048-bit DSA key with a 256-bit subgroup, instead of dsa.L1024N160 for a 1024-bit key with a 160-bit subgroup, which is the only key size allowed by ssh-keygen -t dsa and the OpenSSH server.
We completely removed DSA keys from the builder some time ago, as they're being discontinued, so this has effectively been fixed.
@bacongobbler sure, this should be closed. Thanks for reminding.