karlkfi opened this issue 8 years ago
Isn't this "Exitcode:2" just your binary terminating with 2?
It should be, but it's not. See the event: `Error scheduling: No suitable offers for pod/task`.
That message is normal when a pod comes in before a suitable offer is available. It's not nice, but it has no influence on scheduling. I believe the pod has already been started by the time you see "Exitcode:2".
I tried again without a working Cassandra. Pods come up and terminate, again and again, giving exit code 2:
```yaml
lastState:
  terminated:
    containerID: docker://009ca04414623507c155cfb88f1d560620d985641f797ccbb6dfa389c4b23a8a
    exitCode: 2
    finishedAt: 2015-11-23T15:53:07Z
    startedAt: 2015-11-23T15:53:07Z
```
The symptoms have nothing to do with scheduling. The task itself fails, possibly due to other issues (DNS? A broken Cassandra connection?).
Here is a log of one of the failing containers:
```console
core@ip-10-0-2-61 ~ $ docker logs 1d96a7742b0a
time="2015-11-23T15:54:47Z" level=info msg="Flags: &{address:0xc20802ab30 cassandraAddr:0xc20802aaf0 cassandraRepl:0xc20802ab10}"
time="2015-11-23T15:54:47Z" level=info msg="Creating cql session"
time="2015-11-23T15:54:47Z" level=error msg="Creating CQL Session: no connections were made when creating the session"
time="2015-11-23T15:54:47Z" level=info msg="Creating keyspace oinker (replication_factor: 3)"
time="2015-11-23T15:54:47Z" level=error msg="Cannot finalize cql session, not initialized"
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x4badfd]
goroutine 1 [running]:
github.com/gocql/gocql.(*policyConnPool).Pick(0x0, 0xc2080348f0, 0x3)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/connectionpool.go:175 +0x13d
github.com/gocql/gocql.(*Session).executeQuery(0xc208060000, 0xc2080348f0, 0x0)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/session.go:221 +0xf8
github.com/gocql/gocql.(*Query).Iter(0xc2080348f0, 0xc2080348f0)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/session.go:707 +0xce
github.com/gocql/gocql.(*Query).Exec(0xc2080348f0, 0x0, 0x0)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/session.go:689 +0x3a
github.com/karlkfi/oinker-go/service.(*CQLOinkRepo).Initialize(0xc208084360)
/go/src/github.com/karlkfi/oinker-go/service/cql_oink_repo.go:45 +0x3a7
github.com/karlkfi/inject.(*definition).Resolve(0xc20800bf20, 0x7f4071e8b578, 0xc208032040, 0x0, 0x0, 0x0)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/karlkfi/inject/definition.go:52 +0x1b6
github.com/karlkfi/inject.(*graph).ResolveByAssignableType(0xc208032040, 0x7f4071e86b28, 0x7d25c0, 0x0, 0x0, 0x0)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/karlkfi/inject/graph.go:81 +0x1d9
github.com/karlkfi/inject.ExtractAssignable(0x7f4071e8b578, 0xc208032040, 0x703ba0, 0xc20802ab40, 0x0, 0x0, 0x0)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/karlkfi/inject/extract.go:39 +0x27e
main.main()
/go/src/github.com/karlkfi/oinker-go/main.go:32 +0x2ab
goroutine 5 [syscall]:
os/signal.loop()
/usr/lib/go/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
/usr/lib/go/src/os/signal/signal_unix.go:27 +0x35
goroutine 8 [runnable]:
github.com/gocql/gocql.(*hostConnPool).fillingStopped(0xc20805e000)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/connectionpool.go:384
created by github.com/gocql/gocql.(*hostConnPool).fill
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/connectionpool.go:338 +0x226
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/lib/go/src/runtime/asm_amd64.s:2232 +0x1
goroutine 9 [runnable]:
github.com/gocql/gocql.(*hostConnPool).drain(0xc20805e000)
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/connectionpool.go:457
created by github.com/gocql/gocql.(*hostConnPool).Close
/go/src/github.com/karlkfi/oinker-go/Godeps/_workspace/src/github.com/gocql/gocql/connectionpool.go:286 +0xa0
```
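The trace makes the failure mode clear: `CreateSession` fails ("no connections were made"), the error is logged but not acted on, and `CQLOinkRepo.Initialize` then runs the keyspace query against a session whose connection pool was never built, so gocql dereferences a nil `*policyConnPool` and panics. An unrecovered Go panic exits with status 2, which is exactly the `exitCode: 2` in the pod status above. A minimal sketch of the defensive version, assuming the gocql API of the time (the host address and timeout are illustrative, not from the issue):

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("cassandra.example.local") // hypothetical address
	cluster.Timeout = 5 * time.Second

	// When Cassandra is unreachable, CreateSession returns the error seen
	// in the log ("no connections were made when creating the session").
	// The panic above comes from ignoring this error and querying anyway.
	session, err := cluster.CreateSession()
	if err != nil {
		// Fail fast with a clear log line (or retry) instead of
		// panicking later on a nil connection pool.
		log.Fatalf("creating CQL session: %v", err)
	}
	defer session.Close()

	// Only reached once a live connection pool exists.
	if err := session.Query(`CREATE KEYSPACE IF NOT EXISTS oinker
			WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}`).Exec(); err != nil {
		log.Fatalf("creating keyspace oinker: %v", err)
	}
}
```

In a crash-looping pod this at least turns each restart into a clean, well-logged failure; wrapping `CreateSession` in a retry loop would avoid the restart churn entirely.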
The bad scheduling may actually be related to https://github.com/mesosphere/kubernetes-mesos/issues/636, which is now fixed in the latest 0.7 RC packages.
Removing from the milestone for the moment and reassigning to @karlkfi for further tracking.
Downgrading priority, since it's just an Oinker bug.
Deploy k8s, etcd (may not be critical for reproduction), and Cassandra:
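(The exact commands weren't captured here; assuming a cluster that's already up, deploying Cassandra would look something like the following, with hypothetical manifest names:)

```console
$ kubectl create -f cassandra-controller.yaml   # hypothetical manifest
$ kubectl create -f cassandra-service.yaml      # hypothetical manifest
```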
Deploy OinkerGo:
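(Again with a hypothetical manifest name:)

```console
$ kubectl create -f oinker-controller.yaml      # hypothetical manifest
```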
Check the status of the pods:
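(For example:)

```console
$ kubectl get pods
```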
Wait a few seconds and try again:
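(Same command; the RESTARTS column climbs as the oinker pods crash-loop:)

```console
$ kubectl get pods
```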
Events on one of the failing pods:
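(These show up in the Events section of a describe; the pod name is a placeholder:)

```console
$ kubectl describe pod <oinker-pod-name>
```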
Eventually the pods all get scheduled, but it can take 0-12 restarts.
(I think this `cannot join network of a non running container` error is a different bug.)