docker-archive / deploykit

A toolkit for creating and managing declarative, self-healing infrastructure.
Apache License 2.0

Sync controllers should not destroy instances when Describe fails #799

Closed. kaufers closed this issue 6 years ago.

kaufers commented 6 years ago

Is this related to #762?

It appears that an RPC timeout is causing the group members to be removed:

WARN[12-12|14:28:18] error describing group                   module=controller/ingress id=dtr err="Post http://h: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" meta="{Identity:<nil> Name:d4ic-lb Tags:map[user:d4ic-user project:d4ic]}" stack="[github.com/docker/infrakit/pkg/controller/ingress/sync.go:120 github.com/docker/infrakit/pkg/controller/ingress/fsm.go:136 github.com/docker/infrakit/pkg/fsm/set.go:429 github.com/docker/infrakit/pkg/fsm/set.go:568 github.com/docker/infrakit/pkg/fsm/set.go:488]" fn=github.com/docker/infrakit/pkg/controller/ingress.(*managed).syncBackends
INFO[12-12|14:28:18] De-register backends                     module=controller/ingress instances="[10.176.21.142 10.176.21.152 10.176.21.155]" vhost=dtr L4=dtr meta="{Identity:<nil> Name:d4ic-lb Tags:map[project:d4ic user:d4ic-user]}" stack="[src/github.com/docker/infrakit/pkg/controller/enrollment/sync.go:169 github.com/docker/infrakit/pkg/controller/ingress/sync.go:157 github.com/docker/infrakit/pkg/controller/ingress/fsm.go:136 github.com/docker/infrakit/pkg/fsm/set.go:429 github.com/docker/infrakit/pkg/fsm/set.go:568 github.com/docker/infrakit/pkg/fsm/set.go:488]" fn=github.com/docker/infrakit/vendor/gopkg.in/inconshreveable/log15%2ev2.(Logger).Info-fm
EROR[12-12|14:28:18] Bad handshake. Is this plugin running?   module=run/manager lookup=stack endpoint="&{Name:stack Protocol:unix Address:/infrakit/plugins/stack}" stack="[github.com/docker/infrakit/pkg/run/manager/manager.go:273 github.com/docker/infrakit/pkg/run/manager/manager.go:251 github.com/docker/infrakit/cmd/infrakit/plugin/plugin.go:136 github.com/docker/infrakit/cmd/infrakit/plugin/plugin.go:193 github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:632 github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:722 github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:681 infrakit/main.go:190]" fn=github.com/docker/infrakit/pkg/run/manager.countMatches
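The log sequence (a Describe error followed immediately by de-registering three backends) is consistent with a sync step that logs the error but still diffs the registered backends against an empty member list. The sketch below assumes exactly that. `describeGroup`, `toRemove`, and `naiveSync` are hypothetical names, not infrakit's actual code, and the IP addresses are copied from the log only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// describeGroup is a hypothetical stand-in for the RPC the ingress controller
// makes to list group members. It always fails here, mimicking the
// "Client.Timeout exceeded" error in the log above.
func describeGroup() ([]string, error) {
	return nil, errors.New("request canceled (Client.Timeout exceeded while awaiting headers)")
}

// toRemove returns the registered backends that are absent from the
// described member list, sorted for stable output.
func toRemove(registered, described []string) []string {
	want := make(map[string]bool, len(described))
	for _, d := range described {
		want[d] = true
	}
	var out []string
	for _, r := range registered {
		if !want[r] {
			out = append(out, r)
		}
	}
	sort.Strings(out)
	return out
}

// naiveSync logs the Describe error but keeps going with a nil member list,
// so every registered backend looks stale and is marked for de-registration.
func naiveSync(registered []string) []string {
	described, err := describeGroup()
	if err != nil {
		fmt.Println("WARN error describing group:", err)
		// Hypothesized bug: no early return, so described stays nil.
	}
	return toRemove(registered, described)
}

func main() {
	registered := []string{"10.176.21.142", "10.176.21.152", "10.176.21.155"}
	fmt.Println("de-register:", naiveSync(registered))
}
```

Running it prints the warning and then all three addresses as backends to de-register, matching the pattern in the log.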

Shouldn't the sync controllers leave everything in place when an RPC times out?