docker-archive / deploykit

A toolkit for creating and managing declarative, self-healing infrastructure.
Apache License 2.0
2.25k stars 262 forks source link

Enrollment controller destroying members that it cannot index #762

Closed kaufers closed 6 years ago

kaufers commented 6 years ago

We encountered an error that caused the group/managers group members to fail to index:

ime="2017-11-13T15:46:23Z" level=info msg="Describing authorized VMs for volume 33659167 with tags map[]" 
time="2017-11-13T15:46:24Z" level=info msg="1 authorized VMs for volume 33659167: [{44130467 <nil> map[] <nil>}]" 
EROR[11-13|15:46:24] cannot index entry                       module=controller/enrollment instance.Description="{ID:instance-1510587948 LogicalID:0xc42050c310 Tags:map[infrakit.attach:managers_dedicated_mgr1,managers_global infrakit.config_sha:37rffrnnfx55i6kkienugdfbt67e3has infrakit.group:managers logicalid:mgr1] Properties:<nil>}" err="template: file://localhost/infrakit_files/:1:23: executing \"file://localhost/infrakit_files/\" at <jsonDecode>: error calling jsonDecode: unexpected end of JSON input" stack="[github.com/docker/infrakit/pkg/controller/enrollment/set.go:19 github.com/docker/infrakit/pkg/controller/enrollment/set.go:55 github.com/docker/infrakit/pkg/controller/enrollment/sync.go:166 github.com/docker/infrakit/pkg/controller/enrollment/enroller.go:79 github.com/docker/infrakit/pkg/controller/poll.go:70]" fn=github.com/docker/infrakit/pkg/controller/enrollment.index
EROR[11-13|15:46:24] cannot index entry                       module=controller/enrollment instance.Description="{ID:instance-1510587957 LogicalID:0xc42050c5d0 Tags:map[infrakit-link:k9hj59d8rlc18xkg infrakit-link-created:2017-11-13t15:45:55z infrakit.attach:managers_dedicated_mgr3,managers_global infrakit.config_sha:37rffrnnfx55i6kkienugdfbt67e3has infrakit.group:managers swarm-id:i9pqoei11wa26xd8o9gu2cf3n infrakit-link-context:swarm::i9pqoei11wa26xd8o9gu2cf3n::manager logicalid:mgr3 name:instance-1510587957] Properties:<nil>}" err="template: file://localhost/infrakit_files/:1:23: executing \"file://localhost/infrakit_files/\" at <jsonDecode>: error calling jsonDecode: unexpected end of JSON input" stack="[github.com/docker/infrakit/pkg/controller/enrollment/set.go:19 github.com/docker/infrakit/pkg/controller/enrollment/set.go:55 github.com/docker/infrakit/pkg/controller/enrollment/sync.go:166 github.com/docker/infrakit/pkg/controller/enrollment/enroller.go:79 github.com/docker/infrakit/pkg/controller/poll.go:70]" fn=github.com/docker/infrakit/pkg/controller/enrollment.index
EROR[11-13|15:46:24] cannot index entry                       module=controller/enrollment instance.Description="{ID:instance-1510587974 LogicalID:0xc42050cbf0 Tags:map[infrakit-link-context:swarm::i9pqoei11wa26xd8o9gu2cf3n::manager infrakit.attach:managers_dedicated_mgr2,managers_global infrakit.config_sha:37rffrnnfx55i6kkienugdfbt67e3has infrakit.group:managers infrakit-link:9amuajatisku1ahy infrakit-link-created:2017-11-13t15:46:14z logicalid:mgr2 name:instance-1510587974 swarm-id:i9pqoei11wa26xd8o9gu2cf3n] Properties:<nil>}" err="template: file://localhost/infrakit_files/:1:23: executing \"file://localhost/infrakit_files/\" at <jsonDecode>: error calling jsonDecode: unexpected end of JSON input" stack="[github.com/docker/infrakit/pkg/controller/enrollment/set.go:19 github.com/docker/infrakit/pkg/controller/enrollment/set.go:55 github.com/docker/infrakit/pkg/controller/enrollment/sync.go:166 github.com/docker/infrakit/pkg/controller/enrollment/enroller.go:79 github.com/docker/infrakit/pkg/controller/poll.go:70]" fn=github.com/docker/infrakit/pkg/controller/enrollment.index
INFO[11-13|15:46:24] Computed delta                           module=controller/enrollment add=[] remove="[{ID:44130467 LogicalID:<nil> Tags:map[] Properties:<nil>}]" stack="[github.com/docker/infrakit/pkg/controller/enrollment/sync.go:168 github.com/docker/infrakit/pkg/controller/enrollment/enroller.go:79 github.com/docker/infrakit/pkg/controller/poll.go:70]" fn=github.com/docker/infrakit/pkg/controller/enrollment.(*enroller).sync
time="2017-11-13T15:46:24Z" level=info msg="Deauthorizing volume 33659167 from instance 44130467" 

The logic in the enrollment controller is computing a delta when it failed to get the index and issuing a destroy. Should this be changed to ignore the entry when it cannot determine the index?

kaufers commented 6 years ago

I'm seeing something similar in the ingress flow:

WARN[12-12|14:28:18] error describing group                   module=controller/ingress id=dtr err="Post http://h: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" meta="{Identity:<nil> Name:d4ic-lb Tags:map[user:d4ic-user project:d4ic]}" stack="[github.com/docker/infrakit/pkg/controller/ingress/sync.go:120 github.com/docker/infrakit/pkg/controller/ingress/fsm.go:136 github.com/docker/infrakit/pkg/fsm/set.go:429 github.com/docker/infrakit/pkg/fsm/set.go:568 github.com/docker/infrakit/pkg/fsm/set.go:488]" fn=github.com/docker/infrakit/pkg/controller/ingress.(*managed).syncBackends
INFO[12-12|14:28:18] De-register backends                     module=controller/ingress instances="[10.176.21.142 10.176.21.152 10.176.21.155]" vhost=dtr L4=dtr meta="{Identity:<nil> Name:d4ic-lb Tags:map[project:d4ic user:d4ic-user]}" stack="[src/github.com/docker/infrakit/pkg/controller/enrollment/sync.go:169 github.com/docker/infrakit/pkg/controller/ingress/sync.go:157 github.com/docker/infrakit/pkg/controller/ingress/fsm.go:136 github.com/docker/infrakit/pkg/fsm/set.go:429 github.com/docker/infrakit/pkg/fsm/set.go:568 github.com/docker/infrakit/pkg/fsm/set.go:488]" fn=github.com/docker/infrakit/vendor/gopkg.in/inconshreveable/log15%2ev2.(Logger).Info-fm
EROR[12-12|14:28:18] Bad handshake. Is this plugin running?   module=run/manager lookup=stack endpoint="&{Name:stack Protocol:unix Address:/infrakit/plugins/stack}" stack="[github.com/docker/infrakit/pkg/run/manager/manager.go:273 github.com/docker/infrakit/pkg/run/manager/manager.go:251 github.com/docker/infrakit/cmd/infrakit/plugin/plugin.go:136 github.com/docker/infrakit/cmd/infrakit/plugin/plugin.go:193 github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:632 github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:722 github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:681 infrakit/main.go:190]" fn=github.com/docker/infrakit/pkg/run/manager.countMatches
chungers commented 6 years ago

We could just add the unindexable entries to a map of the id and count. We destroy the instance only when a threshold is reached. Will this work? We should clear the map entry on destroy

We can keep the map in memory. When failing over it will just start over again and worst case is another N-1 poll intervals repeating again.

kaufers commented 6 years ago

Fixed with policy support added in #826