docker-archive / deploykit

A toolkit for creating and managing declarative, self-healing infrastructure.
Apache License 2.0

InfraKit panic when directory tree isn't present #308

Closed thebsdbox closed 7 years ago

thebsdbox commented 7 years ago

If the pwd/.infrakit/plugins/ directory, where plugins create their UNIX socket files, does not exist, then the plugin will panic (example below). The plugins should either create the directory structure if it does not exist, or exit safely with a warning message.

Having the plugins create the directory would allow more control over the permissions of the .infrakit directory.

dan@localhost ~ $ ./infrakit-instance-file &
[1] 2615
dan@localhost ~ $ ERRO[0000] listen unix /home/dan/.infrakit/plugins/instance-file: bind: no such file or directory 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x46a122]

goroutine 1 [running]:
panic(0x7276e0, 0xc42000c090)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/docker/infrakit/pkg/cli.RunPlugin(0x784f99, 0xd, 0x739440, 0xc42000d2a0)
    /root/go/src/github.com/docker/infrakit/pkg/cli/serverutil.go:18 +0x102
main.main.func1(0xc42007cb40, 0x94dfd8, 0x0, 0x0)
    /root/go/src/github.com/docker/infrakit/pkg/example/instance/file/main.go:23 +0xd1
github.com/docker/infrakit/vendor/github.com/spf13/cobra.(*Command).execute(0xc42007cb40, 0xc42000c230, 0x0, 0x0, 0xc42007cb40, 0xc42000c230)
    /root/go/src/github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:636 +0x443
github.com/docker/infrakit/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc42007cb40, 0xc420045f28, 0x77f8e0, 0xc42007cd80)
    /root/go/src/github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:722 +0x367
github.com/docker/infrakit/vendor/github.com/spf13/cobra.(*Command).Execute(0xc42007cb40, 0xc420045f20, 0x1)
    /root/go/src/github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:681 +0x2b
main.main()
    /root/go/src/github.com/docker/infrakit/pkg/example/instance/file/main.go:28 +0x2cc

[1]+  Exit 2                  ./infrakit-instance-file
dan@localhost ~ $ mkdir -p `pwd`/.infrakit/plugins/
dan@localhost ~ $ ./infrakit-instance-file &
[1] 2621
dan@localhost ~ $ INFO[0000] Listening at: /home/dan/.infrakit/plugins/instance-file 

FrenchBen commented 7 years ago

@thebsdbox If you look at the tutorial, you'll see that the first plugin that should be started is the group plugin, which sets up the directories: https://github.com/docker/infrakit/blob/master/docs/tutorial.md

$ ./infrakit-group-default &
[1] 5756
INFO[0000] Listening at: /Users/frenchben/.infrakit/plugins/group
$

thebsdbox commented 7 years ago

Well, it is implied, but it isn't listed as a requirement for InfraKit plugins to be started in a particular order. Also, some of the examples use the plugins directly, which results in the same behaviour.

I've also observed another issue: the group plugin panics (trace below), and after that it panics on every restart until the existing socket file is deleted from the plugins directory.

DEBU[0014] Sending request POST / HTTP/1.1
Content-Type: application/json

{"method":"Instance.Provision","params":[{"Spec":{"Properties":{"Note":"Instance properties version 2.0"},"Tags":{"infrakit.config_sha":"6iqedL9ysy3pgw0Q-wYiaoZdt6E=","infrakit.group":"cattle"},"Init":"","LogicalID":null,"Attachments":null}}],"id":8326915720006747402} 
DEBU[0014] Received response HTTP/1.1 200 OK
Content-Length: 74
Content-Type: application/json
Date: Sat Nov 26 19:08:34 2016
Server: InfraKit

{"result": {"Descriptions": []}, "error": null, "id": 8326915720006747402} 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8261d]

goroutine 41 [running]:
panic(0x3287c0, 0xc420010090)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/docker/infrakit/pkg/plugin/group.(*scaledGroup).CreateOne(0xc42010a700, 0x0)
    /root/go/src/github.com/docker/infrakit/pkg/plugin/group/scaled.go:87 +0x47d
github.com/docker/infrakit/pkg/plugin/group.(*scaler).converge.func2(0xc4201332e0, 0xc42010fc50)
    /root/go/src/github.com/docker/infrakit/pkg/plugin/group/scaler.go:243 +0x64
created by github.com/docker/infrakit/pkg/plugin/group.(*scaler).converge
    /root/go/src/github.com/docker/infrakit/pkg/plugin/group/scaler.go:244 +0x631

The "result": {"Descriptions": []} was empty, as the instances were still being provisioned...

FrenchBen commented 7 years ago

Perhaps the documentation should be more precise. The infrakit CLI is more of a client that acts upon the flavor/instance/group plugin. If the plugin is not found, it simply shows the associated error:

$ ./infrakit instance describe
FATA[0000] Plugin not found:

In the first example, you see that the following plugins were previously started:

This means that you can act upon any of the above flavor/instance/group.

I've never had to clean up any plugin upon restart or after killing the process (I've even done pgrep kills), and it has never triggered any panic on my end (unless my plugin was at fault).

For example:

$ ./infrakit plugin ls
NAME                    LISTEN
group                   /Users/frenchben/.infrakit/plugins/group
$ pgrep infrakit | xargs kill
[1]  + 5756 done       ./infrakit-group-default
$ ./infrakit plugin ls
NAME                    LISTEN
$ ./infrakit-group-default&
[1] 16318
INFO[0000] Listening at: /Users/frenchben/.infrakit/plugins/group

Can you provide more details as to what plugin you're using, what configuration you've committed to the group (old watch), etc.?

thebsdbox commented 7 years ago

In the event of a system crash, or a plugin crashing for program reasons (such as a panic), the socket will be left behind. In your example you're using the default kill, which lets the program terminate cleanly and tidy up any remaining sockets. If you try your example with kill -9, or force your system to halt abruptly with halt -f, then you'll be left with sockets and your plugins will panic on restart.

linsun commented 7 years ago

I also hit this yesterday.... I had thought it was a port conflict, and rebooting my laptop obviously didn't help.... I probably forced a shutdown of my laptop at one point a few days ago when my keyboard was not working. :-(

It would also be great to improve the error message to be something more intuitive than the one below. It would help if it indicated that, if the plugin should not be running, the user can manually remove the file (e.g. /home/dan/.infrakit/plugins/instance-file) to clear up the socket.

"ERRO[0000] listen unix /home/dan/.infrakit/plugins/instance-file: bind: no such file or directory panic: runtime error: invalid memory address or nil pointer dereference"

linsun commented 7 years ago

To be precise, the error I hit is:

linsun at linsun in ~/go/src/github.com/docker/infrakit on git:master ● [16:32:23]
→ build/infrakit-group-default
ERRO[0000] listen unix /Users/linsun/.infrakit/plugins/group: bind: address already in use
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x7ba72]

goroutine 1 [running]:
panic(0x3281c0, 0xc42000c1c0)
    /usr/local/Cellar/go/1.7.3/libexec/src/runtime/panic.go:500 +0x1a1
github.com/docker/infrakit/pkg/cli.RunPlugin(0x384cd2, 0x5, 0x505d00, 0xc42007d370)
    /Users/linsun/go/src/github.com/docker/infrakit/pkg/cli/serverutil.go:18 +0x102
main.main.func1(0xc4200a8d80, 0x544388, 0x0, 0x0, 0x0, 0x0)
    /Users/linsun/go/src/github.com/docker/infrakit/cmd/group/main.go:54 +0x225
github.com/docker/infrakit/vendor/github.com/spf13/cobra.(*Command).execute(0xc4200a8d80, 0xc42007c140, 0x0, 0x0, 0xc4200a8d80, 0xc42007c140)
    /Users/linsun/go/src/github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:632 +0x23e
github.com/docker/infrakit/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4200a8d80, 0xc420053f20, 0x381f60, 0xc4200a8fc0)
    /Users/linsun/go/src/github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:722 +0x367
github.com/docker/infrakit/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4200a8d80, 0xc420053f18, 0x1)
    /Users/linsun/go/src/github.com/docker/infrakit/vendor/github.com/spf13/cobra/command.go:681 +0x2b
main.main()
    /Users/linsun/go/src/github.com/docker/infrakit/cmd/group/main.go:61 +0x2b9

I worked around it by removing this file: /Users/linsun/.infrakit/plugins/group

chungers commented 7 years ago

There are a couple of issues associated with this:

  1. Panic when the plugin discovery directory isn't present.
  2. Socket files get left behind when host crashes -- this causes problems when plugin starts up again (bind: address already in use) and client connection failures.

Item 1 is easy to fix: we can just create the directory if it isn't there. This really isn't a great fix IMHO, because there's no guarantee that all the plugins start up with the same path, so you can end up creating new directories and still have connection problems.

For Item 2, garbage collecting orphaned socket files when a client tries to connect is easy enough, but it's too late, since at this point the operator needs to restart the plugin again. This manual step is OK during development but not so nice in operation.

Because we are working to improve management of plugins (e.g. activating them on demand #284 and #328), we will also be mindful of how the implementation will solve these issues.

So hopefully the issues reported here will be non-issues once we have the plugin management implemented. However, in the short term I will look at better handling of these errors.