canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Add support to call external hooks #3391

Open · cryptk opened 7 years ago

cryptk commented 7 years ago

It would be nice if LXD had support for a few different places to add in "hooks" that run extra logic. One use could be having a hook script fire off an alert into a monitoring system when a container move starts, and another hook script resolve that alert when the move completes.

My best-case scenario for these hooks would be a pre- and post- hook for every action, plus one additional hook (the one I am really interested in having) that fires during a live migration, just before the CRIU snapshot is thawed on the destination server.

Each hook should be able to run either asynchronously or synchronously (a toggle set when the hook is defined, likely defaulting to async if unset).

My end goal is to be able to run some logic (perhaps to move an iSCSI connection to the new host) before the live migration completes.

stgraber commented 7 years ago

This kind of thing was mentioned a few times and we've so far been pushing back pretty hard on it as such hook scripts introduce a lot of uncertainty and make it very hard for us to debug issues.

I'm pretty sure we won't be adding support for having LXD directly call a bunch of scripts from the host filesystem, but we may add something to the REST API which would let you get events that you can act on (through the events API). That's reasonably trivial for your async case; for the blocking/sync case, it's going to require more thought.
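
For the async side, a minimal sketch of consuming the events API with the stock client (untested; the lifecycle action names and JSON layout have varied a little across LXD versions, and jq is required for the second form):

```bash
# Stream lifecycle events in a human-readable form:
lxc monitor --type=lifecycle

# Or consume them as a JSON stream, e.g. extracting the action name:
lxc monitor --type=lifecycle --format=json | jq -r '.metadata.action'
```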

cryptk commented 7 years ago

Perhaps you could resolve those debugging-related issues by putting the hooks behind a config setting (maybe something like core.enable_hooks, defaulting to false) and making it clear that any reported issue must be reproducible with that setting disabled. This would give people an easy way to quickly disable all of their hooks and see if the problem still exists, letting them determine whether their issue is caused by a hook (and obviously, if the problem is caused by their hook, then that problem is theirs to own).

cryptk commented 7 years ago

At that point, the only hook-related issue LXD would need to be concerned about is a properly configured hook not being executed at all. If the hook is executed and it breaks something, that is not your problem (unless for some reason LXD decided to ship a hook script itself).

zerkms commented 7 years ago

I'm also missing pre-start/post-stop hooks in LXD: I use them in LXC to add a route to the container so that it is visible from the local network.
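
For comparison, this is roughly what the plain-LXC version of that looks like (the script paths are placeholders):

```
# Plain LXC (not LXD) container config, e.g. /var/lib/lxc/web1/config:
lxc.hook.pre-start = /usr/local/bin/add-container-route.sh
lxc.hook.post-stop = /usr/local/bin/del-container-route.sh
```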

rejsmont commented 6 years ago

Currently, post-start and pre-stop hooks executed on the host are possible when using the built-in dnsmasq. You can set raw.dnsmasq to contain something like dhcp-script=/usr/bin/whatever-hook-handler. See man dnsmasq for more details of what can be done with these.
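
A sketch of that approach, assuming the default lxdbr0 bridge (the handler path is a placeholder; note this fires on DHCP lease changes, which only approximates container start/stop):

```bash
lxc network set lxdbr0 raw.dnsmasq 'dhcp-script=/usr/local/bin/lease-hook'
```

dnsmasq invokes the script with an action (add, old, or del) followed by the MAC address, IP address, and, if known, the hostname of the lease:

```bash
#!/bin/sh
# /usr/local/bin/lease-hook -- invoked by dnsmasq as:
#   lease-hook <add|old|del> <mac> <ip> [hostname]
logger -t lease-hook "action=$1 mac=$2 ip=$3 host=${4:-unknown}"
```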

stgraber commented 6 years ago

Current state of things here is that the events API now allows for reacting to either background operation changes or more general container lifecycle events.

This, combined with the regular LXD API, enables a lot of external management operations that don't require blocking.

A potential extension to this would be a way for event listeners to register with LXD for blocking interactions with containers. Such event listeners would provide LXD with a list of events they want to handle, and containers would then have a configuration key specifying the list of listeners to contact.

That way, a new listener will not hold up all containers by default, but when properly configured, LXD will refuse to start a container until the required event handler has connected to the API.

We'll be using this issue to track that more advanced part, though we don't have this on our roadmap at this time.

s3rj1k commented 6 years ago

@stgraber, I heavily rely on lxc.hooks; this is a must-have feature.

mazerty commented 6 years ago

I'd also like this feature, for example to automatically snapshot a container on reboot.

avsdev-cw commented 5 years ago

Also upvoting this feature request.

The LXD proxy device doesn't pass through the source IP address, so the workaround is to use iptables. However, there are a number of issues with this, including the use of DHCP (resolved by making the container's IP address static) and adding/removing the rule when the container is started/stopped (currently this has to be done manually; the alternative is to add the rules to /etc/network/interfaces and have them added/removed on interface up/down, which is not ideal).
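
For context, the iptables workaround being described looks roughly like this (addresses and ports are examples only):

```bash
# Forward host port 80 to a container pinned to a static address,
# preserving the client's source IP (unlike the proxy device):
iptables -t nat -A PREROUTING -p tcp --dport 80 \
    -j DNAT --to-destination 10.178.4.10:80

# ...and the matching delete, needed whenever the container stops:
iptables -t nat -D PREROUTING -p tcp --dport 80 \
    -j DNAT --to-destination 10.178.4.10:80
```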

stgraber commented 5 years ago

@avsdev-cw for the proxy not sending the address, we have support for the HAProxy-style PROXY protocol header, which does send you the address as long as your application supports reading the header.

avsdev-cw commented 5 years ago

@stgraber The issue still remains that either haproxy gets installed on the host (no thanks!) or you need an iptables rule to route traffic into a container running haproxy, and then you're back in the land of stop/start scripts being required.

One thing I have been looking at is tying a non-NAT LXD bridge to an unmanaged NIC on the server and then routing a public IP address through that NIC into a container (no NAT/iptables outside the container, as it will all be kernel-level packet routing). That container will act as a proxy (potentially haproxy, but more likely nginx) into a host-only bridge with the web hosts dangling off it (no port forwards needed, as it will all be internal DNS-based resolution at that point).

stgraber commented 5 years ago

@avsdev-cw you misunderstand what this feature is.

There is no need for haproxy anywhere. You just set proxy_protocol=true on the proxy device, and then the target application in the container needs to understand that it will get one line of clear-text metadata in every TCP connection indicating the real IP of the client. There are readily available modules that handle this for most web servers, SMTP servers, and so on.

https://www.haproxy.com/blog/haproxy/proxy-protocol/
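
A sketch of that setup (instance and device names are examples): on the host, add the proxy device with proxy_protocol=true; in the container, configure the application to parse the header, e.g. nginx's `listen 80 proxy_protocol;` together with `real_ip_header proxy_protocol;`.

```bash
lxc config device add c1 web proxy \
    listen=tcp:0.0.0.0:80 \
    connect=tcp:127.0.0.1:80 \
    proxy_protocol=true
```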

avsdev-cw commented 5 years ago

@stgraber ahhhh yes I did. Thanks for the heads up, I've now managed to find that in the documentation!

geodb27 commented 5 years ago

I've fallen here because I was searching for a way to handle my peculiar use case: I'd like an iptables rule to be set on the host when a container starts, forwarding a given TCP port on the host to another one (80 or 8080 or whatever, depending on the service the container is built for), and that rule to be dropped when the container is stopped. If I understood what I read in this thread correctly, there is no support yet for post-start and pre-stop hooks in LXD. However, @stgraber, you seem to say that this could be handled with the events API. That could indeed do the job, provided there is an example of how to use it for this particular case. It would be kind if someone could give such an example. Thanks a lot!
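
Not an official example, but a sketch of how that could look with the events API (untested; lifecycle action names vary across LXD versions, the address/ports are placeholders, jq is required, and it assumes the JSON stream is line-delimited):

```bash
#!/bin/bash
# Add a DNAT rule when "web1" starts, remove it when it stops.
CONTAINER=web1
RULE=(-p tcp --dport 8080 -j DNAT --to-destination 10.178.4.10:80)

lxc monitor --type=lifecycle --format=json | while read -r event; do
    action=$(jq -r '.metadata.action' <<< "$event")
    source=$(jq -r '.metadata.source' <<< "$event")
    [[ "$source" == */"$CONTAINER" ]] || continue
    case "$action" in
        instance-started|container-started)
            iptables -t nat -A PREROUTING "${RULE[@]}" ;;
        instance-shutdown|container-shutdown)
            iptables -t nat -D PREROUTING "${RULE[@]}" ;;
    esac
done
```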

jgneff commented 3 years ago

I would like a way to have LXD run the following script when the lxdbr0 bridge is brought up:

```bash
#!/bin/bash
# LXD - Network configuration - Integration with systemd-resolved
# https://linuxcontainers.org/lxd/docs/master/networks.html
resolvectl dns lxdbr0 10.178.4.1
resolvectl domain lxdbr0 '~lxd'
```

The two commands are documented in the section "Integration with systemd-resolved" on the Network configuration page. The section explains, "This resolved configuration will persist as long as the bridge exists, so you must repeat this command each reboot and after LXD is restarted." For now, I'm running the script manually.

Libvirt, on the other hand, defines Hooks for specific system management that can run such a script automatically for its virbr0 bridge. For more details, see my comment in a systemd issue.

stgraber commented 3 years ago

Easiest may be a systemd unit like this one:

```
[Unit]
BindsTo=sys-subsystem-net-devices-lxdbr0.device
After=sys-subsystem-net-devices-lxdbr0.device systemd-resolved

[Service]
Type=oneshot
ExecStartPre=/usr/bin/resolvectl dns lxdbr0 10.178.4.1
ExecStart=/usr/bin/resolvectl domain lxdbr0 '~lxd'

[Install]
WantedBy=sys-subsystem-net-devices-lxdbr0.device
```

Put that in a file under /etc/systemd/system, then systemctl enable NAME.service, and every time lxdbr0 gets created the unit should start. It's also set to start after systemd-resolved to avoid that potential race.

(Completely untested but based on something I've had to do for another service)

jgneff commented 3 years ago

(Completely untested but based on something I've had to do for another service)

Thank you, Stéphane. It works! I can't imagine how long this would have taken me to figure out on my own. I only had to change systemd-resolved to systemd-resolved.service in the After= line. I also changed the ExecStartPre= key to ExecStart= so the two commands would line up. The man systemd.service section on ExecStart= states, "When Type=oneshot is used, zero or more commands may be specified."

Below is my working file in /etc/systemd/system/dns-lxdbr0.service:

```
[Unit]
Description=Per-link DNS configuration for lxdbr0
BindsTo=sys-subsystem-net-devices-lxdbr0.device
After=sys-subsystem-net-devices-lxdbr0.device systemd-resolved.service

[Service]
Type=oneshot
ExecStart=/usr/bin/resolvectl dns lxdbr0 10.178.4.1
ExecStart=/usr/bin/resolvectl domain lxdbr0 '~lxd'

[Install]
WantedBy=sys-subsystem-net-devices-lxdbr0.device
```

Without the .service on systemd-resolved, I got the error:

```
$ journalctl -u dns-lxdbr0.service
-- Logs begin at Tue 2020-08-18 16:23:16 PDT, end at Fri 2021-06-11 08:43:31 PDT. --
Jun 10 21:30:44 tower systemd[1]: /etc/systemd/system/dns-lxdbr0.service:4:
   Failed to add dependency on systemd-resolved, ignoring: Invalid argument
   ︙
```

Below is the current status of the service:

```
$ systemctl status dns-lxdbr0.service
● dns-lxdbr0.service - Per-link DNS configuration for lxdbr0
     Loaded: loaded (/etc/systemd/system/dns-lxdbr0.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Thu 2021-06-10 22:17:08 PDT; 10h ago
    Process: 4001 ExecStart=/usr/bin/resolvectl dns lxdbr0 10.178.4.1 (code=exited, status=0/SUCCESS)
    Process: 4003 ExecStart=/usr/bin/resolvectl domain lxdbr0 ~lxd (code=exited, status=0/SUCCESS)
   Main PID: 4003 (code=exited, status=0/SUCCESS)

Jun 10 22:17:08 tower systemd[1]: Starting Per-link DNS configuration for lxdbr0...
Jun 10 22:17:08 tower systemd[1]: dns-lxdbr0.service: Succeeded.
Jun 10 22:17:08 tower systemd[1]: Finished Per-link DNS configuration for lxdbr0.
```

The per-interface DNS configuration is working great:

```
$ resolvectl status lxdbr0
Link 6 (lxdbr0)
      Current Scopes: DNS
DefaultRoute setting: no
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 10.178.4.1
         DNS Servers: 10.178.4.1
          DNS Domain: ~lxd
$ host focal.lxd
focal.lxd has address 10.178.4.228
focal.lxd has IPv6 address fd42:7be6:985d:a8ab:216:3eff:fe1a:1c72
```

stgraber commented 3 years ago

Excellent, I'm sure it will be useful to some other folks!

jgneff commented 3 years ago

Excellent, I'm sure it will be useful to some other folks!

It would be great to have this solution documented in the section "Integration with systemd-resolved" on the LXD Network configuration page. Before Ubuntu switched to systemd-resolved, it was easy to get this working by adding just one line to your local dnsmasq configuration in /etc/NetworkManager/dnsmasq.d/local.conf:

```
# Other non-public name servers and domain specs
server=/kvm/192.168.122.1
server=/lxd/10.178.4.1
```

Now you have to learn quite a bit about systemd unit files. People are coming up with all sorts of creative solutions, but having the simplest solution right on the LXD Networks page would help a lot.

stgraber commented 3 years ago

Indeed. @tomponline want to update that part of the doc?

tomponline commented 3 years ago

Will do.

jgneff commented 3 years ago

And now it's broken. :frowning_face: The two updates on the latest/stable channel yesterday broke IPv4 networking entirely for me. My containers no longer get assigned an IPv4 address, just an IPv6 one; internally they fall back to a localhost address (127.0.0.2) for IPv4. All the .lxd host DNS resolution that we just got working fails now, too.

I completely removed the LXD 4.15 snap package, configuration, and storage pool, switched back to the 4.14/stable channel, and it worked perfectly again. I then switched back to the latest/stable channel, and it fails again. The updates from yesterday that are causing me trouble are:

```
$ snap changes lxd
ID   Status  Spawn                   Ready                   Summary
61   Done    yesterday at 19:17 PDT  yesterday at 19:18 PDT  Auto-refresh snap "lxd"
62   Done    yesterday at 22:15 PDT  yesterday at 22:15 PDT  Refresh snap "lxd"
```

Has anyone else seen this problem? Should I open a new issue? I'm posting here first because the recent comments in this issue just got everything working so well for me for the past week.

tomponline commented 3 years ago

@jgneff most likely it's not your systemd-resolved setup causing the problem, but instead this issue, which was preventing dnsmasq from starting when the raw.dnsmasq network setting is used: see https://github.com/lxc/lxd/issues/8905

jgneff commented 3 years ago

see #8905

Thanks, @tomponline, but I have never set the raw.dnsmasq setting. In fact, the LXD dnsmasq process is up and running with:

```
dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.203.206.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.203.206.2,10.203.206.254,1h --listen-address=fd42:561e:dfee:bb12::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
```

That dnsmasq.raw file is empty, yet still no IPv4 addresses, and host name resolution fails. Am I missing something in #8905? Attached are the dnsmasq entries from /var/log/syslog: syslog.txt.

tomponline commented 3 years ago

In that case, check your system firewall; it's likely that some external rules are interfering with LXD's DHCP server. This is probably caused by the snap refreshing and reloading, which clears and re-adds LXD's rules and changes their order relative to the externally added ones. A common candidate for this is Docker.

jgneff commented 3 years ago

Okay, sudo ufw disable did the trick. Thank you. So it seems we need a section in the LXD documentation called, "Integration with UFW," to complement your recent update to "Integration with systemd-resolved." :smile:

I've been using a very simple UFW configuration along with LXD for years. Any hints on how to get them working together again after yesterday's update?

tomponline commented 3 years ago

Take a look at this: https://discuss.linuxcontainers.org/t/lxd-bridge-doesnt-work-with-ipv4-and-ufw-with-nftables/10034/17 (as you can see, it's not a new issue). If you can send a pull request for that, it would be great; if not, I'll add it to my to-do list.
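
For anyone skipping the link: the commonly cited ufw fix is to explicitly allow traffic on the bridge. A sketch, assuming the default lxdbr0 name:

```bash
# Let containers reach the host's DHCP/DNS on the bridge:
sudo ufw allow in on lxdbr0
# Let forwarded traffic flow to and from the containers:
sudo ufw route allow in on lxdbr0
sudo ufw route allow out on lxdbr0
```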

jgneff commented 3 years ago

Thank you very much, @tomponline. It's working perfectly again. As a bonus, the upgrade to dnsmasq 2.80 in the core20 base seems to have fixed the five-second delay on first host lookup (not sure, but maybe this bug).

tomponline commented 3 years ago

Glad that's working. This may have been the sudden cause of your issue; the snap's switch to core20 subtly affected the cases where nftables would be used: https://discuss.linuxcontainers.org/t/lxd-stopped-generating-firewall-rules-after-switch-to-core20/11367/9?u=tomp

gattytto commented 2 years ago

I need start and stop hooks to allocate and remove rDNSv6 records in a bind9 server when hosts use SLAAC to get their addresses.

stgraber commented 2 years ago

@gattytto did you look at LXD's network zones feature?

Basically, you can create your forward and reverse DNS zones in LXD, then configure LXD to listen for DNS AXFR on some address/port; your upstream DNS server can then transfer that zone and either serve it directly or mangle/merge it with other data.

The one catch is that currently we expose the records so long as the instance exists but it would be easy for us to add a config key to only publish records for running instances.

https://linuxcontainers.org/lxd/docs/master/network-zones/
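
A sketch of that setup, using placeholder names and addresses plus the fd42:561e:dfee:bb12::/64 subnet seen earlier in this thread (check the linked docs for the authoritative key names):

```bash
# Have LXD's built-in DNS server answer AXFR on a dedicated address/port:
lxc config set core.dns_address 10.0.0.1:5353

# Create forward and reverse zones, allowing an upstream server to transfer them:
lxc network zone create example.net peers.ns.address=10.0.0.2
lxc network zone create 2.1.b.b.e.e.f.d.e.1.6.5.2.4.d.f.ip6.arpa peers.ns.address=10.0.0.2

# Attach the zones to the bridge:
lxc network set lxdbr0 dns.zone.forward example.net
lxc network set lxdbr0 dns.zone.reverse.ipv6 2.1.b.b.e.e.f.d.e.1.6.5.2.4.d.f.ip6.arpa
```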

gattytto commented 2 years ago

thank you @stgraber I'll check it out

gattytto commented 2 years ago

@stgraber It looks like the mentioned feature is available for LXD-managed bridges only; in my case the bridge is managed by openvswitch-switch on the host OS via the /etc/network/interfaces file.

stgraber commented 2 years ago

Ah yeah, in such cases LXD can't know for sure what addresses are in use, as it doesn't know whether EUI-64 addresses are being configured through SLAAC, whether there's a DHCP server, or what's going on...

tarruda commented 1 year ago

This kind of thing was mentioned a few times and we've so far been pushing back pretty hard on it as such hook scripts introduce a lot of uncertainty and make it very hard for us to debug issues.

I'm pretty sure we won't be adding support for having LXD directly call a bunch of scripts from the host filesystem, but we may add something to the REST API which would let you get events that you can act on (through the events API). That's reasonably trivial for your async case; for the blocking/sync case, it's going to require more thought.

As discussed at https://discuss.linuxcontainers.org/t/can-we-set-vm-pre-start-hooks/15004, it is easy to emulate post-shutdown hooks using the events API, so the main problem becomes emulating a pre-startup hook.

One way I imagine this could be done is by having a REST endpoint that can be used to inhibit instance startup, something like /1.0/instances/{name}/inhibit-startup. Roughly, an external process would connect to that endpoint and hold the connection open; while the inhibition is held, LXD would delay starting the instance until the inhibitor has finished its pre-start work and released it.

This could be used to implement a pre-start hook without having LXD execute arbitrary code.
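
Purely to make the proposal concrete, here is a hypothetical client for the proposed endpoint (no such endpoint exists in LXD today; the socket path assumes the snap install):

```bash
#!/bin/bash
# HYPOTHETICAL: sketches the proposed /1.0/instances/{name}/inhibit-startup
# endpoint; LXD does not actually implement this.
SOCK=/var/snap/lxd/common/lxd/unix.socket

# Hold the inhibition open in the background; LXD would delay
# `lxc start web1` for as long as this connection lives.
curl -s --unix-socket "$SOCK" -X POST \
    http://lxd/1.0/instances/web1/inhibit-startup &
INHIBIT=$!

modprobe -r some_module    # placeholder pre-start work

kill "$INHIBIT"            # release the inhibition; the start proceeds
```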

tomponline commented 1 year ago

For a pre-start hook, what is stopping you from performing the needed actions before calling lxc start, or is it for convenience?

tarruda commented 1 year ago

For a pre-start hook, what is stopping you from performing the needed actions before calling lxc start, or is it for convenience?

This is what I'm doing now, so it would be more for convenience. In my case I need to unload kernel modules, so my wrapper script (which calls lxc start) has to be invoked with root permissions.
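
i.e. something like this minimal wrapper (the module and instance names are placeholders):

```bash
#!/bin/bash
# Poor man's pre-start hook: must run as root because of modprobe.
set -e
modprobe -r some_module   # whatever module conflicts with the guest
lxc start myvm
```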

avsdev-cw commented 1 year ago

@tarruda The only problem I have with your idea is that it requires the inhibitor to keep a TCP socket open, waiting for the container to start. Another issue: what if LXD starts containers BEFORE your inhibitor service is up, running, and making its inhibit connections? (Obviously you would also need an auto-start delay so the LXD service is up in time to even accept those connections.)

It's much cleaner in my eyes to be able to add a per-container start hook (maybe a config setting with only one hook script allowed). One of my use cases: I have an NTP container with two passed-through GPS antennas. On a power cycle, the GPS antennas take bloody ages to register as USB-serial devices on the host, and because I have other USB-serial devices, the ports they land on are almost random. (I'm using the unix-char device type; don't even get me started on how irritating these things are to attach as usb-type devices.) Additionally, each antenna has two ports, one control-and-monitor and one monitor-only, which can also land in different orders (they nearly always land monitor/control, but it's the "nearly" that gets ya). I only want the monitor ports going through, and it usually requires me to manually run a few commands to find the ports, attach the devices (having removed the wrongly attached ones), and start the container. I would LOVE to get this auto-started, because it's a pain in the backside forgetting it.

Edit: the manual commands can quite easily be scripted if I put my mind to it, and yes, I could just have a start-up script for that container, but it still requires me to log in after boot and start it manually. Again, this COULD be added as a system service, but it's just neater to keep it all under LXD's control rather than have a dangling service.
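
For what it's worth, a sketch of scripting that attachment (the by-id glob and names are guesses for illustration; /dev/serial/by-id gives stable names regardless of which ttyUSB number the ports land on, and the script doesn't handle removing previously attached devices):

```bash
#!/bin/bash
# Attach the GPS monitor ports by stable ID, then start the container.
set -e
i=0
for dev in /dev/serial/by-id/*GPS*-if00*; do   # placeholder pattern for the monitor ports
    lxc config device add ntp "gps${i}" unix-char \
        source="$(readlink -f "$dev")" path="/dev/ttyGPS${i}"
    i=$((i + 1))
done
lxc start ntp
```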

tarruda commented 1 year ago

@avsdev-cw I would also prefer to have hooks supported directly, but as @stgraber mentioned this is against LXD's design, so the inhibit endpoint is something that would let users emulate the same behavior without having LXD execute user scripts.

Another issue: what if LXD starts containers BEFORE your inhibitor service is up, running, and making its inhibit connections? (Obviously you would also need an auto-start delay so the LXD service is up in time to even accept those connections.)

One solution to this is to have the LXD service depend on whatever other services/scripts are making inhibit calls. Since LXD supports socket activation, this is easy to arrange (LXD would only start after the first connection comes in).

This inhibit REST design would allow the creation of a generic daemon responsible for running all the user scripts. For example, it could execute /etc/lxd-hooks/{instance-name}/pre-startup.d/* before startup of instance-name and /etc/lxd-hooks/{instance-name}/post-shutdown.d/* after shutdown.

So technically there could be only one process that is handling inhibit connections and running hooks outside of LXD.
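
A sketch of the post-shutdown half of such a daemon (untested; the /etc/lxd-hooks layout is the convention proposed above, not something LXD knows about, and it assumes one JSON event per line):

```bash
#!/bin/bash
# Watch lifecycle events and run the matching post-shutdown.d
# directory for whichever instance just stopped.
lxc monitor --type=lifecycle --format=json | while read -r event; do
    action=$(jq -r '.metadata.action' <<< "$event")
    name=$(jq -r '.metadata.source | split("/") | last' <<< "$event")
    dir="/etc/lxd-hooks/${name}/post-shutdown.d"
    if [[ "$action" == *shutdown && -d "$dir" ]]; then
        run-parts "$dir"
    fi
done
```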

avsdev-cw commented 1 year ago

I think that comment was made 5 years ago, and it was mostly in reference to not being able to support users. I think the issues around debugging have a potential solution in the following comments.

IIRC LXD sits on top of LXC, so if LXC already provides those hooks, would it not be simple for LXD to just pass them through (either exposing them directly or passing configuration down to them)?

I like your idea of a separate daemon, until I have to remember to replicate all those scripts to the standby servers as well, and manage its backup and configuration separately from LXD. (In my specific case above, a non-issue, because I'd have to move the GPS cables, but I'm sure there are plenty of other start-script requirements out there.) It may be of interest that, IIRC, Proxmox provides a system for adding lifecycle hooks to containers and VMs; I don't use it anymore, though, so I don't know whether the feature still exists and in what capacity. I still believe it should be an integrated feature of LXD rather than a bolt-on from elsewhere.

(yes I am aware I'd still have to migrate the scripts themselves, unless they were stored somewhere within the LXD config directory or container directory)