NetworkConfiguration / dhcpcd

DHCP / IPv4LL / IPv6RA / DHCPv6 client.
https://roy.marples.name/projects/dhcpcd
BSD 2-Clause "Simplified" License

Can't start separate instance, manager is always used contrary to docs #241

Open DanielG opened 1 year ago

DanielG commented 1 year ago

Hi,

the dhcpcd(8) man page says

If a single interface is given then dhcpcd only works for that interface and runs as a separate instance to other dhcpcd processes.

This is not true: if a master instance is already running, the command is sent to it instead of a new, separate instance being spawned. This prevents important use cases, such as mine: running a new dhcpcd instance inside a VRF with something like ip vrf exec vrf-mgmt dhcpcd eth1.

AFAICT there's no way to change the path of the control socket(s), so there is no workaround short of recompiling dhcpcd or resorting to Linux mount-namespace hacks. I believe this should be considered an issue in its own right.
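To make the hand-off concrete, here is a minimal Python sketch of the kind of check a client can use to decide whether a manager is already running: try to connect to a Unix stream socket at a fixed path. It illustrates the behaviour described in this issue and is not dhcpcd's actual code. The function name and socket path are hypothetical, and the stream socket type is an assumption. Because the path cannot be changed without recompiling (as noted above), every instance that sees the same filesystem finds the same socket.

```python
import os
import socket
import tempfile


def manager_is_listening(sock_path: str) -> bool:
    """Return True if some process accepts connections on the Unix socket sock_path.

    Illustrative only: a client that gets True would hand its command to the
    running manager instead of starting its own instance.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
    except (FileNotFoundError, ConnectionRefusedError):
        # No socket file at all, or a stale file with nothing listening on it.
        return False
    finally:
        s.close()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sock")  # hypothetical control socket path
        print(manager_is_listening(path))  # False: no socket file yet
        manager = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        manager.bind(path)
        manager.listen(1)
        print(manager_is_listening(path))  # True: a stand-in "manager" is listening
        manager.close()
        print(manager_is_listening(path))  # False: stale file, connection refused
```

The only way around such a check is to make the path resolve to something else, which is why the mount-namespace hack mentioned above works at all.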

A single-shot (foreground) dhcpcd -B ethX would also be very convenient for debugging DHCP problems without having to dig through logs. dhclient used to fill that need, but since it is EOL, that won't be an option for much longer.

Thanks, --Daniel

rsmarples commented 1 year ago

Have you tried using the -T (--test) option? That might work while the master instance is running, if you test a specific interface.

DanielG commented 1 year ago

Indeed -T doesn't connect to the manager, but it segfaults instead :)

$ sudo gdb --args dhcpcd -T wlan1
GNU gdb (Debian 13.1-3) 13.1
Reading symbols from dhcpcd...
Reading symbols from /usr/lib/debug/.build-id/38/87f735a4f7602e3d666eeae7d6ca10379d8277.debug...
(gdb) r
Starting program: /usr/sbin/dhcpcd -T wlan1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
dhcpcd-10.0.2 starting
[Detaching after fork from child process 1131240]
[Detaching after fork from child process 1131241]
DUID X:X:X:X
dhcp6_openudp: Address already in use
ps_inet_listenin6: Address already in use
ps_root_recvmsg: Address already in use
wlan1: DHCP6 proxy fe80:: exited unexpectedly from PID 1131242, code=1
wlan1: IAID 52:05:ed:69
wlan1: soliciting a DHCP lease
wlan1: offered 192.168.0.207 from 192.168.0.1
interface='wlan1'
pid='1131237'
protocol='dhcp'
reason='TEST'
skip_hooks='resolv.conf'
ifcarrier='up'
ifflags='69699'
ifmetric='3005'
ifmtu='1500'
ifssid='ZTE_5EU327'
ifwireless='1'
new_broadcast_address='192.168.0.255'
new_dhcp_lease_time='86400'
new_dhcp_message_type='2'
new_dhcp_rebinding_time='75600'
new_dhcp_renewal_time='43200'
new_dhcp_server_identifier='192.168.0.1'
new_interface_mtu='1500'
new_ip_address='192.168.0.207'
new_network_number='192.168.0.0'
new_routers='192.168.0.1'
new_subnet_cidr='24'
new_subnet_mask='255.255.255.0'

Program received signal SIGSEGV, Segmentation fault.
0x000055555557e9f2 in dhcp_handledhcp (ifp=0x5555555d2400, bootp=0x7ffffffedd9c, bootp_len=<optimized out>, from=<optimized out>) at ./src/dhcp.c:3317
3317    ./src/dhcp.c: No such file or directory.
(gdb) bt
#0  0x000055555557e9f2 in dhcp_handledhcp (ifp=0x5555555d2400, bootp=0x7ffffffedd9c, bootp_len=<optimized out>, from=<optimized out>) at ./src/dhcp.c:3317
#1  0x000055555557ea5c in dhcp_handlebootp (ifp=<optimized out>, bootp=<optimized out>, len=<optimized out>, from=<optimized out>) at ./src/dhcp.c:3530
#2  0x000055555557edc1 in dhcp_packet (ifp=0x5555555d2400, data=0x7ffffffedd80 "E\300\001H\177\022", len=328, bpf_flags=<optimized out>) at ./src/dhcp.c:3600
#3  0x0000555555597048 in ps_bpf_dispatch (ctx=ctx@entry=0x7fffffffdf20, psm=psm@entry=0x7ffffffedd40, msg=msg@entry=0x7ffffffedcf0) at ./src/privsep-bpf.c:308
#4  0x0000555555593bc9 in ps_root_dispatchcb (arg=arg@entry=0x7fffffffdf20, psm=psm@entry=0x7ffffffedd40, msg=msg@entry=0x7ffffffedcf0) at ./src/privsep-root.c:854
#5  0x0000555555592f8f in ps_recvpsmsg (ctx=<optimized out>, fd=<optimized out>, events=<optimized out>, callback=callback@entry=0x555555593bb0 <ps_root_dispatchcb>, cbctx=0x7fffffffdf20)
    at ./src/privsep.c:1156
#6  0x0000555555593a7c in ps_root_dispatch (arg=<optimized out>, events=<optimized out>) at ./src/privsep-root.c:867
#7  0x000055555556546b in eloop_run_ppoll (signals=0x7fffffffe168, ts=<optimized out>, eloop=0x5555555c8e00) at ./src/eloop.c:1106
#8  eloop_start (eloop=0x5555555c8e00, signals=signals@entry=0x7fffffffe168) at ./src/eloop.c:1228
#9  0x000055555555e486 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ./src/dhcpcd.c:2602
(gdb) 
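The test-mode output above reports the lease as shell-style NAME='value' assignments (the reason='TEST' block), mixed in with log lines. For scripted debugging in the spirit of the single-shot use case, here is a minimal Python sketch that extracts those assignments into a dict. It is not an official parser. It assumes every value is shell-quoted on a single line, as in this capture, and the function name is hypothetical.

```python
import re
import shlex

# Start of a shell-style assignment such as new_routers='192.168.0.1'
_ASSIGN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=")


def parse_test_output(text: str) -> dict:
    """Collect NAME='value' lines from dhcpcd -T output into a dict.

    Log lines such as "wlan1: offered ..." do not match and are skipped.
    Assumes every value is shell-quoted on a single line, as in the capture.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not _ASSIGN.match(line):
            continue
        # shlex removes the quoting: "ifssid='ZTE_5EU327'" -> ["ifssid=ZTE_5EU327"]
        (word,) = shlex.split(line)
        key, _, value = word.partition("=")
        env[key] = value
    return env


sample = """wlan1: offered 192.168.0.207 from 192.168.0.1
interface='wlan1'
reason='TEST'
new_ip_address='192.168.0.207'
new_routers='192.168.0.1'
new_dhcp_lease_time='86400'
"""
lease = parse_test_output(sample)
print(lease["reason"], lease["new_ip_address"], lease["new_routers"])
# TEST 192.168.0.207 192.168.0.1
```

In practice you would feed it the captured output of dhcpcd -T <interface>. Capture both stdout and stderr, since this sketch makes no assumption about which stream carries the variables.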

This is with a bookworm backport of the Debian 10.0.2-4 package. On master this seems to be fixed.

I did actually manage to work around the instance issue using sudo unshare -m sh -c 'mount -t tmpfs /run /run; dhcpcd -B wlan1', but it turns out that running dhcpcd via ip vrf exec doesn't work anyway, because netlink sockets aren't affected by the VRF binding. So it's not much use to me after all.
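Here is a minimal Python sketch that builds the argv for this mount-namespace workaround, so it can be scripted per interface. It departs from the one-liner in two deliberate ways. First, it uses && instead of ;, so dhcpcd never runs if the tmpfs mount fails, because in that case dhcpcd would find the real manager. Second, it uses exec, so dhcpcd replaces the shell. It also assumes the manager's control socket lives under /run (on Debian, /var/run is a symlink to /run). The function name is hypothetical.

```python
import shlex


def isolated_dhcpcd_argv(iface: str, run_dir: str = "/run") -> list:
    """Build the argv for running dhcpcd -B with a private, empty run directory.

    unshare -m gives the child its own mount namespace (util-linux unshare makes
    mount propagation private by default), and the fresh tmpfs hides the running
    manager's control socket and pidfile from the new dhcpcd.
    """
    inner = (
        f"mount -t tmpfs tmpfs {shlex.quote(run_dir)}"
        f" && exec dhcpcd -B {shlex.quote(iface)}"
    )
    return ["unshare", "-m", "sh", "-c", inner]


print(isolated_dhcpcd_argv("wlan1"))
# To actually run it (as root): subprocess.run(isolated_dhcpcd_argv("wlan1"), check=True)
```

Quoting the interface name with shlex.quote keeps the inner shell command safe even for unusual interface names.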

Regardless, do you agree that the documented behavior is the correct one? If so, I'd be happy to send a patch.

rsmarples commented 1 year ago

The behavior of the code is correct (obviously not the segfaulting part!). For example, you can start dhcpcd as a master process and deny all interfaces in the config. You can then tell the master process to start working on a specific interface, e.g. dhcpcd -n eth0. In that case, dhcpcd will not start a new instance.

I will always accept patches to improve the documentation.

ipaton1 commented 1 year ago

Indirectly related comment: it would be really useful to be able to configure the control socket location via a command-line option and/or the config file, instead of needing a recompile and a different binary. While @DanielG is talking about Linux VRFs, I've also run into the same or a similar issue with Linux network namespaces. The control socket sits outside the network namespace, so starting multiple dhcpcd instances in different network namespaces requires jumping through a load of extra hoops.

ip netns exec <namespace> [...] automatically bind-mounts files under /etc/netns/<namespace>/ over their counterparts in /etc within the namespace. That makes it fairly simple to override the config file in a particular namespace with a namespace-specific one that could redefine the control socket location. Alternatively, since if-linux.c already works out whether you're running in a namespace, it might be possible to simply prefix or suffix the control socket name with the namespace name. That may be slightly less flexible, but it would give the same result for the case I have in mind. A non-compile-time way to specify the control socket location is probably the most flexible option, though.
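For the namespace-suffix idea, here is a minimal Python sketch of how a process can find the name of the ip netns namespace it runs in. It uses the same comparison that ip netns identify performs: the device and inode of /proc/self/ns/net are checked against the namespace files that ip netns add bind-mounts under /var/run/netns. It is Linux-only and illustrative. It does not reflect what if-linux.c actually does, and the function name is hypothetical.

```python
import os
from typing import Optional


def current_netns_name(netns_dir: str = "/var/run/netns") -> Optional[str]:
    """Return the `ip netns` name of this process's network namespace, if any.

    Compares (st_dev, st_ino) of /proc/self/ns/net with each entry in netns_dir.
    Returns None in the initial namespace, in an unnamed namespace, or when
    either location is missing or unreadable.
    """
    try:
        me = os.stat("/proc/self/ns/net")
        entries = os.listdir(netns_dir)
    except OSError:
        return None
    for name in sorted(entries):
        try:
            st = os.stat(os.path.join(netns_dir, name))
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        if (st.st_dev, st.st_ino) == (me.st_dev, me.st_ino):
            return name
    return None


print(current_netns_name())  # None outside a named namespace
```

A daemon could then append the returned name to its control socket file name, or fall back to the default name when the result is None.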

rsmarples commented 1 year ago

@ipaton1 from the namespace perspective, would we want a separate run directory entirely? We can do that, e.g. /var/run/dhcpcd-netns-$namespace/, which gives you a separate pidfile, control socket, etc. Can you please open a new issue for that rather than tailgating this one?
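The per-namespace run directory proposed here can be derived from the namespace name alone. Below is a minimal Python sketch of that layout. The default directory /var/run/dhcpcd and the file names dhcpcd.pid and sock are placeholders, not confirmed dhcpcd paths, and the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunPaths:
    run_dir: str
    pidfile: str
    control_socket: str


def run_paths(netns: Optional[str], var_run: str = "/var/run") -> RunPaths:
    """Hypothetical layout: one run directory per named network namespace."""
    if netns is None:
        run_dir = f"{var_run}/dhcpcd"  # placeholder default directory
    else:
        # ip netns names cannot contain '/', so reject them rather than build odd paths.
        if not netns or "/" in netns:
            raise ValueError(f"bad namespace name: {netns!r}")
        run_dir = f"{var_run}/dhcpcd-netns-{netns}"
    return RunPaths(run_dir, f"{run_dir}/dhcpcd.pid", f"{run_dir}/sock")


print(run_paths("blue").control_socket)  # /var/run/dhcpcd-netns-blue/sock
```

Keeping everything under one directory means the pidfile and control socket are separated together, so a second namespace never sees the first one's manager.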