coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Wireguard tunnel (via wg-quick) not starting any more since 38.20230414.3.0 #1487

Open chrismade opened 1 year ago

chrismade commented 1 year ago

maybe I'm wrong here to report an issue to CoreOS - then pls help me to find the right place. Luckily, I have been a CoreOS user for years, and this is the first time I have had to report an issue.

To reproduce the issue, use any CoreOS release before 38.20230414.3.0, e.g. Fedora CoreOS 37.20230322.3.0, which I tested without issues.

Configure a WireGuard tunnel in /etc/wireguard/wg0.conf that connects to any other WireGuard endpoint. This is my config for reference:

[Interface]
PrivateKey = skskskskskmyprivatekey=
Address=192.168.69.2/24
PostUp = ip route add 167.235.224.42/32 via 172.29.12.1
PostDown = ip route del 167.235.224.42/32 via 172.29.12.1

[Peer]
 PublicKey=qwqwqwmypublickey=
 Endpoint=111.222.224.42:51820
 AllowedIPs = 0.0.0.0/0 # Forward all traffic to server
 PersistentKeepalive=25

Check with ip a, wg show, or wg-quick up wg0 that the tunnel comes up as expected.

Then wait for the Zincati auto-update or force it to happen now. After the update and reboot, the tunnel is down and cannot be (re)started. Check journalctl or try to start the tunnel manually; an error message shows:

[#] sysctl -q net.ipv4.conf.all.src_valid_mark=1
 sysctl: cannot stat /proc/sys/net/ipv4/conf/all/src_valid_mark: Permission denied

which indicates why the tunnel does not start any more. The issue persists after any number of reboots.

nothing else was changed on the system - should be easy to reproduce

dustymabe commented 1 year ago

maybe I'm wrong here to report an issue to CoreOS - then pls help me to find the right place

For wireguard the issue could be in the kernel itself or in the wireguard tools package or somewhere else. Here's where bugs are filed for at least kernel and wireguard-tools:

(click on the Bug Reports link on those pages)

I know @jdoss uses wireguard and he's also the wireguard tools maintainer in Fedora so he may have seen this problem before.

quentin9696 commented 1 year ago

Hi,

I'm running into the same issue.

Since Fedora 38, wg-quick is confined in its own context wireguard_exec_t. This causes issues with PostUp/PreUp operations, since those can invoke almost anything.

In my case, I have the same thing while trying to run systemd-creds.

Can you check in journalctl -b 0 what the SELinux error is?

In my case, I got AVC avc: denied { read } for pid=6985 comm="systemd-creds" name="WGPrivateKey" dev="tmpfs" ino=5161 scontext=system_u:system_r:wireguard_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=file permissive=0 and you should have something similar

I opened a ticket with Fedora SELinux: https://github.com/fedora-selinux/selinux-policy/issues/1675

chrismade commented 1 year ago

thanks @quentin9696 - my avc error message looks slightly different on my end

... audit[988]: AVC avc:  denied  { search } for  pid=988 comm="sysctl" name="net" dev="proc" ino=18950 scontext=system_u:system_r:wireguard_t:s0 tcontext=system_u:object_r:sysctl_net_t:s0 tclass=dir permissive=0
... audit: type=1400 audit(1683264149.716:85): avc:  denied  { search } for  pid=988 comm="sysctl" name="net" dev="proc" ino=18950 scontext=system_u:system_r:wireguard_t:s0 tcontext=system_u:object_r:sysctl_net_t:s0 tclass=dir permissive=0
... audit: type=1300 audit(1683264149.716:85): arch=c000003e syscall=262 success=no exit=-13 a0=ffffff9c a1=5609c64f4320 a2=7ffe20949c50 a3=0 items=0 ppid=910 pid=988 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="sysctl" exe="/usr/sbin/sysctl" subj=system_u:system_r:wireguard_t:s0 key=(null)
... audit[988]: SYSCALL arch=c000003e syscall=262 success=no exit=-13 a0=ffffff9c a1=5609c64f4320 a2=7ffe20949c50 a3=0 items=0 ppid=910 pid=988 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="sysctl" exe="/usr/sbin/sysctl" subj=system_u:system_r:wireguard_t:s0 key=(null)
... systemd[1]: Failed to start wg-quick@wg0.service - WireGuard via wg-quick(8) for wg0

but it seems that my issue also lies in the same area of SELinux and WireGuard. @dustymabe @jdoss, this is why I'm afraid they will send me right back here if I report this to the kernel / wireguard (tools) channel, saying that WG works just fine for the rest of the world but not in Fedora 38, hence Fedora should fix it.

meanwhile I'll follow the rollback instructions here https://docs.fedoraproject.org/en-US/fedora-coreos/manual-rollbacks/ and observe and support troubleshooting here
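
For anyone reading the denials above: each AVC record names the denied permission, the source domain (scontext), the target type (tcontext) and the object class (tclass). Below is a minimal sketch of pulling those fields out of a journal line and printing them as a CIL allow rule. avc_to_cil is a hypothetical helper name, not part of any SELinux tool; the usual tool for generating policy from denials is audit2allow, which may need to be installed separately. Any rule produced this way should be reviewed before it is loaded.

```shell
# Hypothetical helper: read AVC denial lines on stdin and print one
# CIL allow rule per line. Lines that are not AVC denials are skipped.
avc_to_cil() {
    sed -nE 's/.*denied +\{ ([^}]*[^ }]) +\} .*scontext=[^:]+:[^:]+:([^:]+):.*tcontext=[^:]+:[^:]+:([^:]+):.*tclass=([a-z_0-9]+).*/(allow \2 \3 (\4 (\1)))/p'
}

# Example (on the affected host): feed it the AVC lines of the current boot.
#   journalctl -b 0 | grep 'avc:  denied' | avc_to_cil | sort -u
```

For the sysctl denial above, the rule names wireguard_t as the source, sysctl_net_t as the target, and search on class dir.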

dustymabe commented 1 year ago

xref: https://bugzilla.redhat.com/show_bug.cgi?id=2188714

travier commented 1 year ago

A temporary workaround for this issue is to mark the source domains (scontext) from the denied AVCs as permissive:

You can do that with:

# setenforce 1
# cat permissive-wireguard.cil
(typepermissive wireguard_t)
# semodule -i permissive-wireguard.cil

To remove it:

# semodule -r permissive-wireguard
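
The steps above assume the .cil file already exists. A minimal sketch of creating it is shown here; write_permissive_cil is a hypothetical helper name, and the target directory is an assumption. The install step still needs root on the affected host.

```shell
# Hypothetical helper: write permissive-<name>.cil for the given domain
# into a directory (default: current directory) and print the file path.
write_permissive_cil() {
    domain="$1"
    dir="${2:-.}"
    file="$dir/permissive-${domain%_t}.cil"
    printf '(typepermissive %s)\n' "$domain" > "$file"
    printf '%s\n' "$file"
}

# Usage, as root:
#   semodule -i "$(write_permissive_cil wireguard_t /root)"
#   semodule -l | grep permissive-wireguard   # confirm the module is loaded
```

The module name that semodule -r expects comes from the file name without the .cil extension, which is why the file is called permissive-wireguard.cil.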

gdonval commented 1 year ago

Are we expecting wireguard to work again soonish in Fedora or should we leverage the relative quietness of the next few months to work on switching distros?

chrismade commented 1 year ago

I would like to stay on coreOS ...

chrismade commented 1 year ago

just wanted to inform that the workaround as proposed by @travier did not work in my case

my config worked without any issues up to CoreOS 37 (see above); since 38 I get this error message:

Jun 24 21:19:18 localhost.localdomain wg-quick[1443]: [#] ip -4 rule add not fwmark 51820 table 51820
Jun 24 21:19:18 localhost.localdomain wg-quick[1443]: [#] ip -4 rule add table main suppress_prefixlength 0
Jun 24 21:19:18 localhost.localdomain wg-quick[1443]: [#] sysctl -q net.ipv4.conf.all.src_valid_mark=1
Jun 24 21:19:18 localhost.localdomain wg-quick[1484]: sysctl: cannot stat /proc/sys/net/ipv4/conf/all/src_valid_mark: Permission denied
Jun 24 21:19:18 localhost.localdomain wg-quick[1443]: [#] ip -4 rule delete table 51820
Jun 24 21:19:18 localhost.localdomain wg-quick[1443]: [#] ip -4 rule delete table main suppress_prefixlength 0
Jun 24 21:19:18 localhost.localdomain wg-quick[1443]: [#] ip link delete dev youtubevpn
Jun 24 21:19:18 localhost.localdomain systemd[1]: wg-quick@youtubevpn.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 21:19:18 localhost.localdomain systemd[1]: wg-quick@youtubevpn.service: Failed with result 'exit-code'.
Jun 24 21:19:18 localhost.localdomain systemd[1]: Failed to start wg-quick@youtubevpn.service - WireGuard via wg-quick(8) for youtubevpn.

applying the proposed workaround semodule -i permissive-wireguard.cil helped to go a little further but it is now throwing an error at another line and I still cannot establish a tunnel

Jun 24 21:22:52 localhost.localdomain wg-quick[883]: [#] ip -4 rule add table main suppress_prefixlength 0
Jun 24 21:22:52 localhost.localdomain wg-quick[883]: [#] sysctl -q net.ipv4.conf.all.src_valid_mark=1
Jun 24 21:22:52 localhost.localdomain wg-quick[883]: [#] nft -f /dev/fd/63
Jun 24 21:22:52 localhost.localdomain wg-quick[970]: internal:0:0-0: Error: Could not open file "/dev/fd/63": Permission denied
Jun 24 21:22:52 localhost.localdomain wg-quick[883]: [#] ip -4 rule delete table 51820
Jun 24 21:22:52 localhost.localdomain wg-quick[883]: [#] ip -4 rule delete table main suppress_prefixlength 0
Jun 24 21:22:52 localhost.localdomain wg-quick[883]: [#] ip link delete dev youtubevpn
Jun 24 21:22:53 localhost.localdomain systemd[1]: wg-quick@youtubevpn.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 21:22:53 localhost.localdomain systemd[1]: wg-quick@youtubevpn.service: Failed with result 'exit-code'.
Jun 24 21:22:53 localhost.localdomain systemd[1]: Failed to start wg-quick@youtubevpn.service - WireGuard via wg-quick(8) for youtubevpn.

my current workaround is to disable the systemd service completely and start the tunnel manually from commandline:

[root@localhost ~]# wg-quick up youtubevpn
[#] ip link add youtubevpn type wireguard
[#] wg setconf youtubevpn /dev/fd/63
[#] ip -4 address add 192.168.69.2/24 dev youtubevpn
[#] ip link set mtu 1420 up dev youtubevpn
[#] wg set youtubevpn fwmark 51820
[#] ip -4 route add 0.0.0.0/0 dev youtubevpn table 51820
[#] ip -4 rule add not fwmark 51820 table 51820
[#] ip -4 rule add table main suppress_prefixlength 0
[#] sysctl -q net.ipv4.conf.all.src_valid_mark=1
[#] nft -f /dev/fd/63
[#] ip route add 5.15.25.196/32 via 172.xx.xx.1

maybe someone who is more familiar with SELinux has a better idea - and can help me to re-establish the tunnel without any manual interaction??

runiq commented 9 months ago

@chrismade The last comment in the Red Hat Bugzilla thread has instructions on how to help the devs debug this. You could try those instructions and see if they help the devs help you :)

chrismade commented 9 months ago

thanks @runiq - looks like you don't leave anybody behind ;-)

As no quick solution came up for my reported issue, I did some more research, and it looks like this is "by design": I found a Red Hat/Fedora blog post which said I have to use nmcli for WireGuard as well, instead of the plain WireGuard config used on any other Linux system, and that fixed the problem.

Shame on me, I lost the note saying which article led me to the solution. That is why I never closed this issue: I wanted to link it for anyone else coming this way. IIRC it was this one: https://fedoramagazine.org/configure-wireguard-vpns-with-networkmanager/
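
A rough sketch of that migration, assuming the config from the first comment lives at /etc/wireguard/wg0.conf: NetworkManager can import a wg-quick style file with nmcli connection import. The PostUp/PostDown hooks are wg-quick specific, and how NetworkManager treats them on import is not confirmed in this thread, so the hypothetical helper list_wg_hooks below only lists them for manual review (for example, to re-create the extra route in the NetworkManager profile).

```shell
# Hypothetical helper: print the wg-quick hook lines of a config file so
# they can be reviewed and re-created by hand after the import.
list_wg_hooks() {
    grep -iE '^[[:space:]]*(PreUp|PostUp|PreDown|PostDown)[[:space:]]*=' "$1"
}

# Usage, as root (the connection is named after the file, here wg0):
#   list_wg_hooks /etc/wireguard/wg0.conf
#   nmcli connection import type wireguard file /etc/wireguard/wg0.conf
#   nmcli connection up wg0
```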

travier commented 8 months ago

I personally use Wireguard on my servers via the NetworkManager setup and I can confirm it works. The wg-quick setup is however untested so adding a test for it would help us surface issues and avoid those regressions in the future.

szpak commented 4 months ago

I personally use Wireguard on my servers via the NetworkManager setup and I can confirm it works.

I have to check the Wireguard configuration on a server with NetworkManager, but for people willing to use it separately (as it was possible in Fedora <38), I put the selinux policies required to make it work with firewalld in that Bugzilla bug.

In short:

(allow wireguard_t cert_t (dir (search)))
(allow wireguard_t cert_t (file (read open getattr)))
(allow wireguard_t proc_t (file (read open)))
(allow wireguard_t sysfs_t (file (read open)))
(allow wireguard_t firewalld_t (dbus (send_msg)))
(allow firewalld_t wireguard_t (dbus (send_msg)))
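
One possible way to load these rules, sketched under assumptions: the file name wireguard-firewalld.cil and the helper write_wg_firewalld_cil are hypothetical, and the install uses the same semodule -i mechanism as the permissive workaround earlier in this thread. The rules themselves are copied unchanged from the list above.

```shell
# Hypothetical helper: write the six allow rules to the given file path.
write_wg_firewalld_cil() {
    cat > "$1" <<'EOF'
(allow wireguard_t cert_t (dir (search)))
(allow wireguard_t cert_t (file (read open getattr)))
(allow wireguard_t proc_t (file (read open)))
(allow wireguard_t sysfs_t (file (read open)))
(allow wireguard_t firewalld_t (dbus (send_msg)))
(allow firewalld_t wireguard_t (dbus (send_msg)))
EOF
}

# Usage, as root:
#   write_wg_firewalld_cil /root/wireguard-firewalld.cil
#   semodule -i /root/wireguard-firewalld.cil
# To remove it again:
#   semodule -r wireguard-firewalld
```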

Nevertheless, maybe configuration with NM is in fact less problematic :thinking: