WireGuard / wgctrl-go

Package wgctrl enables control of WireGuard interfaces on multiple platforms.
https://godoc.org/golang.zx2c4.com/wireguard/wgctrl
MIT License

"invalid argument" on Linux 5.2.1 #64

Closed: apognu closed this issue 5 years ago

apognu commented 5 years ago

Ever since I upgraded to Linux 5.2.1 (Arch Linux 5.2.1-arch1-1-ARCH), my tool that uses this library fails to set up WireGuard: the ConfigureDevice function returns an "invalid argument" error. From my testing, the peer configuration is what triggers the error. If I remove all my code about peers, no error appears, but of course no peers are added to the WireGuard setup.

I ran the same version of my tool on another box running 5.0.5, and the issue does not appear there. As far as I can tell, my previous kernel version (5.1.4) did not have this issue either.

Using wg-quick instead of this library works properly on my current kernel.

I'm a bit at a loss as to how to debug this on my end and provide you with more information.

mdlayher commented 5 years ago

This sounds familiar. I think there was a matching WireGuard change to follow some new netlink convention, but I'm on vacation and traveling for the next week or so. I'd check the commit logs there, but otherwise I can take a look in a couple of weeks.

g00nix commented 5 years ago

I logged in to GitHub to open exactly this ticket. I can confirm the bug: the same code works fine with older kernels. The problem is isolated to wgctrl-go (WireGuard itself is not affected).

Quoting @mdlayher: "I think there was a matching WireGuard change to follow some new netlink convention, but I'm on vacation and traveling for the next week or so. I'd check the commit logs there, but otherwise I can take a look in a couple of weeks."

I'll try to have a look if I have time. Enjoy your vacation. 😸

devinrsmith commented 5 years ago

This might be the cause of an issue I found here: https://github.com/costela/wesher/issues/5.

I'm on Linux arch1 5.2.3-arch1-1-ARCH.

I've got an nlmon capture if it's helpful for debugging purposes.

mdlayher commented 5 years ago

@devinrsmith that would be helpful, thanks.

devinrsmith commented 5 years ago

@mdlayher -> nlmon0.cap

mdlayher commented 5 years ago

Unfortunately I have some other obligations for the next few weeks and probably won't get to this for a bit, since I'm not running 5.2+ kernels on my machines. Would anybody like to take a crack at this? There's probably some change in the genetlink subsystem that requires things to be more specific.

mdlayher commented 5 years ago

I don't suppose @DMarby or other Mullvad folks have run into this? Sorry, I'm winding down travels but I still have low bandwidth at the moment.

DMarby commented 5 years ago

Haven't seen this so far, since we don't run 5.2+ kernels yet either. I'll keep an eye out and see if I can allocate some time to look at it, but I can't promise anything.

mdlayher commented 5 years ago

I'm looking at commit https://git.zx2c4.com/WireGuard/commit/src/netlink.c?id=3120425f69003be287cb2d308f89c7a6a0335ff0 and I suspect something here is the cause. I haven't isolated the problem yet though.

mdlayher commented 5 years ago

Thanks to some investigative work from "ius" on IRC, I was shown this: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae0be8de9a53cda3505865c11826d8ff0640237c

It appears that NLA_F_NESTED is the key on newer kernels! I have a patch locally that seems to work, but I want to make sure I can get everything working on both old and new kernels. I should have something ready soon.