christgau / wsdd

A Web Service Discovery host daemon.
MIT License

Issue when used in a container #50

Closed jiplen closed 4 years ago

jiplen commented 4 years ago

I was able to work around this issue by changing:

self.socket.bind((os.getpid(), rtm_groups))

to

self.socket.bind((0, rtm_groups))

Based on this issue in an existing project (with the explanation there): https://github.com/zerotier/ZeroTierOne/issues/994

The commit that fixes this is here (the username changed, so it was a slight pain to get to):

https://github.com/zerotier/ZeroTierOne/commit/3dec78f50963d616ca3d469b2e55c0fbf30ec231
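For illustration, the workaround above can be sketched as a small standalone script. This is not wsdd's actual code: the function name `open_netlink_socket` and the choice of `RTMGRP_LINK` as the only multicast group are assumptions made for the example. It binds a netlink socket with `nl_pid` set to 0 and then reads back the ID the kernel picked. Linux only.

```python
import socket

# Multicast group bit for link change notifications (RTMGRP_LINK in
# <linux/rtnetlink.h>). Assumption: wsdd's rtm_groups may contain more bits.
RTMGRP_LINK = 0x1


def open_netlink_socket(groups=RTMGRP_LINK):
    """Open an rtnetlink socket and let the kernel choose nl_pid."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    try:
        # nl_pid = 0 asks the kernel to assign a unique ID for this socket,
        # instead of the application supplying os.getpid().
        sock.bind((0, groups))
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    with open_netlink_socket() as s:
        nl_pid, nl_groups = s.getsockname()
        print("kernel-assigned nl_pid:", nl_pid, "groups:", hex(nl_groups))
```

The value returned by `getsockname()` is often the process ID, but nothing should rely on that.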

Thanks for the great library!

christgau commented 4 years ago

This is a little confusing. The netlink man page states:

nl_pid is the unicast address of netlink socket. It's always 0 if the destination is in the kernel. For a user-space process, nl_pid is usually the PID of the process owning the destination socket. However, nl_pid identifies a netlink socket, not a process. [...] If the application sets nl_pid before calling bind(2), then it is up to the application to make sure that nl_pid is unique. If the application sets it to 0, the kernel takes care of assigning it. The kernel assigns the process ID to the first netlink socket the process opens and assigns a unique nl_pid to every netlink socket that the process subsequently creates.

For wsdd, only a single netlink socket is created and the PID is used as nl_pid, which, according to the man page, is usually OK. My assumption was that using the PID is fine even when the PID is not unique across namespaces, because the sockets would live in different namespaces and the kernel could tell them apart. Apparently, that assumption was wrong. The key point appears to be that it is up to the application to make sure that nl_pid is unique on the whole system.
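The uniqueness requirement can be reproduced inside a single process, without any container. The sketch below is a hypothetical demonstration, not part of wsdd; the helper names `bind_netlink` and `demo_collision` are made up for the example. It lets the kernel assign an nl_pid to one socket and then tries to bind a second socket explicitly to that same value. The assumption is that a PID collision between two containers fails in the same way.

```python
import errno
import socket


def bind_netlink(nl_pid, groups=0):
    """Open an rtnetlink socket bound to the given nl_pid (0 = kernel picks)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    try:
        sock.bind((nl_pid, groups))
    except OSError:
        sock.close()
        raise
    return sock


def demo_collision():
    """Return the errno raised when a second socket reuses a taken nl_pid."""
    first = bind_netlink(0)            # kernel assigns a unique nl_pid
    taken = first.getsockname()[0]     # read back the assigned value
    try:
        second = bind_netlink(taken)   # explicit nl_pid that is already in use
    except OSError as exc:
        return exc.errno
    else:
        second.close()
        return None
    finally:
        first.close()


if __name__ == "__main__":
    err = demo_collision()
    print("bind failed with:", errno.errorcode.get(err, err))
```

Binding with 0 avoids this failure because the application never has to guess a free value.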

Long story short: the fix is trivial. I will apply it ASAP. Thanks for reporting and for the linked resources!