Closed koitsu closed 4 years ago
Thanks for reporting that issue along with helpful information. From your logs it look like the HTTP request is not processed as intented. FYI: The discovery of a WSD device works in two phases: multicasting via UDP to find the devices on the network and a metadata exchange via TCP/HTTP on port 5357. The retrieval of the metadata is required for the host in order to appear in the Network view of windows.
In the tcpdump log, I see that a TCP connection on that port is established and two packets flow to wsdd with very roughly 1kB of payload in total. In contrast to that, I do not see any trace of that message being processed (or an error) in the daemon log nor does there appear to be a reply flowing back to the Windows host (192.168.1.50) in the tcpdump. It would be helpful to see what is inside the HTTP request. Would you mind to provide the pcap file or the extracted HTTP request.
You may also add some logging/print here https://github.com/christgau/wsdd/blob/5a9e6c080f48b53c7a7b9fc503b5d4f8729a9e0a/src/wsdd.py#L829 to check the content-length and what data is received (if any).
Remark: I observed similar but unreproducible behavior for FreeBSD, but the problem was with IPv6 only and it was gone in IPv4-only operations, so this does not apply here 🙁
First: yeah, I read past and existing GH reports of issues on FreeBSD pertaining to IPv6. This host has IPv6 disabled at all layers (i.e. interfaces and route/protocol prioritisation) except the kernel (it can't be disabled there else existing userland programs built with IPv6 support begin failing (nothing to do with wsdd)). I'm very explicit in starting daemons with -4
flags etc., and wsdd is no exception. :-) I always specify IPv4-only and interfaces if such programs allow me to.
Secondly: I've modified wsdd.py to include the following line after the one you referenced:
logger.debug('WSDHttpRequestHandler do_POST self.headers: {}'.format(self.headers))
Thirdly: happy to provide a pcap! Generated on the server (192.168.1.51) with tcpdump -vvv -p -i em0 -s 8192 -w capture.pcap "host 192.168.1.50 and not (port 22 or port 53 or icmp or arp)"
to ensure we get only (mostly) relevant traffic, including SSDP. I've looked at raw traffic with -X
but given that this is SOAP/XML, the necessary details tend to be hard for me to discern unless I know exactly what I'm looking for. If you need me to capture with promiscuous mode enabled, just let me know. Packet capture attached: capture.pcap.gz
System info and logs for this debug session are below. I should note that I did not see the debug log line I added to the code.
# ps -auxw | grep wsdd
daemon 18217 1.2 0.2 26256 19180 - S 02:01 0:00.19 /usr/local/bin/python3.7 /usr/local/bin/wsdd -w WORKGROUP -4 -i em0 -v -v -v
root 18216 0.0 0.0 6384 1868 - Ss 02:01 0:00.00 daemon: /usr/local/bin/wsdd[18217] (daemon)
root 18221 0.0 0.0 6652 2036 0 S+ 02:01 0:00.00 grep wsdd
# sockstat -l | grep daemon
daemon python3.7 18217 6 udp4 239.255.255.250:3702 *:*
daemon python3.7 18217 7 udp4 *:59216 *:*
daemon python3.7 18217 8 tcp4 192.168.1.51:5357 *:*
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,686:wsdd INFO(pid 18217): using pre-defined UUID 2c2c4f89-43b2-5a0d-b73a-e622360b0358
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): new address 192.168.1.51 on em0
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): handling traffic for 192.168.1.51 on em0
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd INFO(pid 18217): joined multicast group ('239.255.255.250', 3702) on 192.168.1.51%em0
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): transport address on em0 is 192.168.1.51
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): will listen for HTTP traffic on address ('192.168.1.51', 5357)
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,699:wsdd DEBUG(pid 18217): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Hello</wsa:Action><wsa:MessageID>urn:uuid:43c9a165-b5f9-11ea-91da-003048d222d0</wsa:MessageID><wsd:AppSequence InstanceId="1592989277" MessageNumber="0" SequenceId="urn:uuid:43c9a318-b5f9-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:Hello><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:XAddrs>http://192.168.1.51:5357/2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsd:XAddrs><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:Hello></soap:Body></soap:Envelope>'
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,700:wsdd DEBUG(pid 18217): scheduling Hello message via em0 to ('239.255.255.250', 3702)
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,260:wsdd INFO(pid 18217): 192.168.1.50:53778(em0) - - "Probe urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b UDP" - -
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,261:wsdd DEBUG(pid 18217): incoming message content is b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</wsa:Action><wsa:MessageID>urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b</wsa:MessageID></soap:Header><soap:Body><wsd:Probe><wsd:Types>wsdp:Device</wsd:Types></wsd:Probe></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,261:wsdd DEBUG(pid 18217): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/ProbeMatches</wsa:Action><wsa:MessageID>urn:uuid:631df65a-b5f9-11ea-91da-003048d222d0</wsa:MessageID><wsa:RelatesTo>urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b</wsa:RelatesTo><wsd:AppSequence InstanceId="1592989277" MessageNumber="1" SequenceId="urn:uuid:631df90b-b5f9-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:ProbeMatches><wsd:ProbeMatch><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:Types>wsdp:Device pub:Computer</wsd:Types><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:ProbeMatch></wsd:ProbeMatches></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,308:wsdd INFO(pid 18217): 192.168.1.50:53780(em0) - - "Resolve urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac UDP" - -
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,309:wsdd DEBUG(pid 18217): incoming message content is b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Resolve</wsa:Action><wsa:MessageID>urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac</wsa:MessageID></soap:Header><soap:Body><wsd:Resolve><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference></wsd:Resolve></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,309:wsdd DEBUG(pid 18217): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/ResolveMatches</wsa:Action><wsa:MessageID>urn:uuid:632549c4-b5f9-11ea-91da-003048d222d0</wsa:MessageID><wsa:RelatesTo>urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac</wsa:RelatesTo><wsd:AppSequence InstanceId="1592989277" MessageNumber="2" SequenceId="urn:uuid:63254c08-b5f9-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:ResolveMatches><wsd:ResolveMatch><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:Types>wsdp:Device pub:Computer</wsd:Types><wsd:XAddrs>http://192.168.1.51:5357/2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsd:XAddrs><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:ResolveMatch></wsd:ResolveMatches></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,467:wsdd DEBUG(pid 18217): known message (urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b): dropping it
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,555:wsdd DEBUG(pid 18217): known message (urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac): dropping it
Finally: and yeah, the W7 workstation most definitely is using WORKGROUP
as its workgroup, and is not part of a domain. The whole reason I'm trying to get wsdd working is so that I can disable SMB1 protocol yet still see my FreeBSD (Samba/SMB/CIFS) box via "Network Neighbourhood".
Several things I tried to no avail (and restarting wsdd after every attempt):
sysctl net.link.ether.inet.allow_multicast
from 0
(default) to 1
-- I didn't expect this to do anything, but still...ip_mroute
kernel module -- didn't expect this to do anything either, since it's about multicast routingINADDR_ANY
(a.k.a. ''
) instead of WSD_MCAST_GRP_V4
for its UDP port 3702 listener. I tried this because of what I found here: https://svnweb.freebsd.org/ports/branches/2016Q1/net/minidlna/files/patch-minissdp.c?view=markup&pathrev=406207 (further details: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205430 and https://lists.freebsd.org/pipermail/freebsd-net/2016-January/044313.html)WSD_MCAST_GRP_V4
to ''
-- I figured this wouldn't work and I was right (Python runtime errors).Several other things while they're fresh in my mind:
ifmcstat
once wsdd is running (the 239.255.255.250 entry goes away if wsdd is shut down):em0:
inet 192.168.1.51
igmpv3 rv 2 qi 125 qri 10 uri 3
group 239.255.255.250 mode exclude
mcast-macaddr 01:00:5e:7f:ff:fa
group 224.0.0.1 mode exclude
mcast-macaddr 01:00:5e:00:00:01
Let me know what else I can try. I'm really suspecting some kind of Linux vs. FreeBSD difference that manifests itself this way, either in Python or wsdd. Hard to say. I do have a Linux box (on the same network segment) I could try this on, but I suspect it would work there and the issue is quite possibly FreeBSD-oriented.
There's been some success, in a very weird/interesting way. I came across this repository for FreeNAS, which appears to have an older version of wsdd (but the fact it's used by the FreeNAS folks, I think, indicates it should work). I see you're also a contributor to that repository: https://github.com/freenas/wsdd
I tried it out. Lo and behold, IT WORKS! Of course, the hostname that shows up in Network Neighbourhood is my machine's FQDN rather than the shortname (I think this was improved in later commits?), but it's actually functional.
The only other thing I see that's weird is this message, but it seems to be OK:
unhandled action http://schemas.xmlsoap.org/ws/2005/04/discovery/Hello/urn:uuid:26212bdf-b79e-11ea-91da-003048d222d0
So, in short: there is either a regression in present-day wsdd vs. the (older?) one in that repository, or there is a deeper bug that needs investigating. A diff -u ... | wc -l
returns 1928, which is a huge number of changes. But somewhere in there is the answer.
I think what I'll try doing is using the present-day version of wsdd (from this repository here) and begin slowly working backwards in commits (git reset --hard XXX
is useful for this) and see which commit introduced the breakage. I think that might help narrow it down, anyway!
Good news: I found the commit which introduced the problem. Only took me a few minutes. Commits in order I tested them, working from older to newer, then slowly backwards again to narrow it down:
2018/12/09 c0b8fd2bae7753799a70f3933eddc6e98fbd504f = works, but uses FQDN of host 2019/04/14 494e967bbeaf9897b52a1234a29b2fb0ec9ab32f = works, uses short hostname 2019/12/16 d1c0d3e31768646326a15d9eaef28ec9a3180023 = works, uses short hostname 2020/05/30 49813b3e97a5c673f9ff7fe3e5cefc1145c910ab = doesn't work 2020/05/30 dc32553a9f8875fa0fb22525fd6bcf0875555cee = fails (NameError: name 'xaddr' is not defined)
I also tried 523804e, but that fails (Listening to no IP address family), and then 4808a10 (which is newer, and doesn't fail, but also doesn't work).
So in short, it appears to be dc32553 which broke things on FreeBSD (at least when using IPv4 only like I am). Sadly that commit is extremely large. At least we've narrowed down part of the smoking gun.
Please let me know if you'd like me to try dc32553 but with some modifications for troubleshooting/debugging this problem further. (I just need to know what to modify and where, else provide a diff or point me to a unique branch I can try.)
Thanks a lot for your research. I've also done some investigations based on your feedback with the pcap file, but let me reply to your findings first.
I see you're also a contributor to that repository: https://github.com/freenas/wsdd
Thanks, but no.😉 I'm not a contributor to that repo. It's a fork of this one. IIRC, there no commit created by me other than those inherited from here. Nevertheless, Also, I'm aware that FreeNAS included wsdd.
I also tried 523804e, but that fails (Listening to no IP address family),
Yep, it fails because of a (stupid) bug that was fixed in 8249aefba4e51e57322ec700e0cf44bc8b9cdf77.
Good news: I found the commit which introduced the problem. [...] 2020/05/30 49813b3 = doesn't work
Well done. 49813b3 was a followup of dc32553a which actually was a lot of refactoring/reorganization of the code mainly for two reasons: First, the original code was quiet messy. The current one is better, but still... And second, there was a fundamental problem with the handling of network addresses changes and especially tentative IPv6 addresses which needed to be addressed. Changes on the fundamental network stuff was done in later commits (5c184f3 and 523804e).
and then 4808a10 (which is newer, and doesn't fail, but also doesn't work).
Changes there where minimal. No surprise, it didn't change anything.
Please let me know if you'd like me to try dc32553 but with some modifications for troubleshooting/debugging this problem further.
It sound like a good starting point, because there were also some changes on HTTP-related stuff which leads me to my findings.
From the pcap file, everything looks quite well. The multicast request from the Windows host and the replies from wsdd do match on the endpoint reference UUID that wsdd is using. This also confirms what we have seen in the log files (incoming and outgoing/constructed messages). What was not shown in the log file was the essential second step in the WSD lookup, i.e. the HTTP request. Nevertheless, Windows actually connects to the correct TCP server and sends out an HTTP request. From this, I conclude that there is no multicast problem. The request is also ACKed from the FreeBSD host, but apparently it does not arrive at wsdd. This is what confuses me a lot.
I've looked into the Python HTTP source code, but I haven't found something that could make the request to be eaten away at a first glance.
In dc32553, I've unified the class for IPv4 and IPv6 HTTP traffic into WSDHttpServer
: https://github.com/christgau/wsdd/commit/dc32553a9f8875fa0fb22525fd6bcf0875555cee#diff-e25d9307f21366373ed6e9eff0c66201L777. What confuses me a little is why I used type(self)
and not simply self
in the WSDHttpServer
class, but apart from that it does not look that wrong at all....
What you can try is the -t
option. This disables the HTTP server in wsdd such that you can use another program to handle HTTP traffic, like netcat e.g. I've checked it on Linux with nc -l -p 5357
and started with wsdd -t -vvv
after that. In netcat, I can see the HTTP request. I'd suggest you to try this as well to in order to verify that the data is delivered to userspace. That would eliminate any system setting (or firewall!?!?) as the reason. If netcat shows the HTTP request, we need to dig elsewhere.
The request is also ACKed from the FreeBSD host, but apparently it does not arrive at wsdd. This is what confuses me a lot.
The first thing that came to mind when I read this was the event system used for fd/socket communication between kernel and userland. In both d1c0d3e and dc32553, kqueue(2)
is used. kqueue on BSD is extremely reliable, but I didn't know Python had support for it until now.
I gave -t
and nc -l 5357
a try, on both of the above commits, and reproduced the same requests by the W7 client (though as stated in an update, I can reproduce this using a W10 client too). Netcat / nc -l 5357
sees the HTTP request (POST, HTTP client headers, blank line, and payload) in both versions. So kernel/userland event handoff is not the problem.
I'm a sysadmin, so the next thing I tried was using ktrace
around wsdd. FreeBSD has truss
and ktrace
(on Linux there is strace
which is awesome). truss
is unreliable on FreeBSD, but ktrace
works pretty well. I wanted to see if I could narrow down any differences in behaviour, file descriptor/socket I/O-wise.
The big difference was that dc32553 didn't show any sign of the POST
request from the client:
$ egrep -B1 'POST.*HTTP/1.1' ktrace_d1c0d3e.txt
76815 python3.7 GIO fd 8 read 864 bytes
"POST /2c2c4f89-43b2-5a0d-b73a-e622360b0358 HTTP/1.1\r
$ egrep -B1 'POST.*HTTP/1.1' ktrace_dc32553.txt
$
This just made me wonder what we did see, I/O-wise, from both versions. Rather than make this long comment even longer, I'll summarise: d1c0d3e saw the POST
+ HTTP server response. But dc32553 saw no such thing.
At this point, I had a deep suspicion that underlying socket behaviour might be the cause. I decided to look at what netstat
said for SendQ/RecvQ (to see if there was any pending I/O) to any of the 3 relevant sockets. FreeBSD doesn't provide a convenient -p
flag like on Linux, for matching against a PID, and sockstat
does not provide SendQ/RendQ information so it's an uphill battle.
And sure enough: with dc32553, the client establishes a TCP connection with wsdd, but then the RecvQ gets stuck (non-zero). After 30 seconds exactly, the socket goes into CLOSED
state (not TIME_WAIT or FIN_WAIT2). It is literally like the userland daemon is not aware that there is pending I/O on the file descriptor. That is where I think the problem lies, which circles back to my very first paragraph. :-) I can repeated this problem reliably 100% of the time, though sometimes the RecvQ number varies (864, 952, etc.):
$ sockstat -l | grep :5357
jdc python3.7 78040 8 tcp4 192.168.1.51:5357 *:*
$ netstat -an | egrep 'Proto|5357'
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 864 0 192.168.1.51.5357 192.168.1.50.50883 ESTABLISHED
tcp4 0 0 192.168.1.51.5357 *.* LISTEN
$ sleep 20 ; netstat -an | egrep 'Proto|5357'
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 864 0 192.168.1.51.5357 192.168.1.50.50883 CLOSED
tcp4 0 0 192.168.1.51.5357 *.* LISTEN
I re-ran ktrace
with -t+
to get as much syscall and lower-level detail as I could, because I was curious about what the state of the file descriptor was associated with the HTTP server on port 5357. As said, kqueue is involved, so I have to dig slowly and deep.
What I found is very strange. I've dumped the ktrace output using -E
; the 3rd column is the number of seconds since the program has started:
78614 python3.7 0.309700 CALL F64 socket(PF_INET,0x10000001<SOCK_STREAM|SOCK_CLOEXEC>,IPPROTO_IP)
78614 python3.7 0.309717 RET F64 socket 8
78614 python3.7 0.309742 CALL F64 setsockopt(0x8,SOL_SOCKET,SO_REUSEADDR,0x7fffffffc21c,0x4)
78614 python3.7 0.309754 RET F64 setsockopt 0
78614 python3.7 0.309766 CALL F64 bind(0x8,0x7fffffffc1f8,0x10)
78614 python3.7 0.309779 STRU F64 struct sockaddr { AF_INET, 192.168.1.51:5357 }
78614 python3.7 0.309790 RET F64 bind 0
78614 python3.7 0.309801 CALL F64 getsockname(0x8,0x7fffffffc1f8,0x7fffffffc1f4)
78614 python3.7 0.309812 STRU F64 struct sockaddr { AF_INET, 192.168.1.51:5357 }
78614 python3.7 0.309821 RET F64 getsockname 0
...
78614 python3.7 0.315925 CALL F64 kevent(0x4,0x8025f75b0,0x1,0,0,0x7fffffffc378)
78614 python3.7 0.315937 STRU F64 struct kevent[] = { { ident=8, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 } }
78614 python3.7 0.315949 STRU F64 struct kevent[] = { }
78614 python3.7 0.315958 RET F64 kevent 0
78614 python3.7 0.315979 CALL F64 listen(0x8,0x5)
78614 python3.7 0.315993 RET F64 listen 0
After this point, there is no other use of fd 8, other than its closure on program shutdown.
If we look at the kevent(2) man page we see that EVFILT_READ means ident is a file descriptor number. But the interesting part is this:
Sockets which have previously been passed to listen() return when there is an incoming connec- tion pending. data contains the size of the lis- ten backlog. Other socket descriptors return when there is data to be read, subject to the SO_RCVLOWAT value of the socket buffer. This may be overridden with a per-filter low water mark at the time the filter is added by setting the NOTE_LOWAT flag in fflags, and specifying the new low water mark in data. On return, data contains the number of bytes of protocol data available to read.
The order of syscalls pertaining to fd 8 is:
You can verify all of that here:
But d1c0d3e uses kqueue/kevent too, and it works htere. There, the listening fd is 7 (because newer wsdd has a raw route
socket on fd 4, so all the numbers shifted up by 1). Let's do the same analysis:
79231 python3.7 0.287239 CALL F64 socket(PF_INET,0x10000001<SOCK_STREAM|SOCK_CLOEXEC>,IPPROTO_IP)
79231 python3.7 0.287258 RET F64 socket 7
79231 python3.7 0.287283 CALL F64 setsockopt(0x7,SOL_SOCKET,SO_REUSEADDR,0x7fffffffd34c,0x4)
79231 python3.7 0.287295 RET F64 setsockopt 0
79231 python3.7 0.287307 CALL F64 bind(0x7,0x7fffffffd328,0x10)
79231 python3.7 0.287318 STRU F64 struct sockaddr { AF_INET, 192.168.1.51:5357 }
79231 python3.7 0.287330 RET F64 bind 0
79231 python3.7 0.287342 CALL F64 getsockname(0x7,0x7fffffffd328,0x7fffffffd324)
79231 python3.7 0.287355 STRU F64 struct sockaddr { AF_INET, 192.168.1.51:5357 }
79231 python3.7 0.287364 RET F64 getsockname 0
...
79231 python3.7 0.293796 CALL F64 listen(0x7,0x5)
79231 python3.7 0.293812 RET F64 listen 0
79231 python3.7 0.293845 CALL F64 kevent(0x4,0x80643f270,0x1,0,0,0x7fffffffd928)
79231 python3.7 0.293857 STRU F64 struct kevent[] = { { ident=7, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 } }
79231 python3.7 0.293869 STRU F64 struct kevent[] = { }
79231 python3.7 0.293882 RET F64 kevent 0
And here we see that listen()
is called before kevent()
on the listening fd, therefore things work..
Conclusion: it is wsdd doing this. There may be some "weird ordering nonsense" that Python is doing under the hood that you aren't familiar with that's probably buried deep in documentation.
Linux doesn't have kqueue/kevent -- it's equivalent of epoll(7)
. Both BSD and Linux have poll(2)
and select(2)
. You could try switching to poll or select if you wanted, but epoll and kqueue are significantly more performant (read: they use way less CPU): https://docs.python.org/3/library/select.html
Edit: you're using selectors
, so here's that doc too: https://docs.python.org/3/library/selectors.html -- note the example at the bottom and the order of the operations.
I assume you do not have a FreeBSD box to test this on. If you get a VM or something that runs it and verify this problem, this is what you'll need command-wise to help you out. You could verify this using curl -v http://x.x.x.x:5357/
-- it doesn't even matter that it's a GET /
, the problem is still the same.
ktrace -t+ -i -f ktrace_d1c0d3e.out ./wsdd_d1c0d3e.py -w WORKGROUP -4 -i em0 -v -v -v
<another window> curl -v http://x.x.x.x:5357/
kdump -t+ -r -E -A -f ktrace_d1c0d3e.out > ktrace_d1c0d3e.txt
vi ktrace_d1c0d3e.txt
...and spend a lot of time looking closely at what's happening. :-)
Really good catch!
Conclusion: it is wsdd doing this.
I may use the quote "you're god damn right". ;-)
And here we see that listen() is called before kevent() on the listening fd, therefore things work..
This really changed between the offending commits. In the old code, the HTTP server object was created an then registered to the selector (whatever Python thinks is the best one on the OS it runs on): https://github.com/christgau/wsdd/blob/d1c0d3e31768646326a15d9eaef28ec9a3180023/src/wsdd.py#L782-L783
Object creation on this case includes binding and "activation" of the server. Activation here means... a call to listen
.
However, in the refactored code, the register
function is called on the server_bind
: https://github.com/christgau/wsdd/blob/dc32553a9f8875fa0fb22525fd6bcf0875555cee/src/wsdd.py#L744-L749
This appears to cause the regression:
I assume you do not have a FreeBSD box to test this on.
I have a FreeBSD box here, but with recent 12.1. Strange enough that the most recent wsdd version works here even with the regression, but it also works with the fix I pushed into the issue-49 branch. So may give it (b7cf9a3) a try on your FreeBSD 11 box(es).
I tried branch issue-49 -- the problem is gone. Awesome!
Thx for your feedback and, btw, I really appreciate the way how the issue was tackled down.
I bundled the fix commit together with another one for a bugfix release (v0.6.1).
Server: FreeBSD 11.4 x64 Client: Windows 7 Professional SP1 x64 wsdd version: 0.6
Server: interface
em0
= 192.168.1.51 Client: 192.168.1.50Both systems are bare-metal, and on the same physical lan segment. No virtualisation is involved. While the machines have IPs in private space, NAT is only involved if/when packets go out the default gateway and through an unrelated router; NAT concerns therefore should not apply here.
On W7, I've verified -- and tried repeatedly restarting -- the "Function Discovery Resource Publication" (FDResPub) and "Function Discovery Provider Host" (fdPHost) services to no avail.
wsdd is started as:
/usr/local/bin/python3.7 /usr/local/bin/wsdd -w WORKGROUP -4 -i em0 -v -v -v
Let's see what ports/IPs it's listening on:
wsdd starts up and emits the following messages (please ignore the
daemon
part of the syslog message; this is a FreeBSD-ism and not your fault):Now on the Windows 7 client, I initiate a Shift-Refresh in the Network Neighbourhood window. We see the following in the wsdd log, but the server never appears on the client.
And that's it. I see no other logs from wsdd.
If we do a packet capture on the server via
tcpdump -p -i em0 -l -n "host 192.168.1.51"
, we see two-way packet communication, until the server sends the client some UDP payload and then nothing past that happens. Please note this capture was for a different request than the one shown above, but the behaviour/etc. is identical every time:This problem is 100% reproducible consistently, so troubleshooting should be possible.
Do you have any thoughts on what could be wrong?