christgau / wsdd

A Web Service Discovery host daemon.
MIT License
855 stars 100 forks source link

wsdd not behaving with Windows 7 / Windows 10 clients + FreeBSD 11.x server #49

Closed koitsu closed 4 years ago

koitsu commented 4 years ago

Server: FreeBSD 11.4 x64 Client: Windows 7 Professional SP1 x64 wsdd version: 0.6

Server: interface em0 = 192.168.1.51 Client: 192.168.1.50

Both systems are bare-metal, and on the same physical lan segment. No virtualisation is involved. While the machines have IPs in private space, NAT is only involved if/when packets go out the default gateway and through an unrelated router; NAT concerns therefore should not apply here.

On W7, I've verified -- and tried repeatedly restarting -- the "Function Discovery Resource Publication" (FDResPub) and "Function Discovery Provider Host" (fdPHost) services to no avail.

wsdd is started as: /usr/local/bin/python3.7 /usr/local/bin/wsdd -w WORKGROUP -4 -i em0 -v -v -v

Let's see what ports/IPs it's listening on:

daemon   python3.7  86702 6  udp4   239.255.255.250:3702  *:*
daemon   python3.7  86702 7  udp4   *:16702               *:*
daemon   python3.7  86702 8  tcp4   192.168.1.51:5357     *:*

wsdd starts up and emits the following messages (please ignore the daemon part of the syslog message; this is a FreeBSD-ism and not your fault):

Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,550:wsdd DEBUG(pid 86702): new address 192.168.1.51 on em0
Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,551:wsdd DEBUG(pid 86702): handling traffic for 192.168.1.51 on em0
Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,551:wsdd INFO(pid 86702): joined multicast group ('239.255.255.250', 3702) on 192.168.1.51%em0
Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,551:wsdd DEBUG(pid 86702): transport address on em0 is 192.168.1.51
Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,551:wsdd DEBUG(pid 86702): will listen for HTTP traffic on address ('192.168.1.51', 5357)
Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,552:wsdd DEBUG(pid 86702): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Hello</wsa:Action><wsa:MessageID>urn:uuid:c3110579-b500-11ea-91da-003048d222d0</wsa:MessageID><wsd:AppSequence InstanceId="1592882546" MessageNumber="0" SequenceId="urn:uuid:c3110785-b500-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:Hello><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:XAddrs>http://192.168.1.51:5357/2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsd:XAddrs><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:Hello></soap:Body></soap:Envelope>'
Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,553:wsdd DEBUG(pid 86702): scheduling Hello message via em0 to ('239.255.255.250', 3702)

Now on the Windows 7 client, I initiate a Shift-Refresh in the Network Neighbourhood window. We see the following in the wsdd log, but the server never appears on the client.

Jun 22 20:22:26 icarus daemon[86701]: 2020-06-22 20:22:26,553:wsdd DEBUG(pid 86702): scheduling Hello message via em0 to ('239.255.255.250', 3702)
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,629:wsdd INFO(pid 86702): 192.168.1.50:61316(em0) - - "Probe urn:uuid:8228b1c5-e4e0-48f8-9b6b-418986a1ad0c UDP" - -
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,629:wsdd DEBUG(pid 86702): incoming message content is b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</wsa:Action><wsa:MessageID>urn:uuid:8228b1c5-e4e0-48f8-9b6b-418986a1ad0c</wsa:MessageID></soap:Header><soap:Body><wsd:Probe><wsd:Types>wsdp:Device</wsd:Types></wsd:Probe></soap:Body></soap:Envelope>'
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,630:wsdd DEBUG(pid 86702): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/ProbeMatches</wsa:Action><wsa:MessageID>urn:uuid:d62fa52e-b500-11ea-91da-003048d222d0</wsa:MessageID><wsa:RelatesTo>urn:uuid:8228b1c5-e4e0-48f8-9b6b-418986a1ad0c</wsa:RelatesTo><wsd:AppSequence InstanceId="1592882546" MessageNumber="1" SequenceId="urn:uuid:d62fa7fd-b500-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:ProbeMatches><wsd:ProbeMatch><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:Types>wsdp:Device pub:Computer</wsd:Types><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:ProbeMatch></wsd:ProbeMatches></soap:Body></soap:Envelope>'
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,676:wsdd INFO(pid 86702): 192.168.1.50:61318(em0) - - "Resolve urn:uuid:4a35450f-4543-43e0-b442-65cfcfa84a42 UDP" - -
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,676:wsdd DEBUG(pid 86702): incoming message content is b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Resolve</wsa:Action><wsa:MessageID>urn:uuid:4a35450f-4543-43e0-b442-65cfcfa84a42</wsa:MessageID></soap:Header><soap:Body><wsd:Resolve><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference></wsd:Resolve></soap:Body></soap:Envelope>'
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,677:wsdd DEBUG(pid 86702): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/ResolveMatches</wsa:Action><wsa:MessageID>urn:uuid:d636cdda-b500-11ea-91da-003048d222d0</wsa:MessageID><wsa:RelatesTo>urn:uuid:4a35450f-4543-43e0-b442-65cfcfa84a42</wsa:RelatesTo><wsd:AppSequence InstanceId="1592882546" MessageNumber="2" SequenceId="urn:uuid:d636d01b-b500-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:ResolveMatches><wsd:ResolveMatch><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:Types>wsdp:Device pub:Computer</wsd:Types><wsd:XAddrs>http://192.168.1.51:5357/2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsd:XAddrs><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:ResolveMatch></wsd:ResolveMatches></soap:Body></soap:Envelope>'
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,809:wsdd DEBUG(pid 86702): known message (urn:uuid:4a35450f-4543-43e0-b442-65cfcfa84a42): dropping it
Jun 22 20:22:58 icarus daemon[86701]: 2020-06-22 20:22:58,867:wsdd DEBUG(pid 86702): known message (urn:uuid:8228b1c5-e4e0-48f8-9b6b-418986a1ad0c): dropping it

And that's it. I see no other logs from wsdd.

If we do a packet capture on the server via tcpdump -p -i em0 -l -n "host 192.168.1.51", we see two-way packet communication, until the server sends the client some UDP payload and then nothing past that happens. Please note this capture was for a different request than the one shown above, but the behaviour/etc. is identical every time:

20:35:40.574272 IP 192.168.1.51.16702 > 192.168.1.50.50581: UDP, length 1247
20:35:40.619179 IP 192.168.1.51.16702 > 192.168.1.50.50583: UDP, length 1343
20:35:40.620956 IP 192.168.1.50.53384 > 192.168.1.51.5357: Flags [S], seq 631281493, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
20:35:40.620976 IP 192.168.1.51.5357 > 192.168.1.50.53384: Flags [S.], seq 1520944053, ack 631281494, win 65535, options [mss 1460,nop,wscale 7,sackOK,eol], length 0
20:35:40.621080 IP 192.168.1.50.53384 > 192.168.1.51.5357: Flags [.], ack 1, win 16425, length 0
20:35:40.621209 IP 192.168.1.50.53384 > 192.168.1.51.5357: Flags [P.], seq 1:220, ack 1, win 16425, length 219
20:35:40.621330 IP 192.168.1.50.53384 > 192.168.1.51.5357: Flags [P.], seq 220:953, ack 1, win 16425, length 733
20:35:40.621339 IP 192.168.1.51.5357 > 192.168.1.50.53384: Flags [.], ack 953, win 505, length 0
20:35:40.721580 IP 192.168.1.51.16702 > 192.168.1.50.50581: UDP, length 1247
20:35:40.851109 IP 192.168.1.51.16702 > 192.168.1.50.50583: UDP, length 1343

This problem is 100% reproducible consistently, so troubleshooting should be possible.

Do you have any thoughts on what could be wrong?

christgau commented 4 years ago

Thanks for reporting that issue along with helpful information. From your logs it look like the HTTP request is not processed as intented. FYI: The discovery of a WSD device works in two phases: multicasting via UDP to find the devices on the network and a metadata exchange via TCP/HTTP on port 5357. The retrieval of the metadata is required for the host in order to appear in the Network view of windows.

In the tcpdump log, I see that a TCP connection on that port is established and two packets flow to wsdd with very roughly 1kB of payload in total. In contrast to that, I do not see any trace of that message being processed (or an error) in the daemon log nor does there appear to be a reply flowing back to the Windows host (192.168.1.50) in the tcpdump. It would be helpful to see what is inside the HTTP request. Would you mind to provide the pcap file or the extracted HTTP request.

You may also add some logging/print here https://github.com/christgau/wsdd/blob/5a9e6c080f48b53c7a7b9fc503b5d4f8729a9e0a/src/wsdd.py#L829 to check the content-length and what data is received (if any).

Remark: I observed similar but unreproducible behavior for FreeBSD, but the problem was with IPv6 only and it was gone in IPv4-only operations, so this does not apply here 🙁

koitsu commented 4 years ago

First: yeah, I read past and existing GH reports of issues on FreeBSD pertaining to IPv6. This host has IPv6 disabled at all layers (i.e. interfaces and route/protocol prioritisation) except the kernel (it can't be disabled there else existing userland programs built with IPv6 support begin failing (nothing to do with wsdd)). I'm very explicit in starting daemons with -4 flags etc., and wsdd is no exception. :-) I always specify IPv4-only and interfaces if such programs allow me to.

Secondly: I've modified wsdd.py to include the following line after the one you referenced:

logger.debug('WSDHttpRequestHandler do_POST self.headers: {}'.format(self.headers))

Thirdly: happy to provide a pcap! Generated on the server (192.168.1.51) with tcpdump -vvv -p -i em0 -s 8192 -w capture.pcap "host 192.168.1.50 and not (port 22 or port 53 or icmp or arp)" to ensure we get only (mostly) relevant traffic, including SSDP. I've looked at raw traffic with -X but given that this is SOAP/XML, the necessary details tend to be hard for me to discern unless I know exactly what I'm looking for. If you need me to capture with promiscuous mode enabled, just let me know. Packet capture attached: capture.pcap.gz

System info and logs for this debug session are below. I should note that I did not see the debug log line I added to the code.

# ps -auxw | grep wsdd
daemon  18217   1.2  0.2  26256  19180  -  S    02:01       0:00.19 /usr/local/bin/python3.7 /usr/local/bin/wsdd -w WORKGROUP -4 -i em0 -v -v -v
root    18216   0.0  0.0   6384   1868  -  Ss   02:01       0:00.00 daemon: /usr/local/bin/wsdd[18217] (daemon)
root    18221   0.0  0.0   6652   2036  0  S+   02:01       0:00.00 grep wsdd

# sockstat -l | grep daemon
daemon   python3.7  18217 6  udp4   239.255.255.250:3702  *:*
daemon   python3.7  18217 7  udp4   *:59216               *:*
daemon   python3.7  18217 8  tcp4   192.168.1.51:5357     *:*
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,686:wsdd INFO(pid 18217): using pre-defined UUID 2c2c4f89-43b2-5a0d-b73a-e622360b0358
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): new address 192.168.1.51 on em0
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): handling traffic for 192.168.1.51 on em0
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd INFO(pid 18217): joined multicast group ('239.255.255.250', 3702) on 192.168.1.51%em0
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): transport address on em0 is 192.168.1.51
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,698:wsdd DEBUG(pid 18217): will listen for HTTP traffic on address ('192.168.1.51', 5357)
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,699:wsdd DEBUG(pid 18217): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Hello</wsa:Action><wsa:MessageID>urn:uuid:43c9a165-b5f9-11ea-91da-003048d222d0</wsa:MessageID><wsd:AppSequence InstanceId="1592989277" MessageNumber="0" SequenceId="urn:uuid:43c9a318-b5f9-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:Hello><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:XAddrs>http://192.168.1.51:5357/2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsd:XAddrs><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:Hello></soap:Body></soap:Envelope>'
Jun 24 02:01:17 icarus wsdd[18216]: 2020-06-24 02:01:17,700:wsdd DEBUG(pid 18217): scheduling Hello message via em0 to ('239.255.255.250', 3702)
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,260:wsdd INFO(pid 18217): 192.168.1.50:53778(em0) - - "Probe urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b UDP" - -
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,261:wsdd DEBUG(pid 18217): incoming message content is b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</wsa:Action><wsa:MessageID>urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b</wsa:MessageID></soap:Header><soap:Body><wsd:Probe><wsd:Types>wsdp:Device</wsd:Types></wsd:Probe></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,261:wsdd DEBUG(pid 18217): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/ProbeMatches</wsa:Action><wsa:MessageID>urn:uuid:631df65a-b5f9-11ea-91da-003048d222d0</wsa:MessageID><wsa:RelatesTo>urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b</wsa:RelatesTo><wsd:AppSequence InstanceId="1592989277" MessageNumber="1" SequenceId="urn:uuid:631df90b-b5f9-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:ProbeMatches><wsd:ProbeMatch><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:Types>wsdp:Device pub:Computer</wsd:Types><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:ProbeMatch></wsd:ProbeMatches></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,308:wsdd INFO(pid 18217): 192.168.1.50:53780(em0) - - "Resolve urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac UDP" - -
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,309:wsdd DEBUG(pid 18217): incoming message content is b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery"><soap:Header><wsa:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Resolve</wsa:Action><wsa:MessageID>urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac</wsa:MessageID></soap:Header><soap:Body><wsd:Resolve><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference></wsd:Resolve></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,309:wsdd DEBUG(pid 18217): constructed xml for WSD message: b'<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:pnpx="http://schemas.microsoft.com/windows/pnpx/2005/10" xmlns:pub="http://schemas.microsoft.com/windows/pub/2005/07" xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsd="http://schemas.xmlsoap.org/ws/2005/04/discovery" xmlns:wsdp="http://schemas.xmlsoap.org/ws/2006/02/devprof" xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"><soap:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/ResolveMatches</wsa:Action><wsa:MessageID>urn:uuid:632549c4-b5f9-11ea-91da-003048d222d0</wsa:MessageID><wsa:RelatesTo>urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac</wsa:RelatesTo><wsd:AppSequence InstanceId="1592989277" MessageNumber="2" SequenceId="urn:uuid:63254c08-b5f9-11ea-91da-003048d222d0" /></soap:Header><soap:Body><wsd:ResolveMatches><wsd:ResolveMatch><wsa:EndpointReference><wsa:Address>urn:uuid:2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsa:Address></wsa:EndpointReference><wsd:Types>wsdp:Device pub:Computer</wsd:Types><wsd:XAddrs>http://192.168.1.51:5357/2c2c4f89-43b2-5a0d-b73a-e622360b0358</wsd:XAddrs><wsd:MetadataVersion>1</wsd:MetadataVersion></wsd:ResolveMatch></wsd:ResolveMatches></soap:Body></soap:Envelope>'
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,467:wsdd DEBUG(pid 18217): known message (urn:uuid:e398cb42-0fa7-4f4b-8108-265b9f90078b): dropping it
Jun 24 02:02:10 icarus wsdd[18216]: 2020-06-24 02:02:10,555:wsdd DEBUG(pid 18217): known message (urn:uuid:8fac4c01-f73d-467b-89db-e253b73687ac): dropping it

Finally: and yeah, the W7 workstation most definitely is using WORKGROUP as its workgroup, and is not part of a domain. The whole reason I'm trying to get wsdd working is so that I can disable SMB1 protocol yet still see my FreeBSD (Samba/SMB/CIFS) box via "Network Neighbourhood".

koitsu commented 4 years ago

Several things I tried to no avail (and restarting wsdd after every attempt):

  1. Changing sysctl net.link.ether.inet.allow_multicast from 0 (default) to 1 -- I didn't expect this to do anything, but still...
  2. Loading ip_mroute kernel module -- didn't expect this to do anything either, since it's about multicast routing
  3. Modifying wsdd to bind to INADDR_ANY (a.k.a. '') instead of WSD_MCAST_GRP_V4 for its UDP port 3702 listener. I tried this because of what I found here: https://svnweb.freebsd.org/ports/branches/2016Q1/net/minidlna/files/patch-minissdp.c?view=markup&pathrev=406207 (further details: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205430 and https://lists.freebsd.org/pipermail/freebsd-net/2016-January/044313.html)
  4. Changing WSD_MCAST_GRP_V4 to '' -- I figured this wouldn't work and I was right (Python runtime errors).

Several other things while they're fresh in my mind:

em0:
        inet 192.168.1.51
        igmpv3 rv 2 qi 125 qri 10 uri 3
                group 239.255.255.250 mode exclude
                        mcast-macaddr 01:00:5e:7f:ff:fa
                group 224.0.0.1 mode exclude
                        mcast-macaddr 01:00:5e:00:00:01

Let me know what else I can try. I'm really suspecting some kind of Linux vs. FreeBSD difference that manifests itself this way, either in Python or wsdd. Hard to say. I do have a Linux box (on the same network segment) I could try this on, but I suspect it would work there and the issue is quite possibly FreeBSD-oriented.

koitsu commented 4 years ago

There's been some success, in a very weird/interesting way. I came across this repository for FreeNAS, which appears to have an older version of wsdd (but the fact it's used by the FreeNAS folks, I think, indicates it should work). I see you're also a contributor to that repository: https://github.com/freenas/wsdd

I tried it out. Lo and behold, IT WORKS! Of course, the hostname that shows up in Network Neighbourhood is my machine's FQDN rather than the shortname (I think this was improved in later commits?), but it's actually functional.

The only other thing I see that's weird is this message, but it seems to be OK:

unhandled action http://schemas.xmlsoap.org/ws/2005/04/discovery/Hello/urn:uuid:26212bdf-b79e-11ea-91da-003048d222d0

So, in short: there is either a regression in present-day wsdd vs. the (older?) one in that repository, or there is a deeper bug that needs investigating. A diff -u ... | wc -l returns 1928, which is a huge number of changes. But somewhere in there is the answer.

I think what I'll try doing is using the present-day version of wsdd (from this repository here) and begin slowly working backwards in commits (git reset --hard XXX is useful for this) and see which commit introduced the breakage. I think that might help narrow it down, anyway!

koitsu commented 4 years ago

Good news: I found the commit which introduced the problem. Only took me a few minutes. Commits in order I tested them, working from older to newer, then slowly backwards again to narrow it down:

2018/12/09 c0b8fd2bae7753799a70f3933eddc6e98fbd504f = works, but uses FQDN of host 2019/04/14 494e967bbeaf9897b52a1234a29b2fb0ec9ab32f = works, uses short hostname 2019/12/16 d1c0d3e31768646326a15d9eaef28ec9a3180023 = works, uses short hostname 2020/05/30 49813b3e97a5c673f9ff7fe3e5cefc1145c910ab = doesn't work 2020/05/30 dc32553a9f8875fa0fb22525fd6bcf0875555cee = fails (NameError: name 'xaddr' is not defined)

I also tried 523804e, but that fails (Listening to no IP address family), and then 4808a10 (which is newer, and doesn't fail, but also doesn't work).

So in short, it appears to be dc32553 which broke things on FreeBSD (at least when using IPv4 only like I am). Sadly that commit is extremely large. At least we've narrowed down part of the smoking gun.

Please let me know if you'd like me to try dc32553 but with some modifications for troubleshooting/debugging this problem further. (I just need to know what to modify and where, else provide a diff or point me to a unique branch I can try.)

christgau commented 4 years ago

Thanks a lot for your research. I've also done some investigations based on your feedback with the pcap file, but let me reply to your findings first.

I see you're also a contributor to that repository: https://github.com/freenas/wsdd

Thanks, but no.😉 I'm not a contributor to that repo. It's a fork of this one. IIRC, there no commit created by me other than those inherited from here. Nevertheless, Also, I'm aware that FreeNAS included wsdd.

I also tried 523804e, but that fails (Listening to no IP address family),

Yep, it fails because of a (stupid) bug that was fixed in 8249aefba4e51e57322ec700e0cf44bc8b9cdf77.

Good news: I found the commit which introduced the problem. [...] 2020/05/30 49813b3 = doesn't work

Well done. 49813b3 was a followup of dc32553a which actually was a lot of refactoring/reorganization of the code mainly for two reasons: First, the original code was quiet messy. The current one is better, but still... And second, there was a fundamental problem with the handling of network addresses changes and especially tentative IPv6 addresses which needed to be addressed. Changes on the fundamental network stuff was done in later commits (5c184f3 and 523804e).

and then 4808a10 (which is newer, and doesn't fail, but also doesn't work).

Changes there where minimal. No surprise, it didn't change anything.

Please let me know if you'd like me to try dc32553 but with some modifications for troubleshooting/debugging this problem further.

It sound like a good starting point, because there were also some changes on HTTP-related stuff which leads me to my findings.

From the pcap file, everything looks quite well. The multicast request from the Windows host and the replies from wsdd do match on the endpoint reference UUID that wsdd is using. This also confirms what we have seen in the log files (incoming and outgoing/constructed messages). What was not shown in the log file was the essential second step in the WSD lookup, i.e. the HTTP request. Nevertheless, Windows actually connects to the correct TCP server and sends out an HTTP request. From this, I conclude that there is no multicast problem. The request is also ACKed from the FreeBSD host, but apparently it does not arrive at wsdd. This is what confuses me a lot.

I've looked into the Python HTTP source code, but I haven't found something that could make the request to be eaten away at a first glance.

In dc32553, I've unified the class for IPv4 and IPv6 HTTP traffic into WSDHttpServer: https://github.com/christgau/wsdd/commit/dc32553a9f8875fa0fb22525fd6bcf0875555cee#diff-e25d9307f21366373ed6e9eff0c66201L777. What confuses me a little is why I used type(self) and not simply self in the WSDHttpServer class, but apart from that it does not look that wrong at all....

What you can try is the -t option. This disables the HTTP server in wsdd such that you can use another program to handle HTTP traffic, like netcat e.g. I've checked it on Linux with nc -l -p 5357 and started with wsdd -t -vvv after that. In netcat, I can see the HTTP request. I'd suggest you to try this as well to in order to verify that the data is delivered to userspace. That would eliminate any system setting (or firewall!?!?) as the reason. If netcat shows the HTTP request, we need to dig elsewhere.

koitsu commented 4 years ago

The request is also ACKed from the FreeBSD host, but apparently it does not arrive at wsdd. This is what confuses me a lot.

The first thing that came to mind when I read this was the event system used for fd/socket communication between kernel and userland. In both d1c0d3e and dc32553, kqueue(2) is used. kqueue on BSD is extremely reliable, but I didn't know Python had support for it until now.

I gave -t and nc -l 5357 a try, on both of the above commits, and reproduced the same requests by the W7 client (though as stated in an update, I can reproduce this using a W10 client too). Netcat / nc -l 5357 sees the HTTP request (POST, HTTP client headers, blank line, and payload) in both versions. So kernel/userland event handoff is not the problem.

I'm a sysadmin, so the next thing I tried was using ktrace around wsdd. FreeBSD has truss and ktrace (on Linux there is strace which is awesome). truss is unreliable on FreeBSD, but ktrace works pretty well. I wanted to see if I could narrow down any differences in behaviour, file descriptor/socket I/O-wise.

The big difference was that dc32553 didn't show any sign of the POST request from the client:

$ egrep -B1 'POST.*HTTP/1.1' ktrace_d1c0d3e.txt
 76815 python3.7 GIO   fd 8 read 864 bytes
       "POST /2c2c4f89-43b2-5a0d-b73a-e622360b0358 HTTP/1.1\r
$ egrep -B1 'POST.*HTTP/1.1' ktrace_dc32553.txt
$

This just made me wonder what we did see, I/O-wise, from both versions. Rather than make this long comment even longer, I'll summarise: d1c0d3e saw the POST + HTTP server response. But dc32553 saw no such thing.

At this point, I had a deep suspicion that underlying socket behaviour might be the cause. I decided to look at what netstat said for SendQ/RecvQ (to see if there was any pending I/O) to any of the 3 relevant sockets. FreeBSD doesn't provide a convenient -p flag like on Linux, for matching against a PID, and sockstat does not provide SendQ/RendQ information so it's an uphill battle.

And sure enough: with dc32553, the client establishes a TCP connection with wsdd, but then the RecvQ gets stuck (non-zero). After 30 seconds exactly, the socket goes into CLOSED state (not TIME_WAIT or FIN_WAIT2). It is literally like the userland daemon is not aware that there is pending I/O on the file descriptor. That is where I think the problem lies, which circles back to my very first paragraph. :-) I can repeated this problem reliably 100% of the time, though sometimes the RecvQ number varies (864, 952, etc.):

$ sockstat -l | grep :5357
jdc      python3.7  78040 8  tcp4   192.168.1.51:5357     *:*
$ netstat -an | egrep 'Proto|5357'
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4     864      0 192.168.1.51.5357      192.168.1.50.50883     ESTABLISHED
tcp4       0      0 192.168.1.51.5357      *.*                    LISTEN
$ sleep 20 ; netstat -an | egrep 'Proto|5357'
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4     864      0 192.168.1.51.5357      192.168.1.50.50883     CLOSED
tcp4       0      0 192.168.1.51.5357      *.*                    LISTEN

I re-ran ktrace with -t+ to get as much syscall and lower-level detail as I could, because I was curious about what the state of the file descriptor was associated with the HTTP server on port 5357. As said, kqueue is involved, so I have to dig slowly and deep.

What I found is very strange. I've dumped the ktrace output using -E; the 3rd column is the number of seconds since the program has started:

 78614 python3.7 0.309700 CALL  F64  socket(PF_INET,0x10000001<SOCK_STREAM|SOCK_CLOEXEC>,IPPROTO_IP)
 78614 python3.7 0.309717 RET   F64  socket 8
 78614 python3.7 0.309742 CALL  F64  setsockopt(0x8,SOL_SOCKET,SO_REUSEADDR,0x7fffffffc21c,0x4)
 78614 python3.7 0.309754 RET   F64  setsockopt 0
 78614 python3.7 0.309766 CALL  F64  bind(0x8,0x7fffffffc1f8,0x10)
 78614 python3.7 0.309779 STRU  F64  struct sockaddr { AF_INET, 192.168.1.51:5357 }
 78614 python3.7 0.309790 RET   F64  bind 0
 78614 python3.7 0.309801 CALL  F64  getsockname(0x8,0x7fffffffc1f8,0x7fffffffc1f4)
 78614 python3.7 0.309812 STRU  F64  struct sockaddr { AF_INET, 192.168.1.51:5357 }
 78614 python3.7 0.309821 RET   F64  getsockname 0
...
 78614 python3.7 0.315925 CALL  F64  kevent(0x4,0x8025f75b0,0x1,0,0,0x7fffffffc378)
 78614 python3.7 0.315937 STRU  F64  struct kevent[] = { { ident=8, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 } }
 78614 python3.7 0.315949 STRU  F64  struct kevent[] = {  }
 78614 python3.7 0.315958 RET   F64  kevent 0
 78614 python3.7 0.315979 CALL  F64  listen(0x8,0x5)
 78614 python3.7 0.315993 RET   F64  listen 0

After this point, there is no other use of fd 8, other than its closure on program shutdown.

If we look at the kevent(2) man page we see that EVFILT_READ means ident is a file descriptor number. But the interesting part is this:

           Sockets which have previously been passed to
           listen() return when there is an incoming connec-
           tion pending.  data contains the size of the lis-
           ten backlog.

           Other socket descriptors return when there is
           data to be read, subject to the SO_RCVLOWAT value
           of the socket buffer.  This may be overridden
           with a per-filter low water mark at the time the
           filter is added by setting the NOTE_LOWAT flag in
           fflags, and specifying the new low water mark in
           data.  On return, data contains the number of
           bytes of protocol data available to read.

The order of syscalls pertaining to fd 8 is:

  1. socket() -- to create the server socket itself
  2. setsockopt() -- to make use of SO_REUSEADDR
  3. bind() -- bind the local server address to the socket (this is the 192.168.1.51:5357 binding)
  4. getsockname() -- to, for some reason, get the IP address and port previously bound to the fd (I don't know why)
  5. kevent() -- tries to set up the kernel-to-userland event for when there's an event on the socket itself. Except listen() hasn't been called yet!
  6. listen() -- tell kernel to accept connections on 192.168.1.51 port 5357, with a very small backlog (5; not sure why SOMAXCONN isn't being used here) -- but this is too late!

You can verify all of that here:

But d1c0d3e uses kqueue/kevent too, and it works htere. There, the listening fd is 7 (because newer wsdd has a raw route socket on fd 4, so all the numbers shifted up by 1). Let's do the same analysis:

 79231 python3.7 0.287239 CALL  F64  socket(PF_INET,0x10000001<SOCK_STREAM|SOCK_CLOEXEC>,IPPROTO_IP)
 79231 python3.7 0.287258 RET   F64  socket 7
 79231 python3.7 0.287283 CALL  F64  setsockopt(0x7,SOL_SOCKET,SO_REUSEADDR,0x7fffffffd34c,0x4)
 79231 python3.7 0.287295 RET   F64  setsockopt 0
 79231 python3.7 0.287307 CALL  F64  bind(0x7,0x7fffffffd328,0x10)
 79231 python3.7 0.287318 STRU  F64  struct sockaddr { AF_INET, 192.168.1.51:5357 }
 79231 python3.7 0.287330 RET   F64  bind 0
 79231 python3.7 0.287342 CALL  F64  getsockname(0x7,0x7fffffffd328,0x7fffffffd324)
 79231 python3.7 0.287355 STRU  F64  struct sockaddr { AF_INET, 192.168.1.51:5357 }
 79231 python3.7 0.287364 RET   F64  getsockname 0
...
 79231 python3.7 0.293796 CALL  F64  listen(0x7,0x5)
 79231 python3.7 0.293812 RET   F64  listen 0
 79231 python3.7 0.293845 CALL  F64  kevent(0x4,0x80643f270,0x1,0,0,0x7fffffffd928)
 79231 python3.7 0.293857 STRU  F64  struct kevent[] = { { ident=7, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 } }
 79231 python3.7 0.293869 STRU  F64  struct kevent[] = {  }
 79231 python3.7 0.293882 RET   F64  kevent 0

And here we see that listen() is called before kevent() on the listening fd, therefore things work..

Conclusion: it is wsdd doing this. There may be some "weird ordering nonsense" that Python is doing under the hood that you aren't familiar with that's probably buried deep in documentation.

Linux doesn't have kqueue/kevent -- it's equivalent of epoll(7). Both BSD and Linux have poll(2) and select(2). You could try switching to poll or select if you wanted, but epoll and kqueue are significantly more performant (read: they use way less CPU): https://docs.python.org/3/library/select.html

Edit: you're using selectors, so here's that doc too: https://docs.python.org/3/library/selectors.html -- note the example at the bottom and the order of the operations.

I assume you do not have a FreeBSD box to test this on. If you get a VM or something that runs it and verify this problem, this is what you'll need command-wise to help you out. You could verify this using curl -v http://x.x.x.x:5357/ -- it doesn't even matter that it's a GET /, the problem is still the same.

ktrace -t+ -i -f ktrace_d1c0d3e.out ./wsdd_d1c0d3e.py -w WORKGROUP -4 -i em0 -v -v -v
<another window> curl -v http://x.x.x.x:5357/
kdump -t+ -r -E -A -f ktrace_d1c0d3e.out > ktrace_d1c0d3e.txt
vi ktrace_d1c0d3e.txt

...and spend a lot of time looking closely at what's happening. :-)

christgau commented 4 years ago

Really good catch!

Conclusion: it is wsdd doing this.

I may use the quote "you're god damn right". ;-)

And here we see that listen() is called before kevent() on the listening fd, therefore things work..

This really changed between the offending commits. In the old code, the HTTP server object was created an then registered to the selector (whatever Python thinks is the best one on the OS it runs on): https://github.com/christgau/wsdd/blob/d1c0d3e31768646326a15d9eaef28ec9a3180023/src/wsdd.py#L782-L783

Object creation on this case includes binding and "activation" of the server. Activation here means... a call to listen.

However, in the refactored code, the register function is called on the server_bind: https://github.com/christgau/wsdd/blob/dc32553a9f8875fa0fb22525fd6bcf0875555cee/src/wsdd.py#L744-L749

This appears to cause the regression:

I assume you do not have a FreeBSD box to test this on.

I have a FreeBSD box here, but with recent 12.1. Strange enough that the most recent wsdd version works here even with the regression, but it also works with the fix I pushed into the issue-49 branch. So may give it (b7cf9a3) a try on your FreeBSD 11 box(es).

koitsu commented 4 years ago

I tried branch issue-49 -- the problem is gone. Awesome!

christgau commented 4 years ago

Thx for your feedback and, btw, I really appreciate the way how the issue was tackled down.

I bundled the fix commit together with another one for a bugfix release (v0.6.1).