Yortw / RSSDP

Really Simple Service Discovery Protocol - a 100% .Net implementation of the SSDP protocol for publishing custom/basic devices, and discovering all device types on a network.
http://yortw.github.io/RSSDP/
MIT License
282 stars 66 forks source link

Exception: A socket operation was attempted to an unreachable host #75

Closed ramondeklein closed 5 years ago

ramondeklein commented 6 years ago

We have some systems that show the A socket operation was attempted to an unreachable host message and crash the application. It starts with the following exception:

System.Net.Sockets.SocketException (0x80004005): A socket operation was attempted to an unreachable host
   at System.Net.Sockets.Socket.SendTo(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, EndPoint remoteEP)
   at Rssdp.UdpSocket.SendTo(Byte[] messageData, UdpEndPoint endPoint)
   at Rssdp.Infrastructure.SsdpCommunicationsServer.<>c__DisplayClass22_0.<SendMulticastMessage>b__0()
   at Rssdp.Infrastructure.SsdpCommunicationsServer.Repeat(Int32 repetitions, TimeSpan delay, Action work)
   at Rssdp.Infrastructure.SsdpCommunicationsServer.SendMulticastMessage(Byte[] messageData)
   at Rssdp.Infrastructure.SsdpDevicePublisherBase.SendAliveNotification(SsdpDevice device, String notificationType, String uniqueServiceName)
   at Rssdp.Infrastructure.SsdpDevicePublisherBase.SendAliveNotifications(SsdpDevice device, Boolean isRoot)
   at Rssdp.Infrastructure.SsdpDevicePublisherBase.SendAllAliveNotifications(Object state)
   at System.Threading.TimerQueueTimer.CallCallbackInContext(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.TimerQueue.AppDomainTimerCallback()

This exception cannot be handled by user code, because it is running from a timer. Our application is running as a service that automatically restarts (after 90 seconds). When the service restarts, then the following exception occurs when the SsdpDevicePublisherBase constructor is invoked:

System.Net.Sockets.SocketException (0x80004005): A socket operation was attempted to an unreachable host
   at System.Net.Sockets.Socket.setMulticastOption(SocketOptionName optionName, MulticastOption MR)
   at System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Object optionValue)
   at Rssdp.SocketFactory.CreateUdpMulticastSocket(String ipAddress, Int32 multicastTimeToLive, Int32 localPort)
   at Rssdp.Infrastructure.SsdpCommunicationsServer.ListenForBroadcastsAsync()
   at Rssdp.Infrastructure.SsdpCommunicationsServer.BeginListeningForBroadcasts()
   at Rssdp.Infrastructure.SsdpDevicePublisherBase..ctor(ISsdpCommunicationsServer communicationsServer, String osName, String osVersion)
   at Rssdp.SsdpDevicePublisher..ctor()

We always invoke the SsdpDevicePublisherBase with a ISsdpCommunicationsServer that is initialized like this: new SsdpCommunicationsServer(new SocketFactory("0.0.0.0"), 2869); Our service keeps restarting, but after a while it starts working again and everything seems to be fine again.

There are some environmental issues that might have impact:

Unfortunately, I don't have an exact reproduction scenario, but it happens on multiple systems (that are in different LANs).

Yortw commented 6 years ago

Thanks for reporting this.

It's not something I've seen/reproduced so far but I will see what I can do. In the short term I'll definitely try to fix the error handling so this doesn't crash the process. Unfortunately my current living and work arrangements are not particularly conducive to OSS work right now so if you need this in a hurry you may want to fork and submit a PR back to me, using your own fork in the meantime.

I'm a bit confused about the error. It may have something to do with the other SSDP services, but that seems odd given it sounds like it works initially and fails sometime later. If there was a conflict with the ports/sockets I would expect it to fail earlier/at start up. Also it's interesting to get that specific error message on the multicast socket, given that it doesn't represents a single/specific host - I wonder what host it is failing to connect to? Possibly something is going wrong with the local adapter it is bound to? Or maybe the error message is misleading? I've tried a Google & Bing search, but I have not found anything specifically relevant to this. Most people suffering this error are not getting it in relation to multicast sockets, the only one who was had an issue around understanding IPv6 vs 4 addresses, but I do not believe that to be the case here.

The next time it happens can you check the state of the local adapter it should be bound to? Does it have a valid IP, is it online, did it just reconnect or anything weird like that?

Also you say that when the process restarts it fails again with a similar error, does it ever recover or do you need to reboot the machine or do something else to get it to start properly?

Thanks.

Yortw commented 6 years ago

Sorry. could you please provide the following?

Thanks.

Yortw commented 6 years ago

Hmmm, this is odd. The latest code already has exception handling in the SendAllAliveNotifications method which is called by the timer. That should report an error via the logger (if one has been configured), then dispose the publisher. It should not crash the process (though publishing will stop).

Can you confirm you're running the latest version of the package?

ramondeklein commented 6 years ago

I needed to check the systems at work, so I needed a few days. The crash during sending the multicast message happened with an old RSSDP v2.0.0.5 release. We also have systems with the (latest) RSSDP v3.5.8 and I haven't seen the broadcast crash, but I haven't wired your RSSDP logging to our logging, but I will do so.

I did see the CreateUdpMulticastSocket exception with the v3.5.8 release. This is the stack-trace when it happens on our system:

2017-12-13 10:42:18,198 [  1] ERROR Riwo.Common.Utility.TaskHelper: Could not complete (ignored) task System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
System.ApplicationException: Ignored Exception from member: .ctor (file: C:\BuildAgent\work\c563973bd42eed3c\Riwo.Rimote.ClientService\Service.cs, line: 61) ---> System.AggregateException: Er zijn één of meer fouten opgetreden. ---> System.Net.Sockets.SocketException: Er is geprobeerd een socketbewerking uit te voeren op een onbereikbare host
   bij System.Net.Sockets.Socket.setMulticastOption(SocketOptionName optionName, MulticastOption MR)
   bij System.Net.Sockets.Socket.SetSocketOption(SocketOptionLevel optionLevel, SocketOptionName optionName, Object optionValue)
   bij Rssdp.SocketFactory.SetMulticastSocketOptions(Socket retVal, Int32 multicastTimeToLive)
   bij Rssdp.SocketFactory.CreateUdpMulticastSocket(Int32 multicastTimeToLive, Int32 localPort)
   bij Rssdp.Infrastructure.SsdpCommunicationsServer.BeginListeningForBroadcasts()
   bij Rssdp.Infrastructure.SsdpDevicePublisherBase..ctor(ISsdpCommunicationsServer communicationsServer, String osName, String osVersion, ISsdpLogger log)
   bij Riwo.Rimote.Client.RimoteDevicePublisher..ctor()
   bij Riwo.Rimote.Client.ClientRimoteDevice.<StartAdvertisingAsync>d__52.MoveNext()
   --- Einde van intern uitzonderingsstackpad ---
   --- Einde van intern uitzonderingsstackpad ---

I will wire RSSDP logging to our own logging mechanism and I will log the network state when the sockets are bound. I hope this will give some additional information.

We are running on Windows 8 (v6.3 build 9600) and Windows 10 (1607) and use .NET v4.6. I will get back to you when I have more information.

lalernehl commented 5 years ago

Any resolution on this? I've been running into this issue constantly. Not sure what causes it though. I'm running on a Windows machine. Any information you may need i can provide to you.

Thank you!

Yortw commented 5 years ago

I'm afraid I still haven't been able to reproduce the issue, and with the error handling already in place I can't see why it's happening.

Based on a search of the error message it looks like this might be caused by an underlying network problem, maybe DNS issues, or just the network becoming inaccessible (i.e wifi dropping out), or possibly the IP address of the machine where the service is running changing? It's hard to know, but all of those are potentially things that wouldn't show up in my local testing here (assuming my network doesn't have the same issue). That would also explain why on a restart another socket error occurs (if the network is still down/invalid local address used etc).

That said, the error handling in place should prevent the exception being thrown in the background and crashing the process, even if publishing would stop at that point.

Are you able to hook up the RSSDP logging engine to your own logging, and capture a record of the event happening? The log from before and including the event may reveal something, I hope.

ramondeklein commented 5 years ago

I have found that this issue only pops up directly after a reboot. Our service is registered to start during boot with start-type "automatic". When I set the startup type of the service to "automatic (delayed)" then it works fine. This is the log that shows when the socket cannot be created:

2019-05-09 10:00:25,373 [  9] ERROR XXX.Common.Logging.Log4NetLogger: Unable to start SSDP advertisement.
2019-05-09 10:00:25,373 [  9] ERROR XXX.Common.Logging.Log4NetLogger: - Exception:
System.Net.Sockets.SocketException (0x80004005): Er is geprobeerd een socketbewerking uit te voeren op een onbereikbare host
   bij System.Net.Sockets.Socket.setMulticastOption(SocketOptionName optionName, MulticastOption MR)
   bij Rssdp.SocketFactory.CreateUdpMulticastSocket(Int32 multicastTimeToLive, Int32 localPort)
   bij Rssdp.Infrastructure.SsdpCommunicationsServer.BeginListeningForBroadcasts()
   bij Rssdp.Infrastructure.SsdpDevicePublisherBase..ctor(ISsdpCommunicationsServer communicationsServer, String osName, String osVersion, ISsdpLogger log)
   bij XXX.Client.DevicePublisher..ctor()
   bij XXX.Client.ClientDevice.<AdvertiseAsync>d__70.MoveNext()
--- Einde van stacktracering vanaf vorige locatie waar uitzondering is opgetreden ---
   bij System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   bij System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   bij XXX.ClientService.SsdpService.<AdvertiseDeviceAsync>d__5.MoveNext()

My service currently depends on the Http service and LanmanServer service. I need the HTTP service, because I have an HttpListener in my application and I was hoping that the network layer would have been complete initialized afther the LanmanServer was ready. When I also add the Network Store Interface Service as a dependency, then this issue is gone.

I don't think this is an actual bug in RSSDP, but the network is simply not available yet. I also notice some other network glitches directly after startup. My services runs on the 192.168.87.0/24 network and I have also seen messages that this network isn't available yet. After 1-3 seconds, then this message disappears and the connection works fine.

PS: Error handling is fine, because I get an exception during the construction of the device publisher...

Yortw commented 5 years ago

Hi,

Thanks for following up on this and reporting back, that's really great. I've been thinking about it, but I agree with you that this sort of problem seems best handled by the application code rather than the library. The application is in a better place to know how long it wants to wait/retry, what dependencies there might be, or if it even runs shortly after boot (not everything using RSSDP will). I'd like for the library to work as well as possible out of the box and 'lead people into the pit of success' , but I don't think this particular case is something the library should handle itself.

Great to have the issue understood and documented though. Are you ok if we close this issue now?

Thanks again.