Open dbeinder opened 5 years ago
I got the same error also in 'high load' situation (10k+ connections and frequently changing).
2019-01-08 08:23:52,658 Unknown error (0xc0000001), Traceback= at System.Net.NetworkInformation.SystemIPGlobalProperties.GetAllTcpConnections() at System.Net.NetworkInformation.SystemIPGlobalProperties.GetActiveTcpConnections()
This error is not consistent, which makes it kinda annoying to troubleshoot.
public static int GetAvailablePort(int startingPort)
{
IPEndPoint[] endPoints;
List<int> portArray = new List<int>();
IPGlobalProperties properties = IPGlobalProperties.GetIPGlobalProperties();
//getting active connections
TcpConnectionInformation[] connections = properties.GetActiveTcpConnections();
portArray.AddRange(from n in connections
where n.LocalEndPoint.Port >= startingPort
select n.LocalEndPoint.Port);
...
I'm betting the error you're getting is an optimistic concurrency check. If so, it should be safe to call in a loop without negatively impacting the system. Are you able to try it and see?
Will throw out some feelers to get a more specific answer.
I have not been successful in triggering this behavior by creating artificial loads - but I have now started a test program that will keep polling until it gets the exception and will then count the number of retries and the time it takes on a server where I have seen the issue before.
It took 3 days of polling GetActiveTcpConnections() for the issue to crop up again. After detecting a 0xc0000001 exception, a repeat following immediately was successful.
So I guess there are two options to resolve this:
NetworkInformationException
to give a more actionable error message and mention in the .NET IPGlobalProperties documentation that this high load exception is possibleI believe the second option would simply lead to everyone copying boilerplate retry code whenever they want to use one of the IPGlobalProperties .GetXyzConnections()
functions. I can't think of a scenario when I wouldn't want to retry immediately but rather abandon the attempt to get those connections.
If we can get a slightly stronger guarantee from someone with WinApi knowledge that retrying is not going to end up in a endless loop, I think we should definitely opt for option 1.
If we add a loop, we'd have it give up after some tries, so the exception might still happen. Because of this, I think documentation needs to be updated either way.
That would depend on how GetTcpTable is implemented behind the scenes, if it is really just multiple cores competing fairly for access to the table, then the worst case chance of 'winning' may be as high as 1/NR_OF_CORES. In that case retrying without limit might be justified if it is all but guaranteed to work out after 4-16 attempts.
If it is more complicated than that, you're right a limit or better yet a timeout would become necessary - and the case for leaving retrying up to the user becomes much stronger.
If we do expand NetworkInformationException
to include a more specific flag for a high load condition - so that it isn't just an error string - that would have to be adapted on all the OS specific implementations and would make this a much larger fix.
if it is really just multiple cores competing fairly for access to the table, then the worst case chance of 'winning' may be as high as 1/NR_OF_CORES. In that case retrying without limit might be justified if it is all but guaranteed to work out after 4-16 attempts.
This assumes too much about the state of the system. Someone is winning, but without a fair ticketing system it might never be you. Load, priority, etc. might affect this.
+1 I have captured a full-mem dump.
Looking at the code, this makes some more sense:
If you receive this return code then calling the function again is usually enough to clear the issue and get the desired result. This return code can be a consequence of the system being under high load. For example, if the size of the TCP connection table changes by more than 2 additional items 3 consecutive times.
If true, we will need to add 3 new table entries at just the right moment, 3 times in a row to trigger this. However, the code doesn't actually seem to be returning STATUS_UNSUCCESSFUL
when this happens so I'm not 100% convinced this documentation is correct. Will work on a repro.
Triage: Looks like something we should document that we can throw out of that API IPGlobalProperties.GetActiveTcpConnections
.
Issues still exists in .NET 8.
When the Windows TCP connections table changes rapidly, the Windows API call GetTcpTable can return STATUS_UNSUCCESSFUL (0xc0000001). This return value is currently used by .NET to throw a generic NetworkInformationException which, when printed, only displays
Unknown error (0xc0000001)
.GetTcpTable
is used internally by .NET in the System.Net assembly:IPGlobalProperties.GetActiveTcpConnections()
IPGlobalProperties.GetActiveTcpListeners()
The WinAPI documentation for
GetTcpTable
is short on guarantees, but recommends simply repeating the call. Can someone with deeper knowledge of the Windows API comment if it would be safe for .NET to retry until the call succeeds? If that risks major lock-ups, NetworkInformationException should be extended to give users a flag to allow them to implement their own retry logic. For now, this can be done by checking theErrorCode
of the exception manually against against 0xc0000001.The same issue exists in .NET Core and .NET Framework, since the code is virtually identical.
Curiously, the WinApi documentation does not mention this behaviour for similar calls, like
GetExtendedTcpTable
,GetUdpTable
,GetExtendedUdpTable
which are also used by .NET - again, it would be great to know if these functions are safe, or if it is simply an oversight in the documentation.