dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

What are the reasons behind limiting the number of requested sockets to `65535` for Socket.Select()? #60930

Closed nsentinel closed 3 years ago

nsentinel commented 3 years ago

Can you please tell me the reasons behind limiting the number of requested sockets to `65535` for the [Socket.Select()](https://docs.microsoft.com/en-us/dotnet/api/system.net.sockets.socket.select?view=net-5.0) API:

public static void Select (System.Collections.IList? checkRead, System.Collections.IList? checkWrite, System.Collections.IList? checkError, int microSeconds);

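I found that this limitation existed in classic `.NET` ([Here](https://github.com/microsoft/referencesource/blob/master/System/net/System/Net/Sockets/Socket.cs#L2603)) and was carried over as is to `Core` and the cross-platform sockets ([Here](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs#L2193)).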

There is no mention of the limitation in the documentation.


The Windows implementation is based on the [select function API](https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select) (winsock2.h).

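There is no mention of limitations in the documentation, and if you study the SDK, they are also not obvious:

- `select` definition: [Here](https://github.com/tpn/winsdk-10/blob/master/Include/10.0.10240.0/um/WinSock2.h#L2048)
- `fd_set` definition: [Here](https://github.com/tpn/winsdk-10/blob/master/Include/10.0.10240.0/um/WinSock2.h#L136)
- `u_int` definition: [Here](https://github.com/tpn/winsdk-10/blob/master/Include/10.0.10240.0/um/WinSock2.h#L109)

There is also a document: [Maximum Number of Sockets Supported](https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2)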

> The maximum number of sockets supported by a particular Windows Sockets service provider is implementation specific. The Microsoft Winsock provider limits the maximum number of sockets supported only by available memory on the local computer.

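The exact implementation in the `Windows Sockets Pal Layer` also has no implied limits: [Here](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Windows.cs#L886)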


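The Linux/Unix implementation is based on the [Poll API](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs#L1725):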

> Rather than using the select syscall, we use poll. While this has a mismatch in API from Select and requires some translation, it avoids the significant limitation of select only working with file descriptors less than FD_SETSIZE, and thus failing arbitrarily depending on the file descriptor value assigned by the system. Since poll then expects an array of entries, we try to allocate the array on the stack, only falling back to allocating it on the heap if it's deemed too big.

The original Unix [select API](https://man7.org/linux/man-pages/man2/select.2.html) was limited to `FD_SETSIZE` (`1024`), which is also not the 16-bit limit.


So, what are the reasons behind limiting the number of requested sockets?

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Author: nsentinel
Assignees: -
Labels: `area-System.Net.Sockets`, `untriaged`
Milestone: -

scalablecory commented 3 years ago

I can't think of a reason.

Are you running into this limit today? I'd recommend using async instead, as select() will be incredibly inefficient in many scenarios with that number of sockets.
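For readers unfamiliar with the async alternative, here is a minimal sketch of a per-connection receive loop. The class and method names (`AsyncReceiveSketch`, `ReceiveLoopAsync`) and the 4096-byte buffer are illustrative assumptions, not anything from this thread; only `Socket.ReceiveAsync` is the real API. Each connection awaits its own receive, so nothing scans the full socket list.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncReceiveSketch
{
    // Hypothetical per-connection loop: one instance runs per socket and is
    // resumed only when that socket has data, so no list of sockets is polled.
    public static async Task<long> ReceiveLoopAsync(Socket socket, CancellationToken ct = default)
    {
        byte[] buffer = new byte[4096]; // assumed buffer size
        long total = 0;
        while (true)
        {
            int read = await socket.ReceiveAsync(buffer.AsMemory(), SocketFlags.None, ct);
            if (read == 0)
            {
                break; // the peer closed the connection
            }
            total += read; // real code would process buffer[0..read] here
        }
        return total;
    }
}
```

The loop returns the number of bytes received before the peer closed the connection; a real server would hand each filled buffer to its protocol handler instead of only counting bytes.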

nsentinel commented 3 years ago

Thanks for the answer.

I hit the limit while rewriting the async version of our code to get more fine-grained control.

A server capable of serving 250k+ connections unexpectedly stopped at 65k. The limitation can be overcome by splitting the socket list into chunks before the API call.
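The chunking workaround described here could look roughly like the sketch below. `ChunkedSelect`, `SplitIntoChunks`, `SelectReadable`, and the `MaxPerCall` value are hypothetical names and values chosen for illustration; the only real API used is `Socket.Select`, which removes the sockets that are not ready from the list it is given.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

public static class ChunkedSelect
{
    // Assumed chunk size, chosen to stay under the limit discussed in this issue.
    public const int MaxPerCall = 65535;

    // Splits the source into consecutive lists of at most `size` items.
    // Never produces an empty chunk.
    public static List<List<T>> SplitIntoChunks<T>(IReadOnlyList<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var chunks = new List<List<T>>();
        for (int i = 0; i < source.Count; i += size)
        {
            int count = Math.Min(size, source.Count - i);
            var chunk = new List<T>(count);
            for (int j = 0; j < count; j++)
            {
                chunk.Add(source[i + j]);
            }
            chunks.Add(chunk);
        }
        return chunks;
    }

    // Returns the readable sockets. After each Socket.Select call the chunk
    // holds only the sockets that were ready, so it can be appended directly.
    public static List<Socket> SelectReadable(IReadOnlyList<Socket> sockets, int microSecondsPerChunk)
    {
        var ready = new List<Socket>();
        foreach (List<Socket> chunk in SplitIntoChunks(sockets, MaxPerCall))
        {
            Socket.Select(chunk, null, null, microSecondsPerChunk);
            ready.AddRange(chunk);
        }
        return ready;
    }
}
```

Note that the timeout applies to each chunk separately, so the worst-case wait in this sketch is the number of chunks multiplied by `microSecondsPerChunk`.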

I am trying to use one dispatcher thread that calls Socket.Select() once every 200 ms (when the workers are idle), plus worker threads (2 per core) to process received data.

I know about the memory inefficiency (sockets are constantly copied to and removed from the list), but the call rate is pretty low.

Are there other things to consider?
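To make the per-call copying cost concrete, here is a minimal sketch of such a dispatcher's polling step. `SelectDispatcher`, `Add`, and `PollOnce` are hypothetical names, not the poster's code. Because `Socket.Select` removes the sockets that are not ready from the list it receives, the full set has to be copied into a scratch list before every call, which is O(n) work even when almost nothing is readable. The sketch ignores the per-call socket limit for brevity.

```csharp
using System.Collections.Generic;
using System.Net.Sockets;

public sealed class SelectDispatcher
{
    private readonly List<Socket> _all = new List<Socket>();
    private readonly List<Socket> _scratch = new List<Socket>();

    public void Add(Socket socket) => _all.Add(socket);

    // Returns the readable sockets. The returned list is reused by the next
    // call, so callers must consume it before polling again.
    public IReadOnlyList<Socket> PollOnce(int microSeconds)
    {
        _scratch.Clear();
        _scratch.AddRange(_all); // O(n) copy on every call
        if (_scratch.Count == 0)
        {
            return _scratch; // Socket.Select rejects a call with no sockets at all
        }
        Socket.Select(_scratch, null, null, microSeconds); // drops non-ready sockets
        return _scratch;
    }
}
```

Reusing the scratch list avoids a fresh allocation per call, but the copy and the subsequent removals still scale with the total number of connections rather than with the number of ready ones.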

scalablecory commented 3 years ago

Interesting. I can see it being usable in some specific scenario like that. @antonfirsov @geoffkizer can you think of any reason for a limit here?

We're unlikely to invest in this change ourselves without more customers asking for it, but I don't think we'd reject a PR to raise the limit if it were accompanied by appropriate tests.

nsentinel commented 3 years ago

I ran some additional tests under full load and concluded that you are right: select is too inefficient to use, even with a low call rate.

In the end, I decided to stop using it.

We can close this ticket. It seems to me that the reason for the limit of 65536 sockets lies in inefficiency of use (modifying the list consumes too much CPU and memory), and not in any technical restriction.

It seems worth mentioning this in the documentation; removing the restriction is not necessary.

geoffkizer commented 3 years ago

> select is too inefficient to use, even with a low call rate.

I think this is the main issue.

I think it would be fine to increase the limit here, but in practice it's not really viable to use select this way.

scalablecory commented 3 years ago

Thanks for following up! Closing this now.