Open shirhatti opened 3 years ago
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
Author: | shirhatti |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Net.Sockets` |
Milestone: | - |
> On Windows, there is already an API that wraps WSADuplicateSocket, but unfortunately it duplicates and closes the socket. We'd need an API that just duplicates the socket.
Why do you need one that just duplicates the socket?
The child process may die or get restarted. I need to keep a valid handle to the original socket so I can duplicate it and hand it off again to the new child process.
- public Socket DuplicateSocketLinux(int targetProcessId)
What are you proposing this do with the process ID?
Ahh, copypasta. No, there's no need for a PID for DuplicateSocketLinux. I'll update the issue text.
I believe the Windows implementation of this has a serious limitation when IOCP is involved -- @antonfirsov do you remember what we ran into when reimplementing this?
This is an expensive feature in the proposed form; I suggest exploring alternatives.
> On Windows, there is already an API that wraps WSADuplicateSocket, but unfortunately it duplicates and closes the socket. We'd need an API that just duplicates the socket.
I think the reason `DuplicateAndClose` was originally designed this way, back in .NET Framework times, is that allowing manipulation of the socket from two processes simultaneously may lead to side effects in the managed implementation that are hard to control and debug, making a potential non-closing `Duplicate` variant a very unsafe operation. Personally, I believe this was a reasonable decision.
> I believe the Windows implementation of this has a serious limitation when IOCP is involved -- @antonfirsov do you remember what we ran into when reimplementing this?
The OS limitation is that you cannot do IOCP on the same socket from two processes, which means that you should not do async operations in the originating process (before `DuplicateAndClose`-ing the socket), or the other way around: you can only do sync stuff in the destination process. I don't think we want to re-introduce an inferior, non-IOCP async engine in our Windows PAL just to remove this limitation.
The MS team that asked for the reintroduction of `DuplicateAndClose` in .NET Core (#1760) had a scenario similar to the one described in the OP, and they were fine with both of the limitations described above (1/ Close in original proc 2/ No async in the original proc). What they did differently is that they did not duplicate the listener socket (while keeping it alive in the original process), but rather sent the accept sockets down to child processes to do the sends/receives.
@shirhatti can't you consider such an architecture instead? Can you provide a more detailed description of the ASP.NET scenario?
Based on your implementation, I guess you rather meant:
- public Socket DuplicateSocketLinux();
+ public SafeSocketHandle DuplicateSocketLinux();
@wfurt how does this (an API exposing libc `dup`) compare to #932? Are you aware of any side effects?
> public Socket DuplicateSocketLinux()
As with a previous discussion around making handles non-inheritable, this shouldn't need to be socket-specific. If we need to expose this capability, I would much prefer to do so for arbitrary SafeHandles and in a cross-platform manner, e.g.
public abstract class SafeHandle
{
    public static TSafeHandle Duplicate<TSafeHandle>(TSafeHandle handle) where TSafeHandle : SafeHandle, new();
    public static void SetInheritability(SafeHandle handle, bool inheritable);
    ... // etc.
}
Very interesting feature, having implemented that in a production system back in netfx.
> The OS limitation is that you can not do IOCP on the same socket from two processes, which means that you should not do async operations in the originating process (before `DuplicateAndClose`-ing the socket), or the other way: you can only do sync stuff in the destination process. I don't think we want to re-introduce an inferior, non-IOCP async engine in our Windows PAL just to remove this limitation.
I believe the part about IOCP migration is untrue, as of many years/Windows versions, no? The kernel does allow for unbinding of objects from an IOCP and re-binding elsewhere. Of course, this requires extra care.
This would allow for netcore's existing IOCP-only implementation to work across processes.
Side question: wouldn't this feature also require exporting SslStream material across processes? Or is that out of scope and remains HTTP-only?
Even if it's possible to unbind the socket from a Completion Port without side effects, it's still not possible to actively use it with IOCP from two processes simultaneously, which seems essential for the `DuplicateSocketWindows(pid)` proposal.
> What they did differently is that they did not duplicate the listener socket (while keeping it alive in the original process), but rather sent the accept sockets down to child processes to do the sends/receives.
+1, this is how I've seen this sort of thing done typically.
The `dup` call only clones the handle within the given process, @antonfirsov. AFAIK using UDS is the only way to pass it to a different process on Unix.
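For readers less familiar with the Unix side, the UDS handoff mentioned here works by attaching the descriptor to a message as `SCM_RIGHTS` ancillary data. The following is a minimal Python sketch of that mechanism, not .NET code and not part of the proposal. The helper names `send_fd`, `recv_fd` and `demo_handoff` are made up for illustration, and a `socketpair` inside one process stands in for the channel between a parent and a child. It assumes a Unix-like OS.

```python
import array
import socket


def send_fd(channel: socket.socket, fd: int) -> None:
    """Send one descriptor over a Unix domain socket as SCM_RIGHTS data."""
    payload = array.array("i", [fd])
    # On Linux stream sockets, ancillary data must travel with at least
    # one byte of ordinary data.
    channel.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, payload)])


def recv_fd(channel: socket.socket) -> int:
    """Receive one descriptor; the kernel installs it as a new fd here."""
    fds = array.array("i")
    _, ancdata, _, _ = channel.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[: fds.itemsize])
            return fds[0]
    raise RuntimeError("no descriptor received")


def demo_handoff() -> bool:
    """Pass a listening socket across a socketpair and compare both ends."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    # Hypothetical parent/child channel; both ends live in this process.
    parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        send_fd(parent_end, listener.fileno())
        received = socket.socket(fileno=recv_fd(child_end))
        try:
            # Both descriptors refer to the same listening socket.
            return received.getsockname() == listener.getsockname()
        finally:
            received.close()
    finally:
        parent_end.close()
        child_end.close()
        listener.close()


if __name__ == "__main__":
    print(demo_handoff())
```

Note that the sender keeps its own descriptor after the send, so this mechanism is a duplicate rather than a duplicate-and-close.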
While the underlying mechanism is different, I feel it would be nice if the API were somewhat generic, i.e. if it hid the complexity. Sending/duplicating the handle is one part, but how will that surface in the child process? Let's say somebody sends several handles over: how will one match them?
I do like the idea of marking a handle as inheritable. That came up a few times before. Doing some work before the child starts is easiest in many cases IMHO.
> This would allow for netcore's existing IOCP-only implementation to work across processes.
That's what I'm seeing in my prototype: Kestrel just works when listening on a duplicated socket.
> I believe the part about IOCP migration is untrue, as of many years/Windows versions, no? The kernel does allow for unbinding of objects from an IOCP and re-binding elsewhere. Of course, this requires extra care.
I haven't tried stressing it, but the naïve implementation seems to work fine.
> Side question: wouldn't this feature also require exporting SslStream material across processes? Or is that out of scope and remains HTTP-only?
Nope. We aren't calling Accept on the socket in the parent process, so the TLS handshake hasn't yet begun. The entire handshake occurs in the child process.
> Even if it's possible to unbind the socket from a Completion Port without side effects, it's still not possible to actively use it with IOCP from two processes simultaneously, which seems essential for the `DuplicateSocketWindows(pid)` proposal.
That isn't my intention. I only intend to have one process actively use the socket at a time. When the child process dies or gracefully shuts down, all IOCP objects are unbound. The parent then launches a new child process and gives it a handle to the socket. The parent process will never have any bound IOCP handles in this case.
@shirhatti can you provide more/more formal details about your scenario and planned architecture (what are the processes and their roles, what do the sockets do in the processes etc.)? We need to understand why the `DuplicateAndClose` approach (duplicating accept sockets) isn't an option for you. As @geoffkizer also mentioned in his comment (https://github.com/dotnet/runtime/issues/48637#issuecomment-784383200), it worked fine for similar use cases in the past.
`DuplicateAndClose` would require the child process to be cooperative and hand back the socket to the parent process before shutting down. At least in my use case (used in a development server), I expect the child process to crash occasionally and not hand back the socket.
Because the parent does not close its handle, the socket remains bound even when the child process crashes. When that happens, I can always duplicate the socket again in the parent and hand it off to a new child process.
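The lifecycle described here can be sketched in a few lines. This is a single-process Python simulation for illustration only, not the proposed .NET API. `os.dup` stands in for the cross-process duplication, closing the duplicate stands in for a child exiting or crashing, and the function name `serve_generations` is hypothetical.

```python
import os
import socket


def serve_generations(generations: int = 2) -> int:
    """Simulate successive children that each get a duplicate of the listener.

    Returns the number of simulated children that accepted a connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    address = listener.getsockname()
    served = 0
    try:
        for _ in range(generations):
            # "Parent" duplicates the descriptor and keeps its own copy open.
            child = socket.socket(fileno=os.dup(listener.fileno()))
            client = socket.create_connection(address, timeout=5)
            connection, _peer = child.accept()
            connection.close()
            client.close()
            # "Child" goes away, taking only its duplicate with it.
            child.close()
            served += 1
    finally:
        listener.close()
    return served


if __name__ == "__main__":
    print(serve_generations(3))
```

Because the original descriptor stays open in the simulated parent, the port stays bound between generations and each new duplicate can accept connections.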
@shirhatti what if you listen in the parent process, and hand off the accept sockets instead of passing the listen socket? Would it change things?
Very much so, Kestrel doesn't support that 😄
I'll let @shirhatti describe the thing we're currently trying to accomplish, but handing off the listen socket is also how systemd works (on Linux). This isn't a completely foreign idea (e.g. https://leonardoce.wordpress.com/2015/03/08/systemd-socket-based-activation/)
@shirhatti is it tied to an existing .NET 6.0 theme? What's the priority? Can we link it from there?
We're considering this as part of the inner-loop theme because we want to switch the default VS dev experience from using IIS Express to using Kestrel which should enable faster startup. We want this feature before making the switch from IIS Express to avoid connection refused errors while the process is starting. See https://github.com/dotnet/aspnetcore/issues/27277.
Triage: Per offline info, it is not committed for 6.0 or blocking. If it is needed, there is a workaround to PInvoke directly.
On Linux we should solve it in general on `SafeHandle` rather than specifically for Networking -- it will be a cross-platform solution (though only Linux will be usable in this scenario).
On Windows, we will need the Networking API.
Moving to Future for now.
> If it is needed, there is a workaround to PInvoke directly.
Acknowledging this.
Background and Motivation
As part of the hot reload scenarios in .NET 6, ASP.NET Core is looking at moving the socket creation code to an external process. Having the listen socket bound in a separate process allows for inner-loop improvements by potentially parallelizing operations in the startup path, since the OS will hold connections prior to your application calling accept. For example, you do not need to wait for the application to successfully start before launching your browser.
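The claim that the OS holds connections before the application calls accept can be checked with a small experiment. The sketch below is Python rather than .NET and is only an illustration. The function name `connect_before_accept` is hypothetical, and the client plays the role of a browser that starts before the server is ready.

```python
import socket


def connect_before_accept(message: bytes = b"hello") -> bytes:
    """Connect and send before accept is called, then read the data back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    try:
        # The "application" has not called accept yet; the kernel completes
        # the handshake and queues the connection in the listen backlog.
        client = socket.create_connection(listener.getsockname(), timeout=5)
        try:
            client.sendall(message)
            # Only now does the "application" start accepting.
            connection, _peer = listener.accept()
            try:
                connection.settimeout(5)
                received = b""
                while len(received) < len(message):
                    chunk = connection.recv(len(message) - len(received))
                    if not chunk:
                        break
                    received += chunk
                return received
            finally:
                connection.close()
        finally:
            client.close()
    finally:
        listener.close()


if __name__ == "__main__":
    print(connect_before_accept())
```

The connect and send succeed even though nothing is accepting yet, which is what avoids "connection refused" errors while the real server process is still starting.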
To allow binding to a socket in a different process than the listener would require a mechanism to pass the socket between processes. Normally this is achieved by just forking the process and inheriting the file descriptors. Unfortunately, implementing this in .NET is slightly challenging since the `O_CLOEXEC` flag is set on all Sockets. Providing an API to duplicate sockets would make this a lot easier.
On Windows, there is already an API that wraps `WSADuplicateSocket`, but unfortunately it duplicates and closes the socket. We'd need an API that just duplicates the socket.
Proposed API
I don't actually propose naming these `DuplicateWindows` and `DuplicateLinux`, but I couldn't think of good suggestions.
Usage Examples
A fully fleshed-out example that uses the Windows and Linux variants:
Linux: https://github.com/shirhatti/zocket/blob/main/src/zocket/Program.cs#L52
Windows: https://github.com/shirhatti/zocket/blob/main/src/zocket/Program.cs#L90
Alternative Designs
I proposed duplicating (and not setting `O_CLOEXEC` on the duplicated socket), as opposed to calling `fcntl` on the original socket, to preserve some semblance of symmetry between the Windows and Linux use cases.
This is a common feature of development servers in other language ecosystems: Werkzeug, Lithos, systemfd
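To make the two Linux alternatives concrete, here is a minimal Python sketch of both (Unix-only, and not the proposed .NET API). The function names are hypothetical. It relies on two facts: `fcntl` with `F_DUPFD` returns a duplicate with the close-on-exec descriptor flag cleared, and Python, like .NET according to this issue, creates sockets with close-on-exec set.

```python
import fcntl
import os
import socket


def duplicate_inheritable(fd: int) -> int:
    """Proposed approach: duplicate, leaving the original descriptor untouched.

    F_DUPFD returns a new descriptor whose close-on-exec flag is cleared.
    """
    return fcntl.fcntl(fd, fcntl.F_DUPFD, 0)


def make_original_inheritable(fd: int) -> None:
    """Alternative: clear close-on-exec on the original descriptor itself."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def compare() -> tuple:
    """Return inheritability of (original, duplicate, original again,
    original after the in-place fcntl alternative)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    original = listener.fileno()
    before = os.get_inheritable(original)
    duplicate = duplicate_inheritable(original)
    try:
        result = (
            before,
            os.get_inheritable(duplicate),
            os.get_inheritable(original),
        )
        make_original_inheritable(original)
        return result + (os.get_inheritable(original),)
    finally:
        os.close(duplicate)
        listener.close()


if __name__ == "__main__":
    print(compare())
```

With duplication, only the new descriptor leaks into child processes and the original keeps its close-on-exec protection. The in-place alternative changes the original for every child spawned afterwards.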
Risks
Improper usage of this API could result in leaking file descriptors, but it's hard to mitigate that since the goal of this API is intentional leakage of the file descriptor.
cc @jkotalik @pranavkm