dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

Cannot create AF_VSOCK Sockets due to overly strict SocketPal #58327

Open fbrosseau opened 3 years ago

fbrosseau commented 3 years ago

Description

Hello,

Unix SocketPal prevents managed code from creating AF_VSOCK (and any other new and/or non-standard AF's). This is unfortunate, because at the managed level, everything is there to extend for new address families: You write your EndPoint-derived implementation properly, and then it gets serialized to a SocketAddress, and that gets passed down to socket for the platform, and everything works. That is the case for Windows, where dotnet does not attempt to do anything with the parameters - if WinSock is happy with it, it works.

In Linux, SocketPal needs to do some translations, likely because some of the basic values do not match between their original Windows constants and the standard Linux ones. In its current form, the code there bails out entirely if anything is unknown:

https://github.com/dotnet/runtime/blob/cffaa78235ea93d5e3eeb56956579df503e11250/src/libraries/Native/Unix/System.Native/pal_networking.c#L2461-L2487

My exact use case is for AF_VSOCK. In this specific case, I guess the simplest fix would be to add AF_VSOCK to TryConvertAddressFamilyPalToPlatform. However, should Linux SocketPal allow for arbitrary unknown AFs, like Windows does? That's a larger change, and that can come with complications if the values keep diverging over time between Linux and Windows.

Regression

No, pal_networking.c blame looks old over there.

Workaround

Pinvoke socket(), connect() / bind() / ... (basically anything that deals with EndPoint classes), and then new Socket(new SafeSocketHandle((IntPtr)sock, true)); to leverage the actual Socket/IO benefits provided by .net.

This workaround has multiple drawbacks - the current managed code in Socket gets all confused and assumes that the socket has Connected=false, due to the code only handling IPv4/IPv6/Unix in the SafeHandle constructor. Having Connected=false prevents it from using NetworkStream, for example. Then, if a Stream interface is required, the workaround is to instead use a FileStream over the sockfd. This is really not ideal : )

@antonfirsov

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Hello, Unix SocketPal prevents managed code from creating AF_VSOCK (and any other new and/or non-standard AF's). This is unfortunate, because at the managed level, everything is there to extend for new address families: You write your EndPoint-derived implementation properly, and then it gets serialized to a SocketAddress, and that gets passed down to `socket` for the platform, and everything works. That is the case for Windows, where dotnet does not attempt to do anything with the parameters - if WinSock is happy with it, it works. In Linux, SocketPal needs to do some translations, likely because some of the basic values do not match between their original Windows constants and the standard Linux ones. In its current form, the code there bails out entirely if anything is unknown: https://github.com/dotnet/runtime/blob/cffaa78235ea93d5e3eeb56956579df503e11250/src/libraries/Native/Unix/System.Native/pal_networking.c#L2461-L2487 My exact use case is for AF_VSOCK. In this specific case, I guess the simplest fix would be to add AF_VSOCK to `TryConvertAddressFamilyPalToPlatform`. However, should Linux SocketPal allow for arbitrary unknown AFs, like Windows does? That's a larger change, and that can come with complications if the values keep diverging over time between Linux and Windows. ### Regression No, pal_networking.c blame looks old over there. ### Workaround Pinvoke `socket()`, `connect()` / `bind()` / ... (basically anything that deals with EndPoint classes), and then `new Socket(new SafeSocketHandle((IntPtr)sock, true));` to leverage the actual Socket/IO benefits provided by .net.
Author: fbrosseau
Assignees: -
Labels: `area-System.Net.Sockets`, `untriaged`
Milestone: -
antonfirsov commented 3 years ago

I guess the simplest fix would be to add AF_VSOCK to TryConvertAddressFamilyPalToPlatform

For that we also need a managed AddressFamily entry for AF_VSOCK. If VSOCK is an important use case for you, I recommend to file an API proposal adding that specific address family. Assuming we accept the proposal, to me it looks like an up-for-grabs candidate.

the current managed code in Socket gets all confused and assumes that the socket has Connected=false, due to the code only handling IPv4/IPv6/Unix in the SafeHandle constructor.

If understand it correctly, this is due to a condition check before we call getpeername on the socket handle: https://github.com/dotnet/runtime/blob/e0e7bfce6c0841bb659810cd80bce87e4cf7329b/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs#L179-L186

I'm wondering what would happen if we just removed this criteria to unblock scenarios dealing with connected sockets of arbitrary address families? _remoteEndPoint would remain unassigned in case of arbitrary address families, but send/receive operations should still work. @geoffkizer @tmds thoughts?

davidfowl commented 3 years ago

cc @shirhatti

fbrosseau commented 3 years ago

For that we also need a managed AddressFamily entry for AF_VSOCK. If VSOCK is an important use case for you, I recommend to file an API proposal adding that specific address family. Assuming we accept the proposal, to me it looks like an up-for-grabs candidate.

From a framework perspective I guess this makes the most sense, but I was hoping this issue could be handled 100% internally at the PAL level so that it is easier to pass. I can submit an API proposal in parallel. At that point, would it only be the AddressFamily entry, or would it be AddressFamily + EndPoint impl? And AF_VSOCK only, or AF_HYPERV as well? Note that AF_HYPERV is not a problem, unlike VSOCK, because on Windows it is possible to write a fully functional EndPoint implementation and AF_HYPERV works completely in managed. But providing the implementation would add convenience.

What I mean by handling it internally at the PAL is that could it be changed so that unknown values flow through to the native calls instead of early-returning an error? Right now it strictly enforces well-known values to guard against invalid parameters and sizes, but the native call would already return an adequate error if it does not like the parameters.

If understand it correctly, this is due to a condition check before we call getpeername on the socket handle:

Yes. My understanding of that code is that relaxing those conditions would work.

antonfirsov commented 3 years ago

or would it be AddressFamily + EndPoint impl

In #28636 (https://github.com/dotnet/corefx/pull/37315) we shipped a few new address families without EndPoint implementations so it's probably not necessary, meaning that the minimal API would just focus on unblocking users instead of delivering a full solution. /cc @wfurt

or AF_HYPERV as well

To me it sounds reasonable to file a shared proposal for these two.

wfurt commented 3 years ago

I did some work on Netlink and It was working for me - at least the small part I was using. Supporting completely new protocols blindly is not trivial. As you noticed there are fragments in the socket code making certain assumptions. Ability to create Socket from handle was added to solve also some other use cases. Creating Socket from RAW AddressFamily was discussed but rejected as it may have still have some other lingering issue. For one, it may still have issue not knowing if socket is or is not Connected. Existing code is guessing and perhaps that may be improved. Alternatively we may pass that information to the constructor.

Can you share your code @fbrosseau primarily the endpoint, creation and example oof what is not working?

fbrosseau commented 3 years ago

I did some work on Netlink and It was working for me

In Linux? Windows PAL allows for arbitrary AFs, but not Linux.

Can you share your code @fbrosseau primarily the endpoint, creation and example oof what is not working?

new Socket((AddressFamily)40, SocketType.Stream, 0);

-->
Unhandled exception. System.Net.Sockets.SocketException (97): Address family not supported by protocol
   at System.Net.Sockets.Socket..ctor(AddressFamily addressFamily, SocketType socketType, ProtocolType protocolType)
   at AfHyperV.Program.Main(String[] args)
new SocketAddress((AddressFamily)40, 32);

-->

Unhandled exception. System.PlatformNotSupportedException: Operation is not supported on this platform.
   at System.Net.SocketAddressPal.ThrowOnFailure(Error err)
   at System.Net.SocketAddressPal.SetAddressFamily(Byte[] buffer, AddressFamily family)
   at System.Net.SocketAddress..ctor(AddressFamily family, Int32 size)
   at AfHyperV.Program.Main(String[] args) 

Anything that touches the AddressFamily throws. I can share a full example of what would be a correct VSockEndPoint implementation, but basically every step is a blocker :)

Note that these errors are generated by the Unix PAL itself, not the OS. The OS would accept these parameters happily. See the part of pal_networking.c that I linked in OP

wfurt commented 3 years ago

Yes, on Linux. Windows does not have Netlink. And this is not what I meant. I know casting number to AddressFamily will not work. It is about creating handle you self, creating Socket from it and using your EndPoint for Bind().

fbrosseau commented 3 years ago

Ah sorry! Yes, well managed Socket.Bind/.Connect is a no-go due to the SocketAddress constructor throwing, and I do not think there is any possible way to implement EndPoint.Serialize without it. So that step too must be done through PInvoke. But then due to the defensive code in the Socket constructor that takes a handle, _rightEndPoint is never considered if the AF is not IPv4/IPv6/Unix.

wfurt commented 3 years ago

Take a look at https://github.com/dotnet/runtime/issues/26416. It has example of managed Bind with custom AddressFamily. I think the sequence should be: 1) create handle with P/Invoke. 2) Create Socket from it. 3) Use Socket.Bind if needed.

This should all work AFAIK. Looking back, I don't think I ever tried Connect since neither CAN nor Netlink is stream oriented. If Bind works but Connect does not it I think that would be something we can look into. I can probably also find my old Netlink prototype. It did not do Connect but it was using SendTo as far as I remember.

fbrosseau commented 3 years ago

LLEndpoint uses new SocketAddress(AddressFamily.Packet, 20); which has first-class support in TryConvertAddressFamilyPalToPlatform and hence it works, but anything outside of the hardcoded values will return false from there and throw once back in managed.

wfurt commented 3 years ago

Than look at https://github.com/dotnet/runtime/blob/ebc0f05ebc3bb8854dceaf791486b4aa49739e80/src/libraries/System.Net.Sockets/tests/FunctionalTests/CreateSocketTests.cs#L584

Original addition of Netlink was reverted (e.g. there is neither PAL nor enum) but that test can still dump routing table. I think the trick that makes it work is here: https://github.com/dotnet/runtime/blob/ebc0f05ebc3bb8854dceaf791486b4aa49739e80/src/libraries/System.Net.Sockets/tests/FunctionalTests/CreateSocketTests.cs#L519-L520

Netlink is one example were even if we added AddressFamily use of it was still problematic. That was my story of adding support for unknown/arbitrary protocols.

fbrosseau commented 3 years ago

Ah yes interesting, I didn't realize SocketAddress could be fuzzed into changing the AF byte, this goes against both the docs and the code comments, so I did not expect this to be a supported scenario.

https://github.com/dotnet/runtime/blob/9671726c0a79d5f4cab320b32da1cd4463a0d1be/src/libraries/Common/src/System/Net/SocketAddress.cs#L55-L59

Then yes if that's supported that works, that removes the need for quite a bit of PInvoke, except the initial socket() call.

wfurt commented 3 years ago

It would be probably cleaner if you can use AddressFamily.Unknown as base but that will probably fail as that does not really represent any OS protocol. One challenge with supporting unknown protocols is lack of uses case and difficulty with writing tests. If you can get something working we can perhaps take that it as feedback and make improvements for future @fbrosseau. Trouble with Netlink was fact that it also needed specific ProtocolType and there are many and user can possibly also register and create one dynamically. Perhaps we can look at it again in 7.0 and make process of adding new protocols easier.

fbrosseau commented 3 years ago

AFAIK VSock is mainstream enough now and it also supports "loopback" that does not require having any actual VM in the system, so this is probably a good testable candidate AF.

wfurt commented 3 years ago

Yah, you can argue that about Netlink as well. Part of the trouble is that .NET in not particularly well setup to handle platform specific features e.g. works only on Linux.

antonfirsov commented 3 years ago

Part of the trouble is that .NET in not particularly well setup to handle platform specific features e.g. works only on Linux.

That's true, but we already have similar address families (eg. AF_PACKET), so I don't see an issue adding more of them. We should just point out in the docs, that the given enum value is platform-specific.

I think this is a good candidate to be a community-driven feature, my recommendation is still to

Independently, we should also evaluate if we can improve the support of using arbitrary AddressFamilies / ProtocolTypes through P/Invoke and custom endpoints. I think we can provide limited support for advanced users without adding new API-s or implementing high impact changes like the eliminating strict PAL "mappability" criterias entirely, by doing just 2 things:

  1. Relax the criteria only in SocketAddress(AddressFamily) constructor to actually enable the use of EndPoints with custom address families.
  2. Remove the _rightEndPoint != null check in Socket(SafeSocketHandle) as mentioned in my https://github.com/dotnet/runtime/issues/58327#issuecomment-908451293 so connected sockets created in unmanaged code act like they are actually connected.
fbrosseau commented 3 years ago

Sure I could look into that.

Points 1 & 2

Could the TryConvertAddressFamilyPalToPlatform condition be relaxed in SystemNative_Socket too? The code already does not do anything with the result other than passing it to socket(), so if SystemNative_Socket did not return with error on TryConvertAddressFamilyPalToPlatform=false, even that would no longer require PInvoke. AFAICT with those changes + this one, zero PInvoke would then be required for anything to have a functioning custom AF.

tmds commented 3 years ago

Could the TryConvertAddressFamilyPalToPlatform condition be relaxed in SystemNative_Socket too?

You want to use AddressFamily to pass in raw values. For values that match defined address families, the PAL has no way of knowing whether it should convert them or treat them as raw.

We need some API addition that indicates whether the values are raw ones. These could be some overload that accepts ints, similar to SetRawSocketOption vs SetSocketOption.

Remove the _rightEndPoint != null

_rightEndPoint is the way Socket tracks the type of EndPoint it should use. For unknown address families, a user may want to specify this. Or maybe we want to add a RawEndPoint that exposes the raw address data.

API could be something in this direction:

const int AF_VSOCK = 40;
const int SOCK_STREAM = 1;
using var socket = new Socket(AF_VSOCK, SOCK_STREAM, 0, new MyVSockEndPoint());
fbrosseau commented 3 years ago

You want to use AddressFamily to pass in raw values. For values that match defined address families, the PAL has no way of knowing whether it should convert them or treat them as raw.

I am not sure I can see the concrete problem - the already existing special-cases (Packet, CAN) already have synthetic values on purpose to not clash with anything standard. So that part is fine. Then, for the normal stuff (IPv4 etc), the values are universal. For the AF values that do differ between platforms (VSock and HyperV are examples), the OS would already tell if the values are good or bad.

API could be something in this direction:

First argument would be redundant with the endpoint argument since the AF value is already part of the EndPoint contract. I am also not sure if having an endpoint at socket()-time would make sense for all families. Also would that be the remote or local endpoint? Their final values are often unknown until after bind-time, or even lazier at connect-time with port-sharing options, for example.

antonfirsov commented 3 years ago

@tmds API addition is a very deep rabbit hole, the scope seems to be much bigger here than with SetRawSocketOption. It's not enough to just expose Socket(int, int, int, ???) stuff, for consistency we also need to consider a bunch of other additions like EndPoint.RawAddressFamily or Socket.RawAddressFamily. I'm skeptical if we can reach consensus and get the funding for such changes in 7.0.0. This is why I'm suggesting to evaluate minor changes which may be just enough to unblock users.

_rightEndPoint is the way Socket tracks the type of EndPoint it should use.

_rightEndPoint seems to be unused with connected Send/Receive. In SendTo the remoteEp parameter is mandatory, in ReceiveFrom, we may expect the user to pass remoteEP when used with arbitrary address families.

tmds commented 3 years ago

Then, for the normal stuff (IPv4 etc), the values are universal.

There is no standard that specifies these values are the same cross-platform.

API addition is a very deep rabbit hole

The problem that needs to be solved is distinguish raw values from defined enum values.

for consistency we also need to consider a bunch of other additions like EndPoint.RawAddressFamily or Socket.RawAddressFamily

These enums have an Unknown value which could be used instead of adding new APIs.

_rightEndPoint seems to be unused with connected Send/Receive. In SendTo the remoteEp parameter is mandatory, in ReceiveFrom, we may expect the user to pass remoteEP when used with arbitrary address families.

Yes, only specific operations require _rightEndPoint. The options are: a. not support them, b. allow the user to specify his own EndPoint type, or c. introduce a RawEndPoint type.

antonfirsov commented 3 years ago

These enums have an Unknown value which could be used instead of adding new APIs.

Sounds reasonable to me, but wondering if anyone would push back on that.

b. allow the user to specify his own EndPoint type

I think this is something we should be capable to support with the current API-s. (Connect(), SendTo(), ReceiveFrom() can be used to specify a custom EndPoint type.) The only API hole is seems to be around connected socket creation from a handle. We can add Socket(SafeSocketHandle handle, EndPoint remoteEp) to deal with this.

tmds commented 3 years ago

The only API hole is seems to be around connected socket creation from a handle. We can add Socket(SafeSocketHandle handle, EndPoint remoteEp) to deal with this.

If we don't add a constructor to Socket/SafeSocketHandle that accepts raw values, the user will need to P/Invoke.

const int AF_VSOCK = 40;
const int SOCK_STREAM = 1;
const int SOCK_CLOEXEC = 0x80000;

[DllImport("libc", SetLastError = true)]
public static extern int socket(int domain, int type, int protocol);

int fd = socket(AF_VSOCK, SOCK_STREAM | SOCK_CLOEXEC, 0); // TODO: handle fd == -1
using var socket = new Socket(new SafeSocketHandle((IntPtr)fd, true), new MyVSockEndPoint());
antonfirsov commented 3 years ago

If we don't add a constructor to Socket/SafeSocketHandle that accepts raw values, the user will need to P/Invoke.

Which might be acceptable for advanced scenarios. Again, I'm not super optimistic about our ability finalizing API proposals around the socket area, mostly because it has lower priority in comparison to other hot topics. (See our backlog of stale suggestions). I may give it a try though :)

for consistency we also need to consider a bunch of other additions like EndPoint.RawAddressFamily or Socket.RawAddressFamily

These enums have an Unknown value which could be used instead of adding new APIs.

This would mean that the raw address family and socket/protocol type is settable, but not queryable. This is quite atypical for BCL API-s, and likely not what we want there, meaning that we need to expose properties which is where the rabbit hole begins ...

Regarding the code sample in https://github.com/dotnet/runtime/issues/58327#issuecomment-911461137: I meant the Socket(SafeSocketHandle handle, EndPoint remoteEp) constructor to create Socket instances which are already connected using unmanaged connect or accept. Socket(handle) should be sufficient to create, bind and connect the socket via managed API-s, assigning _rightEndPoint when necessary. The only thing we need to do here is to find a way to relax the AddressFamily validation in the SocketAddress constructor.

karelz commented 3 years ago

Triage: Similar to https://github.com/dotnet/runtime/issues/58378#issuecomment-911847247 -- there may be dragons, let's find out more first.

antonfirsov commented 2 years ago

Triage: not critical for 7.0, pushing to future for now.

crozone commented 1 year ago

Ah yes interesting, I didn't realize SocketAddress could be fuzzed into changing the AF byte, this goes against both the docs and the code comments, so I did not expect this to be a supported scenario.

https://github.com/dotnet/runtime/blob/9671726c0a79d5f4cab320b32da1cd4463a0d1be/src/libraries/Common/src/System/Net/SocketAddress.cs#L55-L59

Then yes if that's supported that works, that removes the need for quite a bit of PInvoke, except the initial socket() call.

I noticed this myself when implementing a SocketCanEndPoint : EndPoint to allow binding to a SocketCAN socket on Linux. I didn't end up needing to set the first two bytes since AddressFamily.ControllerAreaNetwork is actually correctly translated into CAN_AF by TryConvertAddressFamilyPalToPlatform, but it is very weird that the comment explicitly says those bytes are read-only, and yet they're clearly not. Is it a bug?

wfurt commented 1 year ago

This may be officially relaxed as part of #30797. I don't if it would be useful but it would provide ability to send/receive without need for SocketCanEndPoint.

crozone commented 1 year ago

This may be officially relaxed as part of #30797. I don't if it would be useful but it would provide ability to send/receive without need for SocketCanEndPoint.

In the SocketCAN usecase, the SocketCanEndPoint is only used to bind the socket to the CAN interface with .Bind(), but not after that. This means that the performance impact isn't as serious as needing to pass it into every SendTo() or ReceiveFrom(), but there's still an annoying extra copy through the EndPoint indirection.

Is there any scope in that ticket for also making Bind() accept a sockaddr in the same way? Maybe even a BindRaw() that bypasses the SocketFamily Pal translation stuff, since the first two bytes are the address family.

SocketCAN code for reference: https://github.com/crozone/SocketCanNet/blob/main/SocketCanNet/CanSocketEndPoint.cs

https://github.com/crozone/SocketCanNet/blob/f33ff5ca0d4a72d463d5373884134002b678fdf1/SocketCanNet/SocketExtensions.cs#L98-L107

wfurt commented 1 year ago

there is no plan for Bind(SocketAddress) at the moment. That is simple function and p/invoke would be trivial IMHO. (unlike ReceiveFromAsync where there is lot of internal complexity and platform differences)