dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Proposal: Better P/Invoke, and unsafe compiler semantics #2045

Closed prasannavl closed 7 years ago

prasannavl commented 9 years ago

P/Invoke is used everywhere these days, and any application that requires extensive customization or high performance has little choice but to use it. In today's applications there is often no way around P/Invoke: either performance is critical and calling a system library directly beats hurdling through many managed abstractions (provided you know what you're really doing), or the system library is one that .NET simply doesn't expose directly.

The Problem

While it's extremely easy to do P/Invoke with C#, more often than not it involves the following process:

  1. Find the libraries you want to use.
  2. Look for the P/Invoke signatures (online, dumpbin, and so on).
  3. Find out all the dependent types.
  4. Look into the documentation for each parameter, figure out the correct marshaling details and, more often than not, create entire structures or const values, even if we are only going to use a few of them.
  5. Use the derived P/Invoke signatures and hope that they are all correct. (Sites like PInvoke.net are great, but still prone to errors, and it is a lot of work just to make, say, a simple GetSystemMetrics call for a value not exposed by .NET, such as SM_CXPADDEDBORDER.)
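
To make the steps above concrete, here is a minimal sketch of what even the simplest case looks like when written by hand. The class names `User32` and `PaddedBorder` are made up for illustration; the constant value is copied from winuser.h, and the call itself only works on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// Hand-written declarations (hypothetical class name) for one small query.
static class User32
{
    // SM_CXPADDEDBORDER as defined in winuser.h. It has to be copied by hand
    // because .NET does not expose it.
    public const int SM_CXPADDEDBORDER = 92;

    // C prototype: int GetSystemMetrics(int nIndex);
    [DllImport("user32.dll")]
    public static extern int GetSystemMetrics(int nIndex);
}

static class PaddedBorder
{
    // Windows only: returns the padded border width in pixels.
    public static int Query()
    {
        return User32.GetSystemMetrics(User32.SM_CXPADDEDBORDER);
    }
}
```

Even here, the DLL name, the parameter type and the constant all have to be looked up and kept correct by the caller.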

While you have to go through each of these steps for almost every single P/Invoke, the obvious thing to note is that a vast (amazingly vast) number of system library signatures already exist, with correct marshaling, as internal classes in the BCL itself (Microsoft.Win32.Internal.Unsafe**). The problem is that these classes simply cannot be exposed directly, for obvious reasons of safety. There is a reason they are internal in the first place.

However, in an unsafe context they are pretty much the same as any other class. It makes no sense to hide them from the user there, since the user has already indicated the intent to work at a lower level, and the declarations are already wrapped in semantically coherent "Unsafe" classes. (Note that these classes are internal not because their implementations are unsupported, but because they cannot guarantee safety in the managed environment if their execution is not controlled. Most of them wrap system libraries that are already public and well supported.)

Now, considering that all of this already exists in the framework, redoing the work to use some simple system libraries is not just highly redundant: it also takes up a significant amount of metadata space (and hence binary size) and redundant memory, for something the framework itself already uses almost constantly. Another point to note is that more than 50% of the P/Invoke that developers write goes into system libraries, not into external ones.

The Solution

Introduce a new public attribute, [ConstrainedPublic], for methods and classes. Classes decorated with it are picked up by Roslyn and treated as public for compilation, inspection, IntelliSense and so on, if the compiler is run with /unsafe or a new /unconstrained switch. Without the switch, the semantics are exactly the same as today, and these libraries are not exposed publicly. This increases productivity many times over when P/Invoking into system libraries, because the existing classes are reused and already carry correct signatures, which makes the work less error-prone. You also shave a few kilobytes off the binaries of programs with heavy P/Invoke, save a small amount of memory and, last but not least, no longer need completely redundant P/Invoke classes just to call a few native methods directly, each of which would be a single line in C/C++.
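
As a rough sketch of how this might look in source: both `ConstrainedPublicAttribute` and `UnsafeNativeMethodsSketch` are hypothetical names used only for illustration, and neither the attribute nor the /unconstrained switch exists in .NET or Roslyn today. With a current compiler the attribute is inert and the class stays internal.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical attribute from the proposal; it does not exist in .NET.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class ConstrainedPublicAttribute : Attribute
{
}

// How a framework class might opt in (hypothetical class name).
// Today this is an ordinary internal class. Under the proposal, a compiler
// run with /unsafe would treat it as public for callers.
[ConstrainedPublic]
internal static class UnsafeNativeMethodsSketch
{
    [DllImport("user32.dll")]
    internal static extern int GetSystemMetrics(int nIndex);
}
```

A consumer compiled without the switch would see no change; one compiled with it would, under the proposal, be able to call `UnsafeNativeMethodsSketch.GetSystemMetrics` instead of redeclaring it.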

Interesting Prospects

Now, the advantages of exposing system P/Invokes don't end there. Today, one has to wait for Microsoft, or someone outside, to expose a managed class before a new native feature can be used from C#. It often takes months for a publicly exposed managed class to appear for a new native feature (and some never appear at all, for safety reasons). If the internal classes can be exposed under conditions where the developer explicitly intends to call down anyway, regardless of safety, the .NET and native system libraries can be bridged easily and kept in sync, whereas the alternative requires the whole procedure described above.

Not only that, it also gives other native libraries a way to expose wrappers in a coherent manner, providing both direct access and fully managed, safe access.

mikedn commented 9 years ago

But the problem is these classes simply cannot be exposed directly, due to obvious reasons of safety. There is a reason they are internal in the first place, in the context of safety.

Actually the main reason why those classes aren't exposed is that they're implementation details. As such they can change at any time. This is exactly what happened in the case of .NET Core and .NET Native, those classes have been moved around quite a bit.

The safety bit isn't that important as untrusted code anyway needs a permission for PInvoke.

Now the advantages of exposing system P/Invokes doesn't end there. Today, one has to wait for Microsoft, or someone outside to expose a managed class to access a new native feature from C#.

How does your suggestion solve this? If we're talking about exposing a new native feature, then the necessary PInvoke method probably won't exist in the available libraries to begin with; someone would have to add it.

HaloFour commented 9 years ago

One could argue that this is the purpose of C++/CLI, to offer the ability to directly bridge between managed and native code outside of exported functions or COM. I doubt that C# will see the degree of native support enjoyed by C++/CLI, particularly mixed (native and managed) assemblies.

svick commented 9 years ago

Wouldn't a lot of the difficulty with PInvoke be solved by having a library that contains all the signatures and structures?

Actually, it seems it already exists.

This doesn't solve the "significant amount of metadata" issue, but I'm not sure that's actually a problem in the first place.

HaloFour commented 9 years ago

If C#/.NET gets some form of static linking between assemblies then I could see the value in a Microsoft-sponsored interop assembly loaded with common P/Invoke signatures and structures. Until then you are left with a dependency on a potentially massive assembly.

I'd rather see more of a community effort to handle generating and refining the signatures and structures. PInvoke.net has been around for quite some time and exists exactly for that purpose, and they even have a Visual Studio extension.

paulomorgado commented 9 years ago

It would be nice if winmd jumped from RT to Win32.

prasannavl commented 9 years ago

@mikedn,

Actually the main reason why those classes aren't exposed is that they're implementation details. As such they can change at any time. This is exactly what happened in the case of .NET Core and .NET Native, those classes have been moved around quite a bit.

The safety bit isn't that important as untrusted code anyway needs a permission for PInvoke.

My bad. I failed to address that they are implementation details. My point is that, while they are currently only implementation details, a large number of them are used quite regularly, and the work of writing them is being duplicated over and over again.

In any normal scenario, I'd agree that implementation details should never be exposed. But here, these implementation details are effectively constant, i.e. you are never going to rename an API or change its parameters, because they are all well-supported, well-documented APIs that already exist as native libraries. The reasoning behind not exposing implementation details is generally that they are subject to change, but here they simply aren't.

The only exception could be that their namespaces and parent classes change, which is actually an implicit part of my proposal already: that the OS native libraries be standardized, say, under Microsoft.Win32. They could perhaps live in different assemblies, as required by each runtime, so that a smaller runtime will not have all of them, while a full runtime such as the .NET Desktop runtime will cover the entire native base.

The effort to do that, IMO is minimal, while the advantages are huge.

@HaloFour

If C#/.NET gets some form of static linking between assemblies then I could see the value in a Microsoft-sponsored interop assembly loaded with common P/Invoke signatures and structures. Until then you are left with a dependency on a potentially massive assembly.

I do agree with this. But C# is not just about writing full-fledged applications these days. It's beginning to take a significant chunk of scripting in the Microsoft stack. And with CoreCLR, Roslyn, and things built on top of them (like ScriptCS / LinqPad), I only see that increasing. Frankly, I use LINQ-style queries combined with Rx for a whole bunch of quick scripting these days, and P/Invoke is just a pain in the neck in these cases. All of this would be eased greatly by a very small amount of standardization.

mikedn commented 9 years ago

a large amount of them are used quite regularly

That is not my experience. And with the arrival of an open-source, multiplatform version of .NET, the use of PInvoke is likely to decrease.

But here, these implementation details are always a constant. i.e, You are never to rename an Api, or change its parameters, because they are all already well supported, well documented APIs that exist in the form of native libraries. And the reasoning behind not exposing the implementation details itself, is generally that they are subject to change, but here, they simply aren't.

The Win32 APIs do not change, but the PInvoke declarations can and do change. A trivial example is changing a HANDLE parameter type from IntPtr to SafeHandle.
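
A small sketch of that kind of change, using GetFileType from kernel32 as the example. The class names `Kernel32Before` and `Kernel32After` are made up to show two versions of the same internal declaration side by side; they are not taken from the framework source.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Earlier shape of the declaration: the HANDLE is a raw IntPtr.
static class Kernel32Before
{
    // C prototype: DWORD GetFileType(HANDLE hFile);
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern uint GetFileType(IntPtr hFile);
}

// Later shape: same native function, but the HANDLE is now a SafeFileHandle.
// Any caller compiled against the IntPtr version no longer binds to this.
static class Kernel32After
{
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern uint GetFileType(SafeFileHandle hFile);
}
```

The native function is identical in both cases; only the managed signature moved, which is enough to break anyone who depended on it.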

The only exception could be that their namespaces and parent classes change - Which is actually is an implicit part of my proposal already. That the OS native libraries be standardized. Say, here under Microsoft.Win32.

That sounds more like a proposal for the CoreFX repository, it has nothing to do with Roslyn.

The effort to do that, IMO is minimal, while the advantages are huge.

Not really, as some Win32 APIs are quite problematic to PInvoke. Consider DeviceIoControl, for example. If I want to use it with one particular IO control code, I can get away with using ref, out and some structs for the input/output buffers. But if I want to make a PInvoke method that everyone can use, all I can do is use IntPtr or void* for those buffers and leave the marshaling responsibility to the user of the PInvoke.
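
To illustrate the difference, here is a sketch with two declarations of the same function. The typed overload is specialized for IOCTL_DISK_GET_DRIVE_GEOMETRY and its DISK_GEOMETRY output struct; the class name `DeviceIo` and struct name `DiskGeometry` are illustrative choices, and the struct layout is transcribed from the Win32 headers.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Managed mirror of the Win32 DISK_GEOMETRY struct.
[StructLayout(LayoutKind.Sequential)]
struct DiskGeometry
{
    public long Cylinders;          // LARGE_INTEGER
    public int MediaType;           // MEDIA_TYPE enum
    public uint TracksPerCylinder;
    public uint SectorsPerTrack;
    public uint BytesPerSector;
}

static class DeviceIo
{
    public const uint IOCTL_DISK_GET_DRIVE_GEOMETRY = 0x00070000;

    // Convenient, but only correct for control codes whose output buffer
    // is a DISK_GEOMETRY.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, uint nInBufferSize,
        out DiskGeometry lpOutBuffer, uint nOutBufferSize,
        out uint lpBytesReturned, IntPtr lpOverlapped);

    // Works for any control code, but the caller must allocate and
    // marshal both buffers.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, uint nInBufferSize,
        IntPtr lpOutBuffer, uint nOutBufferSize,
        out uint lpBytesReturned, IntPtr lpOverlapped);
}
```

Only the second shape could be shared by everyone, and it is also the one that pushes all the marshaling work back onto the caller.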

GeirGrusom commented 9 years ago

I'd like to mention my library Platform.Invoke, which also helps a little with platform invoke. Although it doesn't solve this exact problem, it makes API handling a bit more generic by allowing interface abstractions (instead of static methods) and invocation probing (for example, for logging or error handling). For public APIs like Windows, I think generators could handle these kinds of issues rather than the compiler or framework.

prasannavl commented 9 years ago

And with the apparition of a open source and multiplatform version of .NET the use of PInvoke is likely to decrease.

I'd like to hope so as well. But I doubt the change is going to be drastic, considering the practical use cases of P/Invoke. It's almost always used where there is an OS-specific, fine-grained customization, or to skip a bunch of abstractions. I don't see this changing much anytime in the near future.

The Win32 APIs do not change but the PInvoke can and do change. A trivial example is changing a HANDLE parameter type from IntPtr to SafeHandle.

Not really as some Win32 APIs are quite problematic to PInvoke. Consider DeviceIOControl for example. If I want to use it with a certain IO control code then I can get away with using ref, out and some structs for the input/output buffers. But if I want to make a PInvoke method that everyone can use then all I can do is use IntPtr or void* for those buffers and leave the marshaling responsibility to the user of the PInvoke.

Both these scenarios are handled quite the same:

The generic P/Invoke is what goes into the publicly exposed set, i.e. the IntPtr version. A library that definitely wants to use refs then has two choices. One is to declare its own typed signature as an implementation detail if it really needs to (duplicating the declaration, as it does today), but that scenario adds no value to this proposal. The simple and valuable choice is the other one: the library uses the same IntPtr API as its implementation detail and internally marshals the IntPtr to the required types.

A simple solution to the IntPtr and SafeHandle scenario: just have both, with a SafeHandle overload alongside the IntPtr one!

A more complex scenario, if an overload is questionable (which I don't really see a need for, since the above addresses most scenarios):

Since SafeHandles are just implementations sitting on top of IntPtrs anyway, I really think that's a non-issue: the lowest common denominator, IntPtr, would be used in the exposed signature, and simple abstractions that convert to and from SafeHandle (as the CLR already does) can be handled by the library, or even by P/Invoke helpers for common scenarios, instead of by every P/Invoke declaration.
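
A minimal sketch of such a helper, assuming the exposed native declarations take IntPtr. `HandleHelpers.WithRawHandle` is a made-up name; the `DangerousAddRef` / `DangerousGetHandle` / `DangerousRelease` calls are the existing SafeHandle members for borrowing the raw handle, and the pattern is roughly what the marshaler does for a SafeHandle parameter.

```csharp
using System;
using System.Runtime.InteropServices;

static class HandleHelpers
{
    // Runs an IntPtr-based native call against a SafeHandle, holding a
    // reference so the handle cannot be released while the call is running.
    public static T WithRawHandle<T>(SafeHandle handle, Func<IntPtr, T> call)
    {
        bool addedRef = false;
        try
        {
            handle.DangerousAddRef(ref addedRef);
            return call(handle.DangerousGetHandle());
        }
        finally
        {
            // Only release if the reference was actually taken.
            if (addedRef)
            {
                handle.DangerousRelease();
            }
        }
    }
}
```

A library could then keep SafeHandle in its own public surface and write something like `HandleHelpers.WithRawHandle(fileHandle, h => GetFileType(h))` against a shared IntPtr declaration (here `GetFileType` stands in for any such declaration).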

prasannavl commented 9 years ago

For any public API's like Windows I think generators could handle these kinds of issue rather than the compiler or framework.

I would generally agree, for sparsely used APIs. But these are all APIs that already exist inside the framework libraries and are used heavily internally. I see no value in regenerating and duplicating them if they can easily be reused.

This also brings about consistency in how they are used internally.

gafter commented 7 years ago

We are now taking language feature discussion on https://github.com/dotnet/csharplang for C# specific issues, https://github.com/dotnet/vblang for VB-specific features, and https://github.com/dotnet/csharplang for features that affect both languages.