[API Proposal]: New attribute for interop-specific struct concerns #100896

dotnet / runtime

Open jkoritzinsky opened 2 months ago

jkoritzinsky commented 2 months ago

Background and motivation

.NET's struct layout system is quite extensive, but there are still some cases it does not cover that are relevant in interop scenarios at the language/runtime boundary.

In particular, we've seen requests for features like

explicit alignment requirements
.NET support for a mechanism like Rust's repr(transparent)

With Swift Interop, we have a need to represent generic structs with correct Swift layouts, but we don't have a way to represent the Swift layout accurately in the general case. In the non-general case, we can represent it with an explicit StructLayout.Size element, but we can't do that for generics.

For all of these cases, we'd like to extend the StructLayoutAttribute to handle these cases. However, we can't extend that attribute as it's a pseudo-attribute that maps directly to metadata. Instead, we propose adding a new attribute, only usable on struct types, that's intended to encompass all future layout features, as well as the existing layout features provided by StructLayoutAttribute.

As part of the implementation, we plan to introduce a simple source-generator to generate the corresponding StructLayoutAttribute on the type that has our new attribute applied. To ensure that we don't accidentally make types non-blittable, we'll lower the attribute such that CharSet = CharSet.Unicode in all cases.

In the future, we may extend this attribute to also be the "trigger" attribute for an interop source generator to generate a marshaller for a given struct type.

To limit the scope of this feature, we plan to start with only the feature required by Swift interop:

Provide a mechanism to not include trailing padding in the .NET struct's size for struct types.

Like the LayoutKind.Sequential support for structs in the runtime, we won't support these new layout requirements for any structs that (recursively) contain reference type fields.
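To make "trailing padding" concrete, here is a minimal sketch of what the runtime does today with an ordinary sequential struct. The type `IntThenByte` is a hypothetical name chosen for illustration and is not part of the proposal. The last field ends at byte 5, but the reported size is rounded up to the struct's 4-byte alignment.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Offset of the last field plus its size: the size with no trailing padding.
int unpadded = (int)Marshal.OffsetOf<IntThenByte>(nameof(IntThenByte.Flag)) + sizeof(byte);

// The size the runtime reports, rounded up to the struct's alignment.
int padded = Unsafe.SizeOf<IntThenByte>();

Console.WriteLine($"unpadded = {unpadded}, padded = {padded}");
// prints "unpadded = 5, padded = 8"

// Hypothetical example type, not part of the proposal.
[StructLayout(LayoutKind.Sequential)]
public struct IntThenByte
{
    public int Value;  // offset 0, 4 bytes
    public byte Flag;  // offset 4, 1 byte
}
```

The feature described above would let a struct opt in to reporting the smaller of these two numbers.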

API Proposal

namespace System.Runtime.InteropServices
{
    public enum LayoutKind
    {
        Custom
    }

    [AttributeUsage(AttributeTargets.Struct)]
    public sealed class CustomLayoutAttribute : Attribute
    {
        public CustomLayoutAttribute(CustomLayoutKind kind) {}

        // Only valid for CustomLayoutKind.SwiftEnum
        public int RequiredDiscriminatorBits { get; set; }
    }

    public enum CustomLayoutKind
    {
        Sequential, // C-style struct
        Union, // C-style union
        SwiftStruct, // Swift struct
        SwiftEnum // Swift enumeration
    }
}

namespace System.Runtime.InteropServices.Swift
{
    // Represents a pointer to a Swift object (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftObject
    {
        // Would be implemented to mask out the spare bits
        public void* Value { get; }
    }

    // Represents a bool value in Swift (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftBool
    {
        // Would be implemented to mask out the spare bits
        // or assign while preserving spare bits
        public bool Value { get; }
    }
}

API Usage

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
public struct InnerStruct
{
    public short F0; // offset 0
    public sbyte F1; // offset 2
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
public struct OuterStruct
{
    public ulong F0; // offset 0
    public long F1; // offset 8
    public InnerStruct F2; // offset 16
    public sbyte F3; // offset 19
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftEnum, RequiredDiscriminatorBits = 1)]
public struct MyOptional<T> where T : unmanaged
{
    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct Some_Payload
    {
        public T Value;
    }

    private Some_Payload Some;

    // An API surface to describe the different cases and provide a C# API around accessing them, out of the scope of this API proposal
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftEnum, RequiredDiscriminatorBits = 1)]
public struct ParsedResult
{
    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct ParsedObject_Payload
    {
        public SwiftObject payload_0;
    }

    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct ParsedBool_Payload
    {
        public SwiftBool payload_0;
    }

    private ParsedObject_Payload ParsedObject;
    private ParsedBool_Payload ParsedBool;

    // An API surface to describe the different cases and provide a C# API around accessing them, out of the scope of this API proposal
}

Alternative Designs

We could provide a set of layout primitives instead of a set of well-known layouts (the original proposal). However, we'd then have to handle all possible combinations of these primitives or block them to only the valid combinations to match our equivalent support in the current proposal.

We could go further than the original proposal and have a mechanism for specifying OS/Arch-specific layout and ABI parameter passing rules in attributes. This design would allow the runtime to be entirely out of the business of layout and ABI handling other than reading the attributes. This design has a few problems though: We'd need to consider how to provide/validate the provided options. The code-gen backends would need to respect this information (and reading from custom attributes is expensive). Generics would also be a problem in this design space.

Original API Proposal

### API Proposal

```csharp
namespace System.Runtime.InteropServices;

[AttributeUsage(AttributeTargets.Struct, AllowMultiple = false, Inherited = false)]
public sealed class ImportedStructAttribute : Attribute
{
    // StructLayoutAttribute matching members, excluding CharSet.
    public int Pack { get; set; }
    public int Size { get; set; }
    public ImportedStructAttribute(LayoutKind layoutKind);
    public LayoutKind LayoutKind { get; set; }

    // New member for Swift layout requirements
    public bool PadSizeToAlignment { get; set; }
}
```

### API Usage

```csharp
[ImportedStruct(LayoutKind.Sequential, PadSizeToAlignment = false)]
struct SwiftOptionalLike<T>
{
    T value;
    byte isNull;
}

Console.WriteLine(Unsafe.SizeOf<SwiftOptionalLike<int>>()); // Output: 5
```

### Alternative Designs

We could provide a dedicated attribute for the Swift scenario.

We could have Roslyn support lowering the StructLayoutAttribute-corresponding members to metadata instead of introducing a source generator.

We could skip including the StructLayoutAttribute APIs and have the attributes represent separate concepts.

We could add new members to StructLayoutAttribute and require all compilers to recognize when new members are specified and not remove the attribute (and still lower the original members to metadata). This option is very expensive.

Risks

No response

tannergooding commented 2 months ago

We could have Roslyn support lowering the StructLayoutAttribute-corresponding members to metadata instead of introducing a source generator.

I would expect some Roslyn integration to be a baseline requirement. The compiler already special cases StructLayout, FieldOffset, and several of the other CompilerServices/InteropServices attributes including when specifying them is legal or not.

If we are going to have this work as pay for play, where we only go look at the extended attributes if a particular LayoutKind is set, then FieldOffset would be blocked from usage since the layout kind isn't Explicit.

At that point, I think it's worth just minimally integrating it like we did for extended CallConv so that the compiler understands this is the new thing, that FieldOffset is allowed with the new thing, and that there is a general extensibility model for representing this data.

Theoretically we could just have something like LayoutKind* types, much as we have CallConv* types. You could then have something like LayoutKindSwift or LayoutKindTransparent or LayoutKindSequential and allow them to be combined in interesting ways that the runtime can opt to support. The root attribute could then have the baseline fields like Pack and Size that every type has. -- Just as a hypothetical that loosely correlates to how we've done other things, this isn't necessarily a good idea or the right direction for type layout.

public sealed class ImportedStructAttribute : Attribute

I'm not a huge fan of this name and don't think it's necessarily obvious what it means.

Could we define it as UnmanagedStructLayoutAttribute or InteropStructLayoutAttribute or something along those lines instead, perhaps? Make it clear it only works with "unmanaged" types, is primarily for use in interop scenarios, is defining the layout of the type, etc.

AaronRobinsonMSFT commented 2 months ago

I would expect some Roslyn integration to be a baseline requirement.

I don't think that is a requirement. We made the transition to LibraryImport without any deep Roslyn integration. If however you are simply referring to an Analyzer/Fixer/Source generator that is something I could understand. I'm still not sure it is needed, but it depends on how broadly we expect the attribute to be used in v1.

Theoretically we could just have something like LayoutKind* types, much as we have CallConv* types.

This is something we should consider. I like following the CallConv approach we've developed.

I'm not a huge fan of this name and don't think it's necessarily obvious what it means.

It follows the same pattern as LibraryImport, DllImport, JSImport, ComImport - it is importing the definition of something from another source.

Could we define it as UnmanagedStructLayoutAttribute or InteropStructLayoutAttribute or something along those lines instead, perhaps?

These are not appropriate in this case since the values here will be reflected in the managed type layout itself and not just at the interop boundary. We did consider UnmanagedStructLayoutAttribute but since it impacts the layout done by the runtime, it didn't seem appropriate.

Make it clear it only works with "unmanaged" types,

I would prefer to avoid C# concepts in this definition, especially as it relates to types - UnmanagedCallersOnly isn't about types so that works for me. My preference would be using "ValueType" rather than "Unmanaged". I think the "Import" term is something which is consistent with our other scenarios and should be used to align with the pattern.

jkotas commented 2 months ago

There is an unused value in LayoutKind enum. We can use this unused LayoutKind enum value to mean "look at the new attribute for the layout spec". It is similar to the approach we have used for CallConv where we have mapped an unused value to mean "look at the modops". (It would require a Roslyn change to relax the validation of LayoutKind, but that should not be a big deal. As Tanner pointed out, we may want to make Roslyn aware of this anyway.)
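As a small illustration of the gap being referred to, the sketch below prints the numeric values of the three LayoutKind members that exist in the runtime versions current in this thread. The values skip 1; that this is the unused value meant here is an assumption on the part of this example.

```csharp
using System;
using System.Runtime.InteropServices;

// The three documented LayoutKind members and their underlying values.
Console.WriteLine((int)LayoutKind.Sequential); // prints "0"
Console.WriteLine((int)LayoutKind.Explicit);   // prints "2"
Console.WriteLine((int)LayoutKind.Auto);       // prints "3"
```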

With Swift Interop, we have a need to represent generic structs with correct Swift layouts

There are a number of other issues with generics support in Swift interop. Unless we have a solution for how to make generics work well with Swift interop end-to-end, solving the struct layout issue is not that interesting.

Size

I think it is a questionable property. As far as I know, there is no equivalent in C/C++.

Also, would it make more sense to allow specifying alignment, to better match how one can control layout in C/C++?

PadSizeToAlignment

Is this all that is required to express layout of all possible Swift structs? Is there a prior art for expressing the Swift layout struct rules in clang/C++? (I am not sure whether I like inventing names like this for language-specific layout algorithms.)

that FieldOffset is allowed with the new thing

Why would we want to allow FieldOffset with the new thing? FieldOffset is a problematic concept (except when used with offset 0) - I believe that you made this point earlier.

we could just have something like LayoutKind* types, much as we have CallConv* types

The CallConv... types exist to allow encoding the calling convention in function pointers. It is not possible to use attributes for function pointers. I think attribute works just fine for encoding type layout.

jkotas commented 2 months ago

We made the transition to LibraryImport without any deep Roslyn integration.

Nit: I agree that LibraryImport does not have deep integration with Roslyn. However, we have made changes in Roslyn to make it work well (e.g. relaxed rules for partial).

We did consider UnmanagedStructLayoutAttribute but since it impacts the layout done by the runtime, it didn't seem appropriate.

I would expect this to be useful for controlling struct layout in general, not limited to structs imported from other languages. My name choice would be UnmanagedLayoutAttribute to make it as general as possible ("unmanaged" in this context means "not auto-managed by the runtime"). If you take my other suggestion to reuse the unused LayoutKind value, the attribute can pair with LayoutKind.Unmanaged.

AaronRobinsonMSFT commented 2 months ago

I would expect this to be useful for controlling struct layout in general, not limited to structs imported from other languages.

I fully agree, that is why UnmanagedLayoutAttribute seems inappropriate. There is no need to imply any association with the C# "unmanaged" concept or the "unmanaged" term in UnmanagedCallersOnly. Instead I think indicating what is being customized, a "ValueType", is the ideal term. The "Import" term was used since we assumed the majority of this customization would be in service of defining types from other systems. Your remarks about a "general purpose" mechanism are compelling, so calling this something like ValueTypeLayout seems the most appropriate.

jkotas commented 2 months ago

ValueTypeLayoutAttribute looks too similar to the existing StructLayoutAttribute. We may want something more distinct, and we may want to avoid coupling the name to value types in case we need to use it for more than value types in future (StructLayoutAttribute made this mistake - it can be used for both structs and non-structs).

Maybe CustomTypeLayoutAttribute? It would pair with LayoutKind.Custom. We have several existing CustomSomethingAttributes in the BCL.

hamarb123 commented 2 months ago

Maybe CustomTypeLayoutAttribute? It would pair with LayoutKind.Custom. We have several existing CustomSomethingAttributes in the BCL.

I think this name is pretty good. I think avoiding unmanaged in the name is a good idea - I thought we would make it not work with managed structs when the name was suggested because that's what it implies.

Also, would it make more sense to allow specifying alignment, to better match how one can control layout in C/C++?

I think it would be great if we could specify an alignment. It would be very useful.

[ImportedStruct(LayoutKind.Sequential, PadSizeToAlignment = false)]
struct SwiftOptionalLike<T>
{
    T value;
    byte isNull;
}

Console.WriteLine(Unsafe.SizeOf<SwiftOptionalLike<int>>()); // Output: 5

What happens if I do this:

SwiftOptionalLike<int>[] array = new SwiftOptionalLike<int>[5];
Span<SwiftOptionalLike<int>> span = array;
span[1].Value = 5; // is it misaligned? (and do we care or not?) or is sizeof wrong and cannot be used anymore for this? or do we just disallow this entirely somehow?
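For reference, the invariant this question relies on can be checked today. The sketch below uses a hypothetical non-generic stand-in, `OptionalInt`, with the runtime's current layout rules (the proposed attribute does not exist, so it is not applied). It shows that adjacent array elements are spaced exactly `Unsafe.SizeOf<T>()` bytes apart, which keeps every `int` field 4-byte aligned.

```csharp
using System;
using System.Runtime.CompilerServices;

var array = new OptionalInt[5];

// Distance in bytes between two adjacent array elements.
nint spacing = Unsafe.ByteOffset(ref array[0], ref array[1]);

Console.WriteLine($"SizeOf = {Unsafe.SizeOf<OptionalInt>()}, spacing = {spacing}");
// prints "SizeOf = 8, spacing = 8"

// Hypothetical stand-in for SwiftOptionalLike<int>, laid out by today's rules.
public struct OptionalInt
{
    public int Value;
    public byte IsNull;
}
```

If the size became 5 while array elements stayed 5 bytes apart, `array[1].Value` would sit at an odd offset. If the spacing stayed 8, size and spacing would no longer agree.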

Why would we want to allow FieldOffset with the new thing? FieldOffset is a problematic concept (except when used with offset 0) - I believe that you made this point earlier.

I'd tend to agree that allowing FieldOffset is probably not a good idea, unless we have a good reason to combine explicit layout with the additional features this new system may provide.

Here's 2 useful layouts I'd like to see that we could potentially make with this feature:

transparent layout
union layout (something along the lines of: first field is identifier, and the rest are laid out overlapped by the runtime automatically to minimise the size whilst keeping managed & unmanaged sections separate - I described such an idea here also in more detail)

It would be good if we also kept sequential & auto with this feature so they could benefit from new options like alignment.

SingleAccretion commented 2 months ago

alignment control

I will note that the runtime doesn't support alignments larger than 8 bytes (for on-GC-heap objects).

hamarb123 commented 2 months ago

I will note that the runtime doesn't support alignments larger than 8 bytes (for on-GC-heap objects).

Indeed, that part, if approved, would presumably just be delayed until the runtime could support it (since I assume there's no fundamental reason why it couldn't work).

[ImportedStruct(LayoutKind.Sequential, PadSizeToAlignment = false)]
struct SwiftOptionalLike<T>
{
    T value;
    byte isNull;
}

Console.WriteLine(Unsafe.SizeOf<SwiftOptionalLike<int>>()); // Output: 5

Also, this could presumably(?) break some of us who have AlignOf helper functions similar to:

struct AlignHelper<T>
{
    T value;
    byte field;
}

static unsafe int AlignOf<T>() => sizeof(AlignHelper<T>) - sizeof(T);

Console.WriteLine(AlignOf<SwiftOptionalLike<int>>()); //would this produce what we expect for alignment?
//it relates to the question I asked about spans earlier

If it would break the above, it would be great to get proper AlignOf APIs (ideally both generic in Unsafe, and on RuntimeHelpers like with the new SizeOf API) so we can detect it legitimately.

AaronRobinsonMSFT commented 2 months ago

we may want to avoid coupling the name to value types in case we need to use it for more than value types in future

I thought about this when Jeremy first suggested the idea - "Would we ever want this for reference types?". I don't think I have the imagination to come up with a scenario where we would want to play layout games with reference types. Not pushing back on the suggestion, but is there an obvious case where it would be compelling? I personally find the StructLayoutAttribute being applicable to reference types to be a huge mistake that has only created complexity with almost no practical upside.

Maybe CustomTypeLayoutAttribute? It would pair with LayoutKind.Custom. We have several existing CustomSomethingAttributes in the BCL.

I have no push back with that name. The LayoutKind.Custom also seems natural with that name and the intent.

jakobbotsch commented 2 months ago

FWIW, the Swift struct layout algorithm is described here: https://github.com/apple/swift/blob/4b440a1d80a0900b6121b6e4a15fff2a96263bc5/docs/ABI/TypeLayout.rst#fragile-struct-and-tuple-layout

I'm not sure that PadSizeToAlignment alone ends up being sufficient for the Swift scenarios. For example, will creating an array in .NET of a Swift struct whose size doesn't match its stride (as defined by the above) end up doing the right thing?
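The size versus stride distinction can be worked through with plain arithmetic. The sketch below is one reading of the fragile struct layout steps in the linked document (round the running size up to each field's alignment, place the field, add its size, track the maximum alignment, then round the final size up to get the stride). `SwiftLayout.Compute` is a hypothetical helper written for this illustration, not a runtime or Swift API. It is applied to the InnerStruct and OuterStruct examples from the top of this issue.

```csharp
using System;

// Fields are given as (size, alignment) pairs.
// InnerStruct: short, sbyte.
var inner = SwiftLayout.Compute((2, 2), (1, 1));
// OuterStruct: ulong, long, InnerStruct, sbyte.
var outer = SwiftLayout.Compute((8, 8), (8, 8), (inner.Size, inner.Alignment), (1, 1));

Console.WriteLine($"inner: size {inner.Size}, stride {inner.Stride}");
// prints "inner: size 3, stride 4"
Console.WriteLine($"outer: offsets [{string.Join(", ", outer.Offsets)}], size {outer.Size}, stride {outer.Stride}");
// prints "outer: offsets [0, 8, 16, 19], size 20, stride 24"

// Hypothetical helper for illustration only.
public static class SwiftLayout
{
    public static (int[] Offsets, int Size, int Alignment, int Stride) Compute(
        params (int Size, int Alignment)[] fields)
    {
        int size = 0, alignment = 1;
        var offsets = new int[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            size = RoundUp(size, fields[i].Alignment);   // align the running size for this field
            offsets[i] = size;                           // the field starts here
            size += fields[i].Size;                      // no trailing padding is added
            alignment = Math.Max(alignment, fields[i].Alignment);
        }
        // Stride is the element spacing in arrays: size rounded up to alignment, at least 1.
        int stride = Math.Max(1, RoundUp(size, alignment));
        return (offsets, size, alignment, stride);
    }

    private static int RoundUp(int value, int alignment) =>
        (value + alignment - 1) / alignment * alignment;
}
```

The InnerStruct field of OuterStruct occupies only 3 bytes, which is why F3 lands at offset 19. An array of InnerStruct would still need 4 bytes per element, which is the mismatch the question above is about.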

tannergooding commented 2 months ago

Why would we want to allow FieldOffset with the new thing? FieldOffset is a problematic concept (except when used with offset 0) - I believe that you made this point earlier.

My understanding of the proposal (based on the wording and the shape of the new attribute) was that this was basically meant to supplement the existing StructLayout attribute and in particular replace it when it was present.

My assumption had then been that this meant that something like a union in Swift would be defined something like: [ImportedStruct(LayoutKind = LayoutKind.Explicit, PadSizeToAlignment = false)] and therefore would require FieldOffset to work.

I had also assumed we were going to want this to be "pay for play" so that the VM doesn't have to look for this attribute on every struct, and the only sensible way to do that is to use one of the free bits in the existing LayoutKind enum field used by StructLayout; which would of course force it to be mutually exclusive and therefore impact the special handling Roslyn already has around these pseudo-attributes.

I do think that FieldOffset itself is a very poor mechanism and doesn't itself "properly" match to anything C/C++ exposes unless you exclusively use it as FieldOffset(0).

If we were to design this from the ground up, then what I would expect to see is probably:

We make it pay-for-play by marking the type with [StructLayout(LayoutKind.Extended)]
Extended (or a different/better name) tells the VM to also look for and resolve the new attribute (will call it ExtendedStructLayout for simplicity below)
ExtendedStructLayout then contains the relevant new fields that allow controlling newer layout mechanisms

I would then expect us to take a look at what metadata is actually expressible by languages (both officially in the language spec and unofficially via documented compiler switches/features) we want to interoperate with and make a determination on what to expose based on that.

I would then not want to reuse the existing LayoutKind definitions in the extended metadata. Rather, I'd want to see:

Sequential - Works just like LayoutKind.Sequential
Union - Works just like LayoutKind.Explicit with every field at 0
... - Future expansions based on need the runtime identifies and believes worth exposing
- Theoretical examples include layouts such as Transparent or other callouts made above

I would want us to be more explicit about the differences between Pack and Align. If I were to try to describe a difference based on C/C++, then you have the natural alignment of the type as defined by the ABI (which is effectively the defined alignment for primitives and the maximum alignment of all fields otherwise). alignas then allows you to override the natural alignment of a type to a greater alignment (never a lesser one; a lesser alignment is ill-formed). #pragma pack then allows you to override the alignment of fields without changing the actual alignment of a field's type. This does impact the natural alignment computed for the type containing those fields, however, as it can lower the alignment for a given field. Just as alignas can never specify a lower than natural alignment, pack can never specify a greater than natural packing.
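The Pack half of this distinction is observable in .NET today. The sketch below uses three hypothetical structs with identical fields to show that Pack can lower field alignment but cannot raise it. There is currently no StructLayout member that plays the role of alignas, so that half cannot be demonstrated.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

Console.WriteLine(Unsafe.SizeOf<NaturalLayout>()); // prints "8"
Console.WriteLine(Unsafe.SizeOf<PackedLayout>());  // prints "5"
Console.WriteLine(Unsafe.SizeOf<OverPacked>());    // prints "8"

// Hypothetical example types.

// Default packing: Value is placed at its natural 4-byte alignment (offset 4).
[StructLayout(LayoutKind.Sequential)]
public struct NaturalLayout { public byte Tag; public int Value; }

// Pack = 1 lowers the field alignment: Value is placed at offset 1.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedLayout { public byte Tag; public int Value; }

// Pack = 16 is larger than any field's natural alignment, so it changes nothing.
[StructLayout(LayoutKind.Sequential, Pack = 16)]
public struct OverPacked { public byte Tag; public int Value; }
```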

I think we would then need to consider the best way to represent concepts like compatibility with a particular language. My biggest concern with something like PadSizeToAlignment would be that it's not necessarily intuitive to someone who knows they're needing Swift interop. There are several repeated concepts across the various targets, but in general a lot of things tend to be very language centered/oriented and could break or change over time. So an alternative thought might be that we have some TargetLanguage field that includes C, Swift, and Java that makes it very clear "we have a struct/union that will be getting passed to {TargetLanguage} so ensure the padding/layout is computed using the language specific rules" and thus not require users to understand the large nuance (which might even differ, now or in the future, based on things like machine ABI, operating system, or CPU architecture).

jkoritzinsky commented 2 months ago

FWIW, the Swift struct layout algorithm is described here: apple/swift@4b440a1/docs/ABI/TypeLayout.rst#fragile-struct-and-tuple-layout

I'm not sure that PadSizeToAlignment alone ends up being sufficient for the Swift scenarios. For example, will creating an array in .NET of a Swift struct whose size doesn't match its stride (as defined by the above) end up doing the right thing?

I've spent some time looking into more specifics of Swift layout (primarily around enums and the discriminator). Swift uses a concept of "spare bits" based on the type and platform to reuse bits in types (thankfully only in the non-generic case). We could have the Swift projection handle this at the projection layer (with explicit layout with specifically placed [FieldOffset(X)] byte b fields) and get correct behavior for all 64-bit targets for enums with up to 128 cases (after that, the number of spare bits in some cases differs per-platform, so some platforms would append an additional byte to the layout and others wouldn't).

Alternatively, we could add runtime support for this concept and add more intrinsic Swift struct types to represent the different categories of types (in particular, pointers to Swift objects and ObjC objects bridged into Swift) along with a mechanism to say how many spare bits to reserve (padding the struct length with additional bytes if necessary).

I do like @tannergooding's proposal of specifying a specific target language and letting the runtime handle it, but I'm concerned that the cost of implementing more complicated concepts like the spare-bits concept would be too expensive (especially if we won't hit the limitations in the .NET 9 targets for Swift interop).

Based on the reactions to the TargetLanguage proposal, I'll start putting together a proposal for it.

jkoritzinsky commented 2 months ago

Here's my first pass at an API based on Tanner's ideas (and handling spare bits in the VM/layout).

The general idea is to encode the information in the attribute by using different "*LayoutKind" enums for different target languages.

I've also included the extra requirements for Swift. I've decided to not include a type for the ObjC bridged object as we're trying to avoid handling that case in the Swift interop in the .NET 9 timeframe.

namespace System.Runtime.InteropServices
{
    public enum LayoutKind
    {
        Custom
    }

    [AttributeUsage(AttributeTargets.Struct)]
    public sealed class CustomLayoutAttribute : Attribute
    {
        public CustomLayoutAttribute(CustomLayoutKind kind) {}
        public CustomLayoutAttribute(CLayoutKind kind) {}
        public CustomLayoutAttribute(Swift.SwiftLayoutKind kind, int requiredSpareBits = 0) {}
    }

    public enum CustomLayoutKind
    {
        Sequential,
        Transparent
    }

    public enum CLayoutKind
    {
        Struct,
        Union
    }
}

namespace System.Runtime.InteropServices.Swift
{
    public enum SwiftLayoutKind
    {
        Struct,
        Enum
    }
    public struct SpareBits
    {
        public static SpareBits GetSpareBits<T>(T value) where T: unmanaged;
        public static T SetSpareBits<T>(T value, SpareBits spareBits) where T: unmanaged;

        // Returns the value of the spare bits as an unsigned 64-bit integer.
        public ulong AsUInt64();

        // Sets the value of the spare bits
        public void Set(ulong bits);
    }

    // Represents a pointer to a Swift object (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftObject
    {
        public SwiftObject(void* value)
        {
            Value = value;
        }
        public void* Value { get; }
    }
}

jkotas commented 2 months ago

I do not understand the SpareBits. I assumed that the Swift-specific layout computation would take care of the spare bit allocation transparently. Could you please shed some more light on it?

Can the CustomLayoutKind be one enum that covers all cases?

enum CustomLayoutKind
{
    Sequential, // C-like struct
    Union, // C-like union
    Swift, // Swift-specific layout rules
    Transparent, // Rust-like transparent
    ...
}

I have mixed feelings about transparent. As I have said before, it just pushes work that can be done by interop binding generators into the runtime.

hamarb123 commented 2 months ago

Union, // C-like union

What sort of a union is this? Is it an order dependent one or an order independent one? See this (which I linked before, but nobody seemed to look at) where I go over at an overview level what they're both useful for, and how they could be specifically represented in metadata.

i.e., what do we expect the answer to the following question to be?

[CustomLayout(CustomLayoutKind.Union)]
struct Union1<T1, T2>
{
    T1 field1;
    T2 field2;
}

Is the layout of Union1<T1, T2> the same as Union1<T2, T1> always? (useful for A|B|C DUs, bad for Option<T> DUs)
Is the type Union1<T1, T2> exactly the same type as Union1<T2, T1> always? (useful for A|B|C DUs, bad for Option<T> DUs)

I think both of these types of DUs are useful, but they would most likely need different encodings to be optimal, as I pointed out earlier. The one that allows swapping T1 & T2 could be called something like InterchangeableUnion to disambiguate it. Are we going to provide just one of these (and if so, which one), or are we going to provide both of them?

hamarb123 commented 2 months ago

        public CustomLayoutAttribute(CustomLayoutKind kind) {}
        public CustomLayoutAttribute(CLayoutKind kind) {}
        public CustomLayoutAttribute(Swift.SwiftLayoutKind kind, int requiredSpareBits = 0) {}

I don't think this is the right approach, I think there should be 1 constructor that takes a CustomLayoutKind, and then specify any additional information on properties.

jkotas commented 2 months ago

What sort of a union is this?

It is a C-like union. It would work exactly the same as a union in C. It has nothing to do with C# DUs. I expect that it would be equivalent to what you can get today from LayoutKind.Explicit and specifying FieldOffset(0) on all fields.
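A minimal sketch of that existing equivalent, using a hypothetical `IntOrFloat` type: both fields start at offset 0, so the struct is as large as its largest member and writing one field changes the bits seen through the other.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var u = new IntOrFloat { AsFloat = 1.0f };

Console.WriteLine(Unsafe.SizeOf<IntOrFloat>()); // prints "4"
Console.WriteLine($"0x{u.AsInt:X8}");           // prints "0x3F800000"

// Hypothetical example type: the C# spelling of `union { int i; float f; }`.
[StructLayout(LayoutKind.Explicit)]
public struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;     // overlaps AsFloat
    [FieldOffset(0)] public float AsFloat; // overlaps AsInt
}
```

A CustomLayoutKind.Union, as discussed here, would presumably let the runtime assign the zero offsets itself instead of requiring FieldOffset on each field.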

jkoritzinsky commented 2 months ago

I do not understand the SpareBits. I assumed that the Swift-specific layout computation would take care of the spare bit allocation transparently. Could you please shed some more light on it?

The SpareBits type would provide a way to get the value stored in the spare bits of a value type to enable a Swift projection to know which element of the enum is active.

It looks like the Swift compiler puts an entry into an enum type's value witness table to find the tag, so the projection can use that and not need to read them manually. I'll remove the type. https://godbolt.org/z/KMnaoTePd

Can the CustomLayoutKind be one enum that covers all cases?
enum CustomLayoutKind
{
    Sequential, // C-like struct
    Union, // C-like union
    Swift, // Swift-specific layout rules
    Transparent, // Rust-like transparent
    ...
}
I have mixed feelings about transparent. As I have said before, it just pushes work that can be done by interop binding generators into the runtime.

I was mainly using Transparent as an example of another "custom layout" that isn't interop-language-specific.

We could use one joint enum and only read the members necessary. I was trying to use separate constructors to make it not possible to specify information that's not applicable to the target layout, but I'm not tied to the idea.

Here's an updated proposal:

namespace System.Runtime.InteropServices
{
    public enum LayoutKind
    {
        Custom
    }

    [AttributeUsage(AttributeTargets.Struct)]
    public sealed class CustomLayoutAttribute : Attribute
    {
        public CustomLayoutAttribute(CustomLayoutKind kind) {}

        // Only valid for CustomLayoutKind.SwiftEnum
        public int RequiredDiscriminatorBits { get; set; }
    }

    public enum CustomLayoutKind
    {
        Sequential, // C-style struct
        Union, // C-style union
        SwiftStruct, // Swift struct
        SwiftEnum // Swift enumeration
    }
}

namespace System.Runtime.InteropServices.Swift
{
    // Represents a pointer to a Swift object (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftObject
    {
        // Would be implemented to mask out the spare bits
        // or assign while preserving spare bits
        public void* Value { get; set; }
    }

    // Represents a bool value in Swift (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftBool
    {
        // Would be implemented to mask out the spare bits
        // or assign while preserving spare bits
        public bool Value { get; }
    }
}

tannergooding commented 2 months ago

I have mixed feelings about transparent. As I have said before, it just pushes work that can be done by interop binding generators into the runtime.

A sufficiently smart tool can try to get it all right, but it's very error prone and prone to breaking if a new platform comes online.

The general issue is that it starts getting into concepts that are ABI specific. That is, whether or not T and a simple struct wrapper S (struct S { T value; }) are equivalent depends on a lot of ABI specific context, such as the target operating system, the target CPU, whether it's a return or parameter, whether it's nested as a field of another struct, etc.

A simple example is that people worked around the well known Windows member call difference for x64 for the longest time where a C signature that looked like SomeStruct M() would be fixed up to look like SomeStruct* M(SomeStruct* retBuffer). However, this fix can subtly break for some struct layouts on Arm64 and people started hitting it when trying to run the same code on Windows Arm64. This was largely resolved with the newer CallConvMemberFunction but that still leaves some problems and nuance in other scenarios.

A built-in layout like Transparent is something the runtime already has, however. It uses this for types like NFloat, CLong, and CULong. It will likely also be used by other special interop types the runtime needs to define in the future. We similarly already have validation to assert certain types of structs only have 1 field, as it's used by the InlineArray feature. So getting the runtime to support this a little more broadly so that cases such as struct HWND { void* _value; } or struct HRESULT { int _value; } can work as expected shouldn't be that much more complex and will greatly simplify the overall user experience, JIT overhead, and tooling complexity required.
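To make the wrapper case concrete, here is a minimal sketch of a single-field struct like the HRESULT example above. The member names (Succeeded, the constructor) are illustrative assumptions, not an existing runtime type. It only shows that the wrapper has the same size as its field; matching size alone does not guarantee the wrapper is passed the same way as a bare int on every ABI, which is the gap a Transparent layout would be meant to close.

```csharp
using System;
using System.Runtime.CompilerServices;

// The wrapper occupies exactly the same number of bytes as its single field.
Console.WriteLine(Unsafe.SizeOf<HRESULT>() == sizeof(int));

var hr = new HRESULT(unchecked((int)0x80004005)); // a failure code (high bit set)
Console.WriteLine(hr.Succeeded);

// Hypothetical single-field wrapper; the member names are assumptions for illustration.
public readonly struct HRESULT
{
    private readonly int _value;

    public HRESULT(int value) => _value = value;

    // By convention, negative HRESULT values indicate failure.
    public bool Succeeded => _value >= 0;
}
```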

jkotas commented 2 months ago

Here's an updated proposal:

Could you please share a few examples of Swift structs and enums and what their C# equivalents would be to using these constructs?

jkotas commented 2 months ago

it's very error prone and prone to breaking if a new platform comes online.

It is only error prone and non-portable if people are cutting corners. It is not error prone if the types are matched between managed and unmanaged signatures exactly. If the unmanaged signature has int32_t, the managed signature should have int, anything else is non-portable.

The design principle that we have established for runtime interop going forward has been to only introduce low-level features that are impossible or very hard to do in higher level bindings. Wrapping primitive types with structs is boilerplate code that is very straightforward to do in higher level bindings. One can come up with a number of similar features that require quite a bit of boilerplate code in interop bindings today, but that can be implemented by the runtime instead. For example, we can allow byref types in signatures and pin them implicitly in the JIT. It would save a good amount of boilerplate code in interop bindings too. If we start introducing these types of features, where should we stop? Are we going to end up with a complicated built-in interop v2?

This was largely resolved with the newer CallConvMemberFunction but that still leaves some problems and nuance in other scenarios.

Right, we have introduced CallConvMemberFunction since it was impossible to deal with these calling convention differences in the interop bindings in a portable way. (I am not sure what the other scenarios you have in mind are.)

jkoritzinsky commented 2 months ago

Here's an updated proposal:

Could you please share a few examples of Swift structs and enums and what their C# equivalents would be to using these constructs?

Here's some examples:

@frozen
public struct InnerStruct
{
    let F0: Int16;
    let F1: Int8;
}

@frozen
public struct OuterStruct
{
    let F0: UInt64;
    let F1: Int64;
    let F2 : InnerStruct;
    let F3: Int8;
}

@frozen
public enum MyOptional<T>
{
    case Empty;
    case Some(Value: T);
}

public class MyClass
{
}

@frozen
public enum ParsedResult
{
    case ParsedObject(MyClass);
    case ParsedBool(Bool);
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
public struct InnerStruct
{
     public short F0; // offset 0
     public sbyte F1; // offset 2
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
public struct OuterStruct
{
    public ulong F0; // offset 0
    public long F1; // offset 8
    public InnerStruct F2; // offset 16
    public sbyte F3; // offset 19
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftEnum, RequiredDiscriminatorBits = 1)]
public struct MyOptional<T> where T : unmanaged
{
    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct Some_Payload
    {
        public T Value;
    }

    private Some_Payload Some;

    // An API surface to describe the different cases and provide a C# API around accessing them, out of the scope of this API proposal
}

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftEnum, RequiredDiscriminatorBits = 1)]
public struct ParsedResult
{
    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct ParsedObject_Payload
    {
        public SwiftPointer payload_0;
    }

    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct ParsedBool_Payload
    {
        public SwiftBool payload_0;
    }

    private ParsedObject_Payload ParsedObject;
    private ParsedBool_Payload ParsedBool;

    // An API surface to describe the different cases and provide a C# API around accessing them, out of the scope of this API proposal
}

jkotas commented 2 months ago

Can the RequiredDescriminatorBits be combined with bool fields - what would the following Swift enum look like in C# and what would be its size in bytes?

@frozen
public enum MyOptional
{
    case Empty;
    case OtherEmpty;
    case Some(Bool);
}

jkoritzinsky commented 2 months ago

With the above proposal, that type would be projected as follows (fixed the spelling for discriminator in the proposal as well):

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftEnum, RequiredDiscriminatorBits = 2)]
public struct MyOptional
{
    [StructLayout(LayoutKind.Custom)]
    [CustomLayout(CustomLayoutKind.SwiftStruct)]
    private struct Some_Payload
    {
        public SwiftBool payload_0;
    }

    private Some_Payload Some;
}

The type would be 1 byte in size, and the bitwise layout would be as follows:

discriminator: 0-1
empty: 2-6
Some.payload_0: 7

I recommend we use the SwiftBool type here to ensure that any usage as the C# bool type has the other bits masked off correctly. I know that we have had optimizations implemented around bool values only ever being 0 or 1, so by using a separate type with conversions, we can ensure that we don't interfere with those optimizations.

jkotas commented 2 months ago

The bit fields come with many new problems. For example, you cannot take address of MyOptional.Some field when it is a bit field. That breaks a lot of things in the runtime, the JIT and maybe even Roslyn.

jkoritzinsky commented 2 months ago

I think there's a little confusion here. Some_Payload is a 1-byte type, as is SwiftBool. However, the discriminator bits are stored within the SwiftBool's storage as they're known to be unused. You can still take the address of Some. The SwiftBool type provides an API abstraction to get the bool value with masking off the bits that may be used by a discriminator. This is equivalent to what Swift allows for this sort of layout.

jkotas commented 2 months ago

provides an API abstraction to get the bool value with masking off the bits that may be used by a discriminator

Is the JIT expected to generate the masking? Are there going to be special rules for how SwiftBool can and cannot be used? For example, consider:

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
struct MyStruct
{
    SwiftBool f1;
    SwiftBool f2;
}

MyStruct s;

SwiftBool a = s.f2;
Console.WriteLine(a.Value);

jkoritzinsky commented 2 months ago

provides an API abstraction to get the bool value with masking off the bits that may be used by a discriminator

Is the JIT expected to generate the masking?

No, the masking can be implemented in SwiftBool itself.

Are there going to be special rules for how SwiftBool can and cannot be used? For example, consider:

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
struct MyStruct
{
    SwiftBool f1;
    SwiftBool f2;
}

MyStruct s;

SwiftBool a = s.f2;
Console.WriteLine(a.Value);

Swift's Bool type always takes up at least 1 byte. So the size of MyStruct is 2 bytes. There's no extra restrictions on its usage. Swift just allows the enumeration layout algorithm to reuse the bits that the Bool type definitely doesn't use.

To do this, Swift allows types in the standard library to use LLVM's custom-bit-sized integer types. These types are always allocated as the next legal integer size (i.e. the i1 type is allocated as though it were an i8, and an i19 would be allocated as an i32), but the Swift compiler is able to identify these types and recognize that the extra bits are unused and can be re-utilized for the discriminator. In our case, Bool is defined with an i1 field (so it takes up an i8 of space and Swift knows that 7 of the bits are unused).
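As a rough illustration of masking implemented in the type itself, here is a minimal sketch of a byte-backed wrapper. SwiftBoolSketch and its SpareBits member are hypothetical names, not the proposed SwiftBool, and the assumption that the payload lives in the lowest bit is taken from the i1 description above.

```csharp
using System;

// A raw byte as it might sit inside an enum's storage: the low bit is the Bool
// payload, and (by assumption) some upper bits have been borrowed by the enum.
var fromEnum = new SwiftBoolSketch(0b1000_0001);
Console.WriteLine(fromEnum.Value);     // True: only bit 0 is consulted
Console.WriteLine(fromEnum.SpareBits); // 64: the upper seven bits, shifted down by one

// Hypothetical stand-in for the proposed SwiftBool type.
public readonly struct SwiftBoolSketch
{
    private readonly byte _value;

    public SwiftBoolSketch(byte raw) => _value = raw;

    // Mask off everything except the single payload bit.
    public bool Value => (_value & 1) != 0;

    // The seven bits that a Bool itself never uses.
    public int SpareBits => _value >> 1;
}
```

Because the struct is still a full byte, its address can be taken and it can be copied like any other 1-byte value; only the accessor needs to know about the unused bits.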

hamarb123 commented 2 months ago

I'm still confused what we expect sizeof to do

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
public struct OuterStruct
{
    public ulong F0; // offset 0
    public long F1; // offset 8
    public InnerStruct F2; // offset 16
    public sbyte F3; // offset 19
}

Based on the above, it seems like sizeof InnerStruct should be 3, but if we had

[StructLayout(LayoutKind.Custom)]
[CustomLayout(CustomLayoutKind.SwiftStruct)]
public struct OuterStruct2
{
    public InnerStruct F0; // offset 0
    public ushort F1; // offset 4
}

It seems like the size should be 4.

How do we get these different values? I'd think if we're copying an arbitrary InnerStruct we'd only want to copy 3 bytes, but if we're indexing in a span we'd want to jump 4 at a time, and potentially in other scenarios we'd want to jump an amount based on a different alignment...

Maybe something like

//Unsafe (& RuntimeHelpers overloads for RTH?)
static int SizeOf<T>(int nextAlignment);
static int AlignOf<T>();

//copy size
Unsafe.SizeOf<InnerStruct>(1); //3

//offset to next in span
Unsafe.SizeOf<InnerStruct>(Unsafe.AlignOf<InnerStruct>()); //4

//offset to next byte field
Unsafe.SizeOf<InnerStruct>(Unsafe.AlignOf<byte>()); //3

//offset to next short field
Unsafe.SizeOf<InnerStruct>(Unsafe.AlignOf<short>()); //4

Whichever way we make sizeof work, it seems like we'd definitely break existing unsafe code (but luckily not for any existing types).
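The arithmetic behind the suggested SizeOf<T>(int nextAlignment) can be sketched as rounding the data size up to whatever alignment the next item needs. LayoutMath is a hypothetical helper, not a proposed API, and it takes the size as a plain number instead of a type argument; the values 3 and 2 are InnerStruct's size and alignment from the examples above.

```csharp
using System;

// InnerStruct: a 2-byte field followed by a 1-byte field.
const int innerSize = 3;  // bytes of actual data
const int innerAlign = 2; // alignment of its largest field

Console.WriteLine(LayoutMath.SizeOf(innerSize, 1));          // 3: copy size, or offset to a following byte field
Console.WriteLine(LayoutMath.SizeOf(innerSize, innerAlign)); // 4: distance between consecutive elements in a span
Console.WriteLine(LayoutMath.SizeOf(innerSize, 2));          // 4: offset to a following short field

// Hypothetical helper used only for this illustration.
static class LayoutMath
{
    // Rounds size up to the next multiple of nextAlignment, which must be a power of two.
    public static int SizeOf(int size, int nextAlignment) =>
        (size + nextAlignment - 1) & ~(nextAlignment - 1);
}
```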

jkotas commented 2 months ago

Swift's Bool type always takes up at least 1 byte. So the size of MyStruct is 2 bytes.

Ok, I have incorrectly assumed that the bit packing works for structs too. It sounds like it works for Swift enums only and only when there is a single bool value. Just curious - is there a reason why this specific case is optimized? Are enums like this very common in Swift APIs?

No, the masking can be implemented in SwiftBool itself.

So we are going to always return the lowest bit bool Value => (_value & 1) != 0;. I guess that works for the singular enum case.

lambdageek commented 2 months ago

I strongly disagree with having language-specific custom layouts as a runtime concept. I think this is a great UX for a source generator, but the underlying mechanism should be language agnostic (to whatever extent possible - if a language, like Swift, requires an ABI different from the normal platform ABI, that has to bleed into the runtime).

Having language-specific custom layouts as a runtime concept will lead to:

Users of new languages using a "close enough" existing language layout that is an imperfect match. Which means users interested in interop with a new unsupported language having to understand both their language and the "close enough" language.
Binary compat breaks when the runtime picks up support for the new language and presumably every interop library has to upgrade from using the "close enough" language attribute to using the new language attribute.

Also I'm not sure that language-specific runtime mechanisms are a great idea because languages evolve. Generally they're not breaking their ABI on every major version, but it's conceivable that some Rust edition or new version of Swift breaks compat with a previous one and now we're stuck with an enum value that is ambiguous or unusable.

tannergooding commented 2 months ago

@jkoritzinsky, After thinking about this more, I'm actually not quite sure I 100% understand the reason behind Swift, // Swift-specific layout rules

In general, ABIs are determined roughly in terms of the underlying system ABI, which is itself largely oriented around C. Accordingly, if a type or method cannot be defined using "standard" C, then it is not possible for C to call and therefore not really possible for arbitrary interop -- Where "standard" C is a little bit looser term than "spec-compliant" C and really just means, definable by GCC/Clang/MSVC

Based on some of the above, it sounds like we're trying to circumvent the core interop APIs for Swift and trying to interact with it directly, rather than going through the C compatible ABI defined for the language, and so it's kind of like if some C API tried to call into a .NET generic function directly. Yes, you can do it, but it's technically depending on implementation details that are subject to change and is largely undefined behavior. If C wants to call a .NET generic API, then it needs to use the appropriate hooks to resolve an ABI stable wrapper instead.

It'd be great if this general scenario could be clarified and why we need such a feature but C does not.

jkoritzinsky commented 2 months ago

Swift's Bool type always takes up at least 1 byte. So the size of MyStruct is 2 bytes.

Ok, I have incorrectly assumed that the bit packing works for structs too. It sounds like it works for Swift enums only and only when there is a single bool value.

This works for enums with Bool members, Swift class object members, or Objective-C bridged object members.

Just curious - is there a reason why this specific case is optimized?

There used to be more cases in Swift that were optimized in this way. The UnicodeScalar type used to be defined as Builtin.Int21, so the top 11 bits for it could be optimized the same way. After searching the public Standard Library, Bool is the only one today.

Are enums like this very common in Swift APIs?

Enums with Bools aren't particularly common, but ones with class types (represented by the SwiftPointer type) are.

No, the masking can be implemented in SwiftBool itself.

So we are going to always return the lowest bit bool Value => (_value & 1) != 0;. I guess that works for the singular enum case.

Yep

I strongly disagree with having language-specific custom layouts as a runtime concept. I think this is a great UX for a source generator, but the underlying mechanism should be language agnostic (to whatever extent possible - if a language, like Swift, requires an ABI different from the normal platform ABI, that has to bleed into the runtime).

We plan to do most of the work in the source generator/projection space, but the ABI platform differences that must be accounted for in the calling convention must be representable to the runtime, as the runtime handles the register allocation, lowering, etc. Basic type layout is included here.

We tried to use the existing .NET features to describe Swift layouts, but we've realized that we can't do so with existing features.

We tried to use StructLayoutAttribute.Size to trim trailing padding and that worked for CoreCLR and NativeAOT for non-generic cases. However, it doesn't work on Mono and it doesn't work in generic scenarios where the size of the containing structure can't be known (as one of the fields is generic).

We also can't represent the "spare bits" concept in a platform-agnostic way, as even macOS x64 and arm64 differ in which bits they consider "spare".

I'd love to make these cases a source-generator-supported concept, but sadly they fall below the line of things the runtime is better at (architecture-specific differences) and things the runtime needs to handle (type layout for blittability and calling convention).

Having language-specific custom layouts as a runtime concept will lead to:

Users of new languages using a "close enough" existing language layout that is an imperfect match. Which means users interested in interop with a new unsupported language having to understand both their language and the "close enough" language.

Binary compat breaks when the runtime picks up support for the new language and presumably every interop library has to upgrade from using the "close enough" language attribute to using the new language attribute.

We will always have users using the incorrect representation of our layout APIs that are "close enough". The majority of usage of explicit layout is done incorrectly (it's very rare for people outside of dotnet/runtime or the C# discord to represent structs containing unions as such instead of putting the explicit layout on the containing struct). I don't think we can stop users from doing this, and by not adding the features necessary to represent this, we make cases like the "represent a Swift type" case even more convoluted and difficult to understand (which is why there was pushback on the PadSizeToAlignment idea I presented).

Also I'm not sure that language-specific runtime mechanisms are a great idea because languages evolve. Generally they're not breaking their ABI on every major version, but it's conceivable that some Rust edition or new version of Swift breaks compat with a previous one and now we're stuck with an enum value that is ambiguous or unusable.

If we're concerned, we can name the Swift members Swift5Struct and Swift5Enum to state that they're a representation of the Swift 5 ABI.

After thinking about this more, I'm actually not quite sure I 100% understand the reason behind Swift, // Swift-specific layout rules

In general, ABIs are determined roughly in terms of the underlying system ABI, which is itself largely oriented around C. Accordingly, if a type or method cannot be defined using "standard" C, then it is not possible for C to call and therefore not really possible for arbitrary interop -- Where "standard" C is a little bit looser term than "spec-compliant" C and really just means, definable by GCC/Clang/MSVC

As you know, we already have mechanisms to represent types that don't exist in C for interop scenarios (explicit-layout w/ non-zero offsets).

Based on some of the above, it sounds like we're trying to circumvent the core interop APIs for Swift and trying to interact with it directly, rather than going through the C compatible ABI defined for the language, and so it's kind of like if some C API tried to call into a .NET generic function directly. Yes, you can do it, but it's technically depending on implementation details that are subject to change and is largely undefined behavior. If C wants to call a .NET generic API, then it needs to use the appropriate hooks to resolve an ABI stable wrapper instead.

For Swift interop, we are trying to introduce support to directly call into Swift APIs. That is the explicit goal of the project. In .NET 8 and earlier, Swift interop requires a significant amount of codegen in both C# and Swift to provide a C-compatible API surface on each side. Our initiative is to expand what .NET supports to enable directly calling Swift APIs with the Swift calling convention with Swift types.

An explicit goal of this work is to make it possible to call Swift APIs without having to map a Swift API to a C-compatible API. We're not circumventing the core interop APIs, we're explicitly expanding what we support to include the Swift ABI on Apple platforms.

It'd be great if this general scenario could be clarified and why we need such a feature but C does not.

Swift's layout rules are explicitly not C-compatible. For example, the lack of trailing padding is not expressible in C, only in LLVM IR.
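The trailing-padding difference can be worked through with the OuterStruct example from earlier in the thread. This is a sketch under stated assumptions: LayoutSketch is a hypothetical helper, fields are described only by (size, alignment), and the C numbers assume a typical ABI where InnerStruct is padded to 4 bytes.

```csharp
using System;

// (size, alignment) for F0: UInt64, F1: Int64, F2: InnerStruct, F3: Int8.
// InnerStruct occupies 3 bytes under Swift's rules but 4 in C, where trailing
// padding counts toward the size of the type.
var swiftFields = new (int Size, int Align)[] { (8, 8), (8, 8), (3, 2), (1, 1) };
var cFields     = new (int Size, int Align)[] { (8, 8), (8, 8), (4, 2), (1, 1) };

Console.WriteLine(string.Join(", ", LayoutSketch.Offsets(swiftFields))); // 0, 8, 16, 19
Console.WriteLine(string.Join(", ", LayoutSketch.Offsets(cFields)));     // 0, 8, 16, 20

// Hypothetical helper used only for this illustration.
static class LayoutSketch
{
    // Places each field at the next offset that satisfies its alignment
    // (alignments are assumed to be powers of two).
    public static int[] Offsets((int Size, int Align)[] fields)
    {
        var offsets = new int[fields.Length];
        int next = 0;
        for (int i = 0; i < fields.Length; i++)
        {
            int align = fields[i].Align;
            offsets[i] = (next + align - 1) & ~(align - 1);
            next = offsets[i] + fields[i].Size;
        }
        return offsets;
    }
}
```

The only input that differs between the two runs is the size of InnerStruct, which is enough to move F3 from offset 19 to offset 20.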

Swift has explicitly stabilized portions of their ABI in Swift 5. We're only proposing including support in .NET for these stable portions of the ABI that are required to accurately call exposed Swift functions in the APIs we're looking at projecting into .NET.

tannergooding commented 2 months ago

Swift has explicitly stabilized portions of their ABI in Swift 5. We're only proposing including support in .NET for these stable portions of the ABI that are required to accurately call exposed Swift functions in the APIs we're looking at projecting into .NET.

👍, if it's explicitly defined as stable then I think that alleviates most of the concerns I had.

jkoritzinsky commented 2 months ago

@lambdageek and I spoke offline and he's okay with the updated proposal given some of the mentioned concerns. I'll update the top and mark this as ready for review (and blocking).

jkotas commented 2 months ago

I think we should have good understanding of the Swift generics and Swift UI solution end-to-end before we start implementing the Swift-specific field layout in the runtime. I see it as nice-to-have at this point. I am not 100% convinced that it will be required at the end.

jkoritzinsky commented 2 months ago

We'll need to either support Optional<T> or have specialized instantiations of it for our goals (CryptoKit's APIs that we use have Swift Optional<T> parameters), so I want to make sure that we're clear to implement it.

I agree that we should have a good understanding of the layout algorithms and which portions we need for our .NET 9 goals before implementing it.

jkotas commented 2 months ago

have specialized instantiations of it for our goals (CryptoKit's APIs that we use have Swift Optional parameters

I think this would be sufficient for our .NET 9 Swift interop goals.

jkotas commented 2 months ago

(I am fine with doing prep-work towards this proposal, like running it through API review and removing the superfluous validation of LayoutKind in Roslyn.)

bartonjs commented 1 month ago

Video

Custom implies customization, so let's rename it to Extended
Sequential and Union => CStruct, CUnion
We discussed separate attributes for things like RequiredDiscriminatorBits and decided that one big grab bag is the better approach (for now)

namespace System.Runtime.InteropServices
{
    public enum LayoutKind
    {
        Extended = 1,
    }

    [AttributeUsage(AttributeTargets.Struct)]
    public sealed class ExtendedLayoutAttribute : Attribute
    {
        public ExtendedLayoutAttribute(ExtendedLayoutKind kind) {}

        // Only valid for ExtendedLayoutKind.SwiftEnum
        public int RequiredDiscriminatorBits { get; set; }
    }

    public enum ExtendedLayoutKind
    {
        CStruct, // C-style struct
        CUnion, // C-style union
        SwiftStruct, // Swift struct
        SwiftEnum, // Swift enumeration
    }
}

namespace System.Runtime.InteropServices.Swift
{
    // Represents a pointer to a Swift object (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftObject
    {
        // Would be implemented to mask out the spare bits
        public void* Value { get; }
    }

    // Represents a bool value in Swift (for the purposes of calculating spare bits)
    public readonly unsafe struct SwiftBool
    {
        // Would be implemented to mask out the spare bits
        // or assign while preserving spare bits
        public bool Value { get; }
    }
}