dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

Struct promoted to auto-layout if it contains a non-blittable field type + explicit layout field type #44579

Closed xoofx closed 2 years ago

xoofx commented 3 years ago

Description

We have discovered an unexpected behavior of the runtime regarding layout of a struct in memory with .NET CLR. In the following example, the struct BoolAndExplicitStruct layout will be promoted to an auto-layout because of the presence of the field public bool Bool and public ExplicitLayoutStruct ExplicitLayout together:

   [StructLayout(LayoutKind.Explicit)]
    struct ExplicitLayoutStruct {
        [FieldOffset(0)] public byte B00;
    }

    struct RegularStruct
    {
        public int Field;
    }

    struct BoolAndExplicitStruct
    {
        // bool AND a field with an explicit layout struct
        // change the layout of this struct to Auto instead of sequential
        public bool Bool;
        public ExplicitLayoutStruct ExplicitLayout;
        public int Int32;
        public RegularStruct RegularStruct;
        public long Int64;
    }

    // Field Int64: starts at offset 0
    // Field Int32: starts at offset 8
    // Field Bool: starts at offset 12
    // Field ExplicitLayout: starts at offset 16
    // Field RegularStruct: starts at offset 24

This is causing lots of trouble on the Burst compiler on our side, because as we are using these structs in native code, we expect the same sequential+explicit layout than .NET, but we were not expecting an implicit auto-layout promotion in this case (We haven't implemented today the auto-layout because we fear its implementation dependent behavior)

It seems to be related around this function CheckIfDisqualifiedFromManagedSequential or the caller of this function, but haven't identified exactly in the code when this flip occurs.

Do you know the reason of this auto-layout promotion? Could we revert that behavior? (We will be happy to make a PR)

Configuration

Happens on all OS, all CPU and all .NET version (including .NET framework) but not on Mono (because Mono doesn't have auto-layout promotion afaik)

Regression?

Nope.

Dotnet-GitSync-Bot commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

MichalStrehovsky commented 3 years ago

I'll just paste the relevant part of the doc for StructLayoutAttribute like I did in discord: "For non-blittable types, it controls the layout when the class or structure is marshaled to unmanaged code, but does not control the layout in managed memory."

The native layout of the struct is as one would expect based on the fact that C# structs are implicitly marked Sequential: Marshal.OffsetOf<BoolAndExplicitStruct>("Bool") == 0

Sequential layout specification doesn't control the managed view because BoolAndExplicitStruct is not blittable (there's the bool field) - this behavior falls out from the doc.

There was a similar conversation about StructLayout specification not applying to the managed view e.g. here: https://github.com/dotnet/runtime/issues/1770#issuecomment-574979786

xoofx commented 3 years ago

Sequential layout specification doesn't control the managed view because BoolAndExplicitStruct is not blittable (there's the bool field) - this behavior falls out from the doc.

That's a bit incorrect. The presence of a bool doesn't change the layout. You can check by yourself. Here, that's the presence of a non-blittable type AND an explicit struct layout field. That's why it is so unexpected.

For example the following BoolAndExplicitStruct struct is perfectly sequential:

    [StructLayout(LayoutKind.Explicit)]
    struct ExplicitLayoutStruct {
        [FieldOffset(0)] public byte B00;
    }

    struct RegularStruct
    {
        public int Field;
        public bool Bool;
    }

    struct BoolAndExplicitStruct
    {
        // bool AND a field with an explicit layout struct
        // change the layout of this struct to Auto instead of sequential
        public bool Bool;
        //public char Char; // same behavior than bool
        //public ExplicitLayoutStruct ExplicitLayout;
        public int Int32;
        public RegularStruct RegularStruct;
        public long Int64;
    }

but if you uncomment public ExplicitLayoutStruct ExplicitLayout you will get an auto-layout.

I was not aware of any auto-layout promotion except when you have a managed object inside one of the struct. But this auto-layout seems completely arbitrary and, well, quite wrong to me.

xoofx commented 3 years ago

Until now, I was always expecting that auto-layout promotion would only happen in the presence of some managed objects in any embedded structs. But this auto-layout promotion in the presence of only unmanaged type is breaking these expectations quite hard, it makes basically native struct sharing barely impossible without using explicit layout everywhere (and explicit layout is awful to work with when you start to have a variable pointer type 32/64 bits).

MichalStrehovsky commented 3 years ago

The presence of a bool doesn't change the layout.

The presence of the bool makes the struct non-blittable - at that point the runtime is free not to respect the StructLayoutAttribute. It will often end up sequential in the managed view too, but details of when that happens are undocumented. I tried to look up the reasoning behind why the runtime ends up laying it out differently in this case, but that change was made before 2010 and is more difficult to dig out.

xoofx commented 3 years ago

The presence of the bool makes the struct non-blittable - at that point the runtime is free not to respect the StructLayoutAttribute. It will often end up sequential in the managed view too, but details of when that happens are undocumented. I tried to look up the reasoning behind why the runtime ends up laying it out differently in this case, but that change was made before 2010 and is more difficult to dig out.

Indeed, and I would love to understand the reason for this specific case, and if we can revert that behavior for unmanaged types (blittable or not)

Ultrahead commented 3 years ago

I always wondered why it was decided to map a boolean against an int in memory instead of a byte?

jkoritzinsky commented 3 years ago

Once the struct is considered non-blittable, the runtime checks if the struct is disqualified from being managed sequential. Since one of the fields is not managed sequential (it's explicit), the type is disqualified from having a sequential managed layout.

xoofx commented 3 years ago

Once the struct is considered non-blittable, the runtime checks if the struct is disqualified from being managed sequential. Since one of the fields is not managed sequential (it's explicit), the type is disqualified from having a sequential managed layout.

Why having an explicit layout of a field is disqualifying the struct for being sequential? Why promoting to auto-layout? Also, if you remove the bool but keep the explicit layout field in my example above, it won't promote the struct to auto-layout. So what you describe is not exactly what is happening.

I always wondered why it was decided to map a boolean against an int in memory instead of a byte?

In managed, a bool is mapped to a byte. The problem mentioned above doesn't change this layout for the bool type (it's still 1 byte). The problem is implicit promotion to auto-layout of the struct, but an auto-layout won't change the size of the blittable types, unlike P/Invoke interop that will involve marshalling behaviors.

Why in the context of interop marshalling a bool is translated to int? I believe it goes back to the Windows API having declared the C++ BOOL to actually map to a DWORD so an int32, so .NET selected this behavior back in the days to match the Windows layout for BOOL.

jkoritzinsky commented 3 years ago

If you remove the bool field then the structure is blittable, so layout is preserved. Explicit only disqualifies the type if the type is non-blittable, as I said.

The reasoning behind this design is from long before my time and from before the source code was imported into the CoreCLR repo, so I'm not sure what the original reasoning was.

xoofx commented 3 years ago

If you remove the bool field then the structure is blittable, so layout is preserved. Explicit only disqualifies the type if the type is non-blittable, as I said.

Oh, ok, I see, sorry for my misunderstanding.

The reasoning behind this design is from long before my time and from before the source code was imported into the CoreCLR repo, so I'm not sure what the original reasoning was.

Ok. Maybe @jkotas knows about this? I'm still struggling to see a practical reasons for this choice. And it's really hurting interop to native scenario where you don't want to involve marshalling rules. It makes almost impossible to use sequential struct with native code, unless you are sure that you don't hit the implicit auto-layout promotion rules.

Some of our users are starting to request us to implement the rules of auto-layout of CoreCLR in our compiler, which is pretty bad, because it will put us in a risky positions of being broken very easily by the CLR. I know all of these are implementation details but really, that's not great to have such behavior. I have been doing low level interop for more than 10 years, and I'm quite choked to discover such a rule so late, it's scary.

MichalStrehovsky commented 3 years ago

And it's really hurting interop to native scenario where you don't want to involve marshalling rules.

If that's the scenario, you just need to make sure the structure is blittable - the bool field makes the structure not blittable, so interop is going to have to do marshalling (I think this applies even if the managed layout happens to match the native layout by accident). The difference in layout is only happening because the structure is not blittable.

xoofx commented 3 years ago

If that's the scenario, you just need to make sure the structure is blittable - the bool field makes the structure not blittable, so interop is going to have to do marshalling (I think this applies even if the managed layout happens to match the native layout by accident). The difference in layout is only happening because the structure is not blittable.

Yeah, but it is basically asking that we won't allow our users to use bool or char or we will have to post process the assemblies and do horrible things with IL to make it pass. That's quite a big deal. Specially because also Mono doesn't do this kind of re-auto-layout, so we discovered only this case because some of our users are running Burst on .NET Core. This rule of non-blittable promoting to auto-layout doesn't make sense for me, it seems completely arbitrary without any technical reasons.

jkotas commented 3 years ago

Maybe @jkotas knows about this?

I do not know why it is done this way.

Ultrahead commented 3 years ago

The casting would be a perf killer, but this one is blittable ...

[System.Serializable]
public struct bBool // : IComparable, IComparable<bBool>, IConvertible, IEquatable<bBool>, IFormattable ... etc.
{
    private byte m_value; // Add all attributes for Unity that you may need

    public bBool(bool value) => m_value = (byte)(value ? 1 : 0);

    public bool IsTrue => m_value == 1;

    public bool IsFalse => m_value != 1;

    public void Switch() => m_value = (byte)(m_value == 1 ? 0 : 1);

    public void ToTrue() => m_value = (Byte)1;

    public void ToFalse() => m_value = (Byte)0;

    public override string ToString() => m_value == 1 ? "True" : "False";

    public static implicit operator bBool(bool value) => new bBool(value);

    public static implicit operator bool(bBool value) => value.m_value == 1;
}
PathogenDavid commented 3 years ago

I have been doing low level interop for more than 10 years, and I'm quite choked to discover such a rule so late, it's scary.

👆🏼 This


The runtime may be permitted to to reorder fields on this struct, but I don't think anyone outside of people on the CoreCLR team would think that blatantly ignoring sequential layout is reasonable.

Would you expect this struct to have a completely different layout compared to @xoofx's BoolAndExplicitStruct?

public struct BoolAndByteInsteadOfStruct
{
    public bool Bool;
    public byte B;
    public int Int32;
    public RegularStruct RegularStruct;
    public long Int64;
}

When I look at both structs, I see the following:

Even if I knew that the bool makes this non-blittable and therefore layout rules go out the window, I'd at least expect the runtime to choose the same layout for each of them but it doesn't:

==== Managed Layout of BoolAndExplicitStruct ====
System.Int64 Int64 @ 0
System.Int32 Int32 @ 8
System.Boolean Bool @ 12
ExplicitLayoutStruct ExplicitLayout @ 16
RegularStruct RegularStruct @ 24

==== Managed Layout of BoolAndByteInsteadOfStruct ====
System.Boolean Bool @ 0
System.Byte B @ 1
System.Int32 Int32 @ 4
RegularStruct RegularStruct @ 8
System.Int64 Int64 @ 16

If it consistently performed automatic layout (and the layouts are the same if you explicitly give the runtime permission to do automatic layout with LayoutKind.Auto) it'd be one thing. However the fact that it generally does the expected thing makes this gotcha even more unpleasant.


the bool field makes the structure not blittable

In my eyes the fact that the runtime considers bool and char to be non-blittable is an unfortunate artifact of a decision made in 2002 to make interop with Windows easier (at the expense of everything else.) In modern times this feels more like a gotcha than something beneficial.

Is it at all feasible to have either a module attribute or a feature flag that says "No really, unless I say otherwise using MarshalAs, I accept bool is 8 bits and char is a 16 bit UTF16 character"? Or barring that, an additional struct layout which is essentially SequentialManaged?

(Heck, I'd love a ISeriouslyNeverWantMarshalingLogicToRunForPInvokesOnThisAssemblyAndYouShouldThrowAnExceptionIfItWereNeededAttribute, but that's getting off topic.)

PathogenDavid commented 3 years ago

Is it at all feasible to have either a module attribute or a feature flag that says "No really, unless I say otherwise using MarshalAs, I accept bool is 8 bits and char is a 16 bit UTF16 character"?

So I've been doing some digging, and I found out that this kinda already exists for char.

IsFieldBlittable in fieldmarshaler is what determines whether or not a field is considered blittable or not. In this function one of the check is affectively "If a char field is marked as unicode or marshaled as an I2 or U2, it is blittable":

https://github.com/dotnet/runtime/blob/fcce1608222d542eea0b419bcee3c62775c1c418/src/coreclr/src/vm/fieldmarshaler.cpp#L233-L235

This means if you mark your module with [module: DefaultCharSet(CharSet.Unicode)], char will be blittable by default.


Unfortunately there is no such escape hatch for ELEMENT_TYPE_BOOLEAN (even if you manually specify a 1 byte MarshalAs.) I would like propose two changes:

1) Add an escape hatch for booleans when they are marshaled in a blittable fashion:

// The lack of NATIVE_TYPE_DEFAULT is intentional here.
// This value really just means "no marshaling specified",
// which does not represent the native type in the case of booleans.
case ELEMENT_TYPE_BOOLEAN:
    isBlittable = (nativeType == NATIVE_TYPE_I1) || (nativeType == NATIVE_TYPE_U1);
    break;

2) Introduce a new module-level attribute similar to DefaultCharSetAttribute but for booleans:

namespace System.Runtime.InteropServices
{
    [AttributeUsage(AttributeTargets.Module, Inherited = false)]
    public sealed class DefaultBooleanMarshalingAttribute : Attribute
    {
        // Note that I'm not at all attached to the idea of specifying an UnmanagedType here.
        // I'm only expressing it this way to keep with the spirit of DefaultCharSetAttribute
        public DefaultBooleanMarshalingAttribute(UnmanagedType unmanagedType)
            => UnmanagedType = unmanagedType;

        public UnmanagedType UnmanagedType { get; }
    }
}

As with DefaultCharSet, Roslyn would have special understanding of this attribute and use it to set the marshaling mode of boolean fields when it is present.

Edit: Hmm, one possible issue with this (and probably why there's no escape hatch today) is that bool marshaling protects against awkward bools. In that case perhaps a new NATIVE_TYPE_BLITTABLEBOOL could be added to CorNativeType.

xoofx commented 3 years ago

I'm not sure that we need an attribute to fix this problem, because today, the promotion to auto-layout doesn't make the struct blittable either, it changes its layout but it doesn't promote a bool to an int, so it's not a problem of marshalling. It seems more an arbitrary encoded rule that is not backup by a technical reasons.

If you look at the code, blittable and sequentiability are two different aspects, but here the trouble comes for sequentiability:

https://github.com/dotnet/runtime/blob/29e9b5b7fd95231d9cd9d3ae351404e63cbb6d5a/src/coreclr/src/vm/classlayoutinfo.cpp#L492-L497

So I would be more in favor to just remove this auto-layout promotion unless there is a managed object.

Ultrahead commented 3 years ago

In my eyes the fact that the runtime considers bool and char to be non-blittable is an unfortunate artifact of a decision made in 2002 to make interop with Windows easier (at the expense of everything else.) In modern times this feels more like a gotcha than something beneficial.

I agree a 100% with that comment

Ultrahead commented 3 years ago

In that case perhaps a new NATIVE_TYPE_BLITTABLEBOOL could be added to CorNativeType.

Agree. That's why I proposed a workaround bBool struct.

Ultrahead commented 3 years ago

So I would be more in favor to just remove this auto-layout promotion unless there is a managed object.

That and to make a boolean blittable by mapping it to a byte on the native side.

xoofx commented 3 years ago

That and to make a boolean blittable by mapping it to a byte on the native side.

I would prefer to not conflate the auto-layout promotion described here and the default non-blittabilty of bool. This can be opened in a separate issue, but here, I don't propose to change this behavior.

PathogenDavid commented 3 years ago

It seems more an arbitrary encoded rule that is not backup by a technical reasons.

I do agree, I was basing my investigation/proposal on the other comments stating the fact that bool is non-blittable is what indicates to the runtime that it has permission to mess with the layout.

I would prefer to not conflate the auto-layout promotion described here and the default non-blittabilty of bool. This can be opened in a separate issue, but here, I don't propose to change this behavior.

The logic in the runtime is that the type will use sequential layout if it is blittable or managed sequential. If both are false (as is with BoolAndExplicitStruct) it will automatically determine the layout using PlaceInstanceFields:

https://github.com/dotnet/runtime/blob/fcce1608222d542eea0b419bcee3c62775c1c418/src/coreclr/src/vm/methodtablebuilder.cpp#L1716-L1753

They're two sides of the same coin, so addressing either would fix your core issue.

That being said, I do agree it'd be better to consider each issue separately. I'll open up a separate issue tomorrow morning.

xoofx commented 3 years ago

The logic in the runtime is that the type will use sequential layout if it is blittable or managed sequential. If both are false (as is with BoolAndExplicitStruct) it will automatically determine the layout using PlaceInstanceFields:

Ah, good catch, I was not able to find where this logic was encoded from the code I linked before. I understand now how things are connected in the code now, thank you.

MichalStrehovsky commented 3 years ago

I think this thread is conflating two things: managed layout and native layout of a type. A type can have two layouts: one that is used within the managed environment, when managed code is operating on the data structures, and one that is used when the type is exposed to unmanaged code through e.g. p/invoke interop, Marshal.StructureToPtr/PtrToStructure, etc.

StructLayoutAttribute controls the native layout. It's documented to do only that. Period.

Sentences like "I have been doing low level interop for more than 10 years, and I'm quite choked to discover such a rule so late, it's scary" make it seem like this thread is about the native layout of the type and the runtime not respecting the attribute when computing the native layout (i.e. when the type is passed as a parameter to a p/invoke that might have an opinion about how the type is laid out in memory). But this discussion and the issue is actually about managed layout. Managed layout is not relevant to interop. Native code should never see the managed layout of a type. Native code only sees the native layout.

There's one special case and that's blittable types, where the runtime is required to say managed layout == native layout. In that case StructLayoutAttribute will obviously be respected, because it needs to be respected for native layout.

Here's some code:

using System;
using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;

Console.WriteLine("Offset of Bool field in the native layout: " + Marshal.OffsetOf<BoolAndExplicitStruct>("Bool"));

BoolAndExplicitStruct b = default;
Console.WriteLine("Offset of Bool field in the managed layout: " +
    Unsafe.ByteOffset<bool>(ref Unsafe.As<BoolAndExplicitStruct, bool>(ref b), ref b.Bool));

[StructLayout(LayoutKind.Explicit)]
struct ExplicitLayoutStruct
{
    [FieldOffset(0)]
    public byte B00;
}

struct BoolAndExplicitStruct
{
    public bool Bool;
    public ExplicitLayoutStruct ExplicitLayout;
    public int Int32;
    public long Int64;
}

This will print:

Offset of Bool field in the native layout: 0
Offset of Bool field in the managed layout: 12

Now change the bool Bool field to be a byte. This will make the struct blittable. Now the program will print:

Offset of Bool field in the native layout: 0
Offset of Bool field in the managed layout: 0

There's no bug in the interop scenario - the type will be laid out sequentially for p/invokes as expected.

I think the runtime should be free to organize the managed layout of non-blittable types however it sees fit.

xoofx commented 3 years ago

I think the runtime should be free to organize the managed layout of non-blittable types however it sees fit.

You describe a current behavior but nobody seems to be able to describe the rationale behind "it sees fit". I would prefer a simple rule that says: For all unmanaged types (blittable or not), we should stick to the layout requested.

MichalStrehovsky commented 3 years ago

You describe a current behavior but nobody seems to be able to describe the rationale behind "it sees fit".

Hypothetically - it should be possible for the runtime to collect access patterns and drive the field layout using PGO data to optimize CPU cache utilization.

xoofx commented 3 years ago

Hypothetically - it should be possible for the runtime to collect access patterns and drive the field layout using PGO data to optimize CPU cache utilization.

Well, yeah, that's for example what I opted in stark-lang where structs are by default auto-layout, not sequential 😉

But in .NET, structs are by default sequential, so we are not in the position to fulfill this hypothetical dream. The promotion rule to auto-layout are not described anywhere, so I would prefer to stick to a more predictive rule For all unmanaged types (blittable or not), we should stick to the layout requested.

MichalStrehovsky commented 3 years ago

The attribute does not control the managed layout.

image

jkoritzinsky commented 3 years ago

For reference, the ManagedSequential feature was added in .NET Framework 2.0 and the algorithm has remained unchanged since then. The rationale I would posit by the design is "if the user makes their struct sequential and every nested struct sequential and we don't need to align any reference type fields, we might as well use the sequential layout since we're already calculating it" (before .NET 5, interop layout was calculated at type load time so we calculated the managed sequential layout at the same time as the native sequential layout). Once you add an explicit layout struct field, the runtime figures "I need to do extra validation for explicit layout anyway, so don't do ManagedSequential".

Since I can't find any docs from that era I don't know the rationale for sure, but @MichalStrehovsky is correct that we don't guarantee this behavior. It's just a convenience. If you want your managed layout to match your native layout, then you need to use blittable types.

jkoritzinsky commented 3 years ago

Based on when it was released, my guess would be that the feature was added so that in C++/CLI, regular managed structs that didn’t have reference fields would have the same layout as if they were regular C++ structs in unmanaged code. Since C++ has no concept of explicit layout, the user would have to explicitly apply the StructLayout attribute, which means that the type must be a managed type.

MichalStrehovsky commented 3 years ago

Based on when it was released, my guess would be that the feature was added so that in C++/CLI, regular managed structs that didn’t have reference fields would have the same layout as if they were regular C++ structs in unmanaged code

Makes me wonder whether we would be able to get rid of ManagedSequential if C++/CLI provably can't get to the struct (e.g. a private struct that is not in a C++/CLI assembly). Automatic layout typically ends up with better packing. I also found some hints that ManagedSequential was new in 2.0 - if it wasn't too breaking to go from "auto layout for all non-blittable struct" to "sequential layout for many non-blittable structs" back then between .NET 1.0 to .NET 2.0, maybe it wouldn't be breaking to go back the other way again.

xoofx commented 3 years ago

Since I can't find any docs from that era I don't know the rationale for sure, but @MichalStrehovsky is correct that we don't guarantee this behavior. It's just a convenience. If you want your managed layout to match your native layout, then you need to use blittable types.

Well, this is not guaranteed today but we could guarantee it. We could change the doc above and the implementation:

For unmanaged types, LayoutKind.Sequential controls both the layout in managed memory and unmanaged memory.

At least also it would make the Mono runtime and .NET runtime behaving similarly on that and would make things a lot more predictable.

Joe4evr commented 3 years ago

I always wondered why it was decided to map a boolean against an int in memory instead of a byte?

@Ultrahead Since you asked: https://stackoverflow.com/a/28515361

MichalStrehovsky commented 3 years ago

Well, this is not guaranteed today but we could guarantee it. We could change the doc above and the implementation:

Why would we want to cement a behavior with actual perf disadvantages in docs? I assume you made the choice in Stark to have structs be auto by default for perf reasons as well. The managed view of the struct is under runtime's discretion for the same reason. It happens to sometime be sequential in the managed view too, but it's not documented and it might be worth investigating getting rid of that.

I still don't understand why managed layout matters in your scenario. I see references to Burst being used with CoreCLR - but that sounds like interop like any other - Burst code would be considered native code from the runtime perspective and should see the native view of the struct, not the managed view. In the native view, the bool field is always at offset 0 as expected because StructLayout is respected. The managed view is under control of the managed runtime and is not suitable to be exposed to unmanaged code.

xoofx commented 3 years ago

Why would we want to cement a behavior with actual perf disadvantages in docs? I assume you made the choice in Stark to have structs be auto by default for perf reasons as well. The managed view of the struct is under runtime's discretion for the same reason. It happens to sometime be sequential in the managed view too, but it's not documented and it might be worth investigating getting rid of that.

Yes, but I made the choice to make struct with auto-layout by default, which is very different from the benefits that you are claiming here. Here, you are saying that if we have a default sequential layout but are using char/bool, we might start to layout differently. While the only case where it makes sense today for non-blittable is for managed objects, but for char/bool, really? What do you expect to gain while you won't be able to do this auto-layout promotion/optimizations if there are no char/bool? It seems that you are trying to legitimate the existing behavior for something that has no practical benefits (if struct layout was defaulted to auto, that would be a different discussion)

I still don't understand why managed layout matters in your scenario. I see references to Burst being used with CoreCLR - but that sounds like interop like any other - Burst code would be considered native code from the runtime perspective and should see the native view of the struct, not the managed view. In the native view, the bool field is always at offset 0 as expected because StructLayout is respected. The managed view is under control of the managed runtime and is not suitable to be exposed to unmanaged code.

If you pass a pointer to a struct like this to native code (not a by ref, but an unsafe pointer), the interop layout will do nothing. it will pass directly the pointer of the managed layout, the compiler will not complain, the JIT will be happy also. For us, the transition to Burst should be transparent for our users, ultimately, we would like Burst to be able to compile entirely a code base, maybe even managed. In order to detect the issue above, we will have to basically disallow the usage of bool/char entirely in any structs used in Burst code (or we find a way to recompile all our user assemblies by transforming the bool to byte, not even sure it's practical...)

My higher management might start to ask us if we move one day to CoreCLR to fork CoreCLR just because of that issue. I don't want to fork CoreCLR for that. I know that it's not your problem, but at least you get the context about why I'm so keen on trying to revert that blittable behavior. 😉

Ultrahead commented 3 years ago

In order to detect the issue above, we will have to basically disallow the usage of bool/char entirely in any structs used in Burst code (or we find a way to recompile all our user assemblies by transforming the bool to byte, not even sure it's practical...

Or force devs to find workarounds like bBool.

All in all it's either quite restrictive or cumbersome

Ultrahead commented 3 years ago

That being said, I do agree it'd be better to consider each issue separately. I'll open up a separate issue tomorrow morning.

@PathogenDavid: great. Please let us know the link to it when you do :)

xoofx commented 3 years ago

Or force devs to find workarounds like bBool.

Yeah, we already tried that years ago. Our users hated it. My CTO hates it.

Because also Unity has been always working with Mono, passing a pointer to a struct to native code was giving the same layout. These kind of rules fuels the idea of forking, and I hate that as well.

Ultrahead commented 3 years ago

Yeah, we already tried that years ago. Our users hated it. My CTO hates it

I would hate it also ...

MichalStrehovsky commented 3 years ago

it will pass directly the pointer of the managed layout, the compiler will not complain, the JIT will be happy also.

Right, the runtime/compiler/JIT will be happy about all sorts of incorrect p/invoke declaration. It doesn't make them correct. I guess the runtime could have blocked that, but the users would just go "a-ha! I'll just declare it as void*" so at that point it doesn't matter. If you pass it as ref instead of pointer, I expect the runtime to fix the layout issue for you.

What do you expect to gain while you won't be able to do this auto-layout promotion/optimizations if there are no char/bool?

Bool is specifically a type that auto layout might be able to pack more efficiently than sequential (unless the user makes conscious packing choices) and is pretty common (much more common to see a bool field than say byte/sbyte/short/ushort that are also prone to have packing problems).

Joe4evr commented 3 years ago

Right, the runtime/compiler/JIT will be happy about all sorts of incorrect p/invoke declaration. It doesn't make them correct. I guess the runtime could have blocked that, but the users would just go "a-ha! I'll just declare it as void*" so at that point it doesn't matter.

Is it possible to warn on call-sites, then? DoNotPassNonBlittableStructPointersToExternMethodsAnalyzer 🙃

xoofx commented 3 years ago

Right, the runtime/compiler/JIT will be happy about all sorts of incorrect p/invoke declaration. It doesn't make them correct. I guess the runtime could have blocked that, but the users would just go "a-ha! I'll just declare it as void*" so at that point it doesn't matter. If you pass it as ref instead of pointer, I expect the runtime to fix the layout issue for you.

We have many unmanaged struct containers in DOTS Unity that are starting to rely on where T : unmanaged and have plenty of T* inside them. it's not as simple as just fixing a ref on the P/invoke layer.

Ultrahead commented 3 years ago

Do many of those structs contain char?

jkotas commented 3 years ago

I think it is a good idea to work towards eliminating corner case differences between CLR blittable and C# unmanaged types. The difference between the two concepts is hard to explain, so the less explaining of corner cases we have to do the better. For example, https://github.com/dotnet/runtime/pull/1866 is going in this direction.

Changing layout rules is a breaking change, even if it is in the unspecified territory. It is potentially a breaking change for the R2R file format too.

I do not think we will be able to ever remove the managed sequential concept. It would be too breaking. It means that we have given up the opportunity to auto layout 99+% of the C# unmanaged types unless they are explicitly marked auto layout.

I think getting rid of the explicit struct inside sequential struct promoting to auto-layout may be plausible. We need some data to understand the breaking potential.

xoofx commented 3 years ago

I think getting rid of the explicit struct inside sequential struct promoting to auto-layout may be plausible. We need some data to understand the breaking potential.

If we can do that without requiring any user code changes, that would be fantastic. But I agree, if it is breaking any actual code relying on this undocumented layout then we will have to find another way to support that (e.g maybe a [MarshalAs(U1)] on the bool to tell the layout that it should not promote to auto in that case because we want the same layout for native than managed, even though that would still be super painful for our users)

Ultrahead commented 3 years ago

@xoofx Do many of those structs you mentioned contain char types?

zacknewman commented 3 years ago

The attribute does not control the managed layout.

image

The documentation also states, "The following complex types are also blittable types:

• One-dimensional arrays of blittable primitive types, such as an array of integers. However, a type that contains a variable array of blittable types is not itself blittable. • Formatted value types that contain only blittable types (and classes if they are marshaled as formatted types). For more information about formatted value types, see Default marshaling for value types." (emphasis added)

This means that decimal is also blittable since it is has a LayoutKind.Sequential layout and its 3 fields, _flags, _hi32, and _lo64 are all blittable. However if you replace the bool field in BoolAndExplicitStruct with a decimal, the layout is still not sequential. Clearly something is wrong with at least the documentation.

tannergooding commented 3 years ago

@zacknewman, the types in the "core library" (System.Private.Corelib, mscorlib, etc) are often special and cannot necessarily be understood by simply looking at the C# source code.

decimal is meant to map to the Win32 DECIMAL type (which is what the type is based on, rather than something like IEEE 754 decimal types which are xplat and standardized) and that has special alignment requirements and a different layout in native: https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/fieldmarshaler.cpp#L133-L137

Likewise:

zacknewman commented 3 years ago

@tannergooding, thank you for that info. I figured it had something to do with decimal being special. I find it unfortunate that while the documentation expressly mentions that "primitives" like System.Bool and System.Char are not blittable, it fails to mention System.Decimal too. This led me to being overly cautious when constructing my own "primitives" (e.g., U128) where I was worried relying on a LayoutKind.Sequential layout would not actually enforce its layout in managed memory (since it didn't for decimal) thus motivating me to use the attribute [StructLayout(LayoutKind.Explicit, CharSet = CharSet.Unicode, Pack = 8, Size = 16)] which in turn now causes LayoutKind.Sequential issues when used alongside non-blittable types just like ExplicitLayoutStruct does in this example.

I know the documentation I cited is not a substitute for a formal specification, so perhaps I expect too much from it. On a related note, do you know if that documentation is applicable for "constructed" types too? Or is there an unwritten assumption that it only applies to "plain" types? For example if I have the type constructor:

using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential)]
public readonly struct Maybe<TSome> where TSome: notnull {
    readonly byte _tag;
    readonly TSome _some;
}

is the memory layout in managed memory for Maybe<uint> 1 byte containing _tag followed by 3 bytes of padding followed by 4 bytes containing _some or is the runtime allowed to store it differently since the documentation doesn't apply to "constructed" types? I've experimented with several type constructors with variable number of type parameters and several constructed types from said constructors, and the memory layout was always what I expected/hoped it to be; but this falls embarrassingly short of a mathematical proof.