dotnet / csharplang

The official repo for the design of the C# programming language
11.53k stars 1.03k forks source link

Support sizeof(x) where x is a fixed buffer #2121

Open Korporal opened 5 years ago

Korporal commented 5 years ago

I'm astonished that sizeof(x) doesn't compile when x is a fixed buffer that's within scope. Since such buffers have compile time constant sizes it should be almost trivial to implement sizeofhere, can this be considered?

The only way I can see is to reflect upon the containing type, get the field with the name of fixed buffer field, get its FixedBufferattribute, get that attributes constructor args and eventually get at the declared length!

Such code (even if done statically once) is very cumbersome and fragile (should MS ever alter that FixedBufferattribute or whatever). I can use a static dictionary keyed on containing type and fixed buffer field name and cache the length there (the first time its requested) but even so a dictionary lookup for a simple integer constant that's fixed and known at compile time? really?

(At least nameofworks as expected! but sizeofand typeofgive rather unhelpful error messages)

Korporal commented 5 years ago

So this is what developers have to write if they want to try and maximize performance of code using fixed buffers (which are often used mainly for performance reasons!)

    public static class FixedBuffer
    {
        private static ConcurrentDictionary<string, int> buffer_lengths = new ConcurrentDictionary<string, int>();

        public static int GetCapacity<T>(string FixedBufferName) where T : struct
        {
            var key = $"{typeof(T).Name}.{FixedBufferName}";

            if (buffer_lengths.TryGetValue(key, out int Capacity))
                return Capacity;

            var fields = typeof(T).GetFields().Where(f => f.Name == FixedBufferName);

            if (!fields.Any())
                throw new ArgumentException($"No field with that name exists within the type: {typeof(T).Name}", nameof(FixedBufferName));

            var args = fields.First().CustomAttributes.First().ConstructorArguments;

            buffer_lengths.TryAdd(key, (int)args[1].Value);

            return buffer_lengths[key];
        }
    }

Can we therefore consider adding sizeofsupport for fields that are declared as fixed [] ?

Since fixed buffers have a very contrived type definition (I personally think fixed buffers should be implemented as a kind of stand alone primitive type) its not possible to pass a type name into sizeof so the declared buffer identifier makes sense, in this case.

juliusfriedman commented 5 years ago

In the same scope... that wouldn't require a lookup as you created the buffer in that scope? You know it's size...

Korporal commented 5 years ago

@juliusfriedman

In the same scope... that wouldn't require a lookup as you created the buffer in that scope? You know it's size...

True but we must repeat the value everywhere we use that size creating fragility should anyone alter the size in one place and not others. Once can create constants and so on to address this but since its size is fixed compile time constant it strikes me as easy to deliver this.

juliusfriedman commented 5 years ago

Really, it seems like the fragility comes in the code which consumed it...

Provide an example IRL where this occurs as I can't even contrive it...

int mySize = fragility;

One would simply refer to mySize.

Korporal commented 5 years ago

@juliusfriedman

Take a look at this code (just posted here):

public interface IFixedBuffer
{
    int FixedBufferLength { get; }
}

public static FixedBufferExtensions
{
    public static ToString<T>(this T buffer) where T : IFixedBuffer
   {
       var length = buffer.FixedBufferLength;
       ...
   }
}

using System;

public unsafe struct AString_32 : IFixedSizeBuffer
{
    public fixed char chars[32];

    public ReadOnlySpan<Char> AsSpan()
    {
        fixed (char*  c = chars)
        {
            return new ReadOnlySpan<Char>(c, 64);
        }
    }

    public int FixedBufferLength => 32;
}

The 32 appears twice (actually three times), the 64 appears once, changing one necessitates changing the others. Then imagine you have many versions of this kind of type and the fragility becomes more apparent.

This could be coded as:

public interface IFixedBuffer
{
    int FixedBufferLength { get; }
}

public static FixedBufferExtensions
{
    public static ToString<T>(this T buffer) where T : IFixedBuffer
   {
       var length = buffer.FixedBufferLength;
       ...
   }
}

using System;

public unsafe struct AString_32 : IFixedSizeBuffer
{
    public fixed char chars[32];

    public ReadOnlySpan<Char> AsSpan()
    {
        fixed (char*  c = chars)
        {
            return new ReadOnlySpan<Char>(c, sizeof(Char) * sizeof(chars));
        }
    }

    public int FixedBufferLength => sizeof(chars);
}

This may seem to be grumbling BUT the buffer size is a known compile time constant, so why not?

YairHalberstadt commented 5 years ago

@Korporal

public interface IFixedBuffer
{
    int FixedBufferLength { get; }
}

public static FixedBufferExtensions
{
    public static ToString<T>(this T buffer) where T : IFixedBuffer
   {
       var length = buffer.FixedBufferLength;
       ...
   }
}

public unsafe struct AString_32 : IFixedSizeBuffer
{
    public fixed char chars[FixedBufferLength];

    public ReadOnlySpan<Char> AsSpan()
    {
        fixed (char*  c = chars)
        {
            return new ReadOnlySpan<Char>(c, FixedBufferLength * 2);
        }
    }

    public int FixedBufferLength => 32;
}
juliusfriedman commented 5 years ago

I don't follow:

1) sizeof(AString_32) is known and can be calculated. 2) Why doesn't AString_32 have a const size so it only have to be changed in one place? e.g. the const which would be used to declare chars...

Korporal commented 5 years ago

@YairHalberstadt - That cannot compile - the fixed buffer bounds [ ] must be a compile time constant.

Korporal commented 5 years ago

I don't follow:

  1. sizeof(AString_32) is known and can be calculated.
  2. Why doesn't AString_32 have a const size so it only have to be changed in one place? e.g. the const which would be used to declare chars...

@juliusfriedman

The code is sample only (it won't compile for other reasons) but post what you mean?

YairHalberstadt commented 5 years ago

@Korporal

public unsafe struct AString_32 : IFixedSizeBuffer
{
    public fixed char chars[LENGTH];

    public ReadOnlySpan<Char> AsSpan()
    {
        fixed (char*  c = chars)
        {
            return new ReadOnlySpan<Char>(c, FixedBufferLength * 2);
        }
    }

    public int FixedBufferLength => LENGTH;
    const int LENGTH = 32;
}
Korporal commented 5 years ago

The point I'm making is not that there are no alternatives but that the length/capacity of a fixed buffer is a fixed compile time constant and in this case it makes sense to be able to treat it as we can other types that also have a fixed compile time size.

In other words why is a fixed buffer the only fixed sized type that cannot be used with sizeof?

Korporal commented 5 years ago

@Korporal

public unsafe struct AString_32 : IFixedSizeBuffer
{
    public fixed char chars[LENGTH];

    public ReadOnlySpan<Char> AsSpan()
    {
        fixed (char*  c = chars)
        {
            return new ReadOnlySpan<Char>(c, FixedBufferLength * 2);
        }
    }

    public int FixedBufferLength => LENGTH;
    const int LENGTH = 32;
}

@YairHalberstadt - Of course that is an option, but we don't have to do this for other fixed size types.

juliusfriedman commented 5 years ago

@Korporal this also works but I would likely use the pattern @YairHalberstadt showed since as you stated the value is known at compile time.

 public interface IFixedBuffer
    {
        int FixedBufferLength { get; }
    }

    public unsafe struct AString_32 : IFixedBuffer
    {
        public fixed char chars[32];

        public ReadOnlySpan<Char> AsSpan()
        {
            fixed (char* c = chars)
            {
                return new ReadOnlySpan<Char>(c, sizeof(Char) * sizeof(AString_32));
            }
        }

        public int FixedBufferLength => sizeof(AString_32);
    }

    static void Main(string[] args)
    {
        Console.WriteLine(System.Runtime.InteropServices.Marshal.SizeOf<AString_32>()); //64

        AString_32 string32;

        Console.WriteLine(string32.FixedBufferLength); //64

        Console.ReadKey();
    }
Korporal commented 5 years ago

@juliusfriedman

@Korporal this also works

 public interface IFixedBuffer
    {
        int FixedBufferLength { get; }
    }

    public unsafe struct AString_32 : IFixedBuffer
    {
        public fixed char chars[32];

        public ReadOnlySpan<Char> AsSpan()
        {
            fixed (char* c = chars)
            {
                return new ReadOnlySpan<Char>(c, sizeof(Char) * sizeof(AString_32));
            }
        }

        public int FixedBufferLength => sizeof(AString_32);
    }

    static void Main(string[] args)
    {
        Console.WriteLine(System.Runtime.InteropServices.Marshal.SizeOf<AString_32>()); //64

        AString_32 string32;

        Console.WriteLine(string32.FixedBufferLength); //64

        Console.ReadKey();
    }

Until one adds other fields to the struct...

juliusfriedman commented 5 years ago

@Korporal, if I added more fields my code would reflect the mathematical paradigm to allow the offset to be calculated e.g. offset-of or otherwise.

Korporal commented 5 years ago

@Korporal, if I added more fields my code would reflect the mathematical paradigm to allow the offset to be calculated e.g. offset-of or otherwise.

@juliusfriedman - I'm sure it would, which is fine but isn't enabling sizeofa neater idea? This is a special case in that we pass the name of the buffer (not a type) but it makes sense for a buffer (who's type under the hood is rather messy) because they are compile time constant size.

juliusfriedman commented 5 years ago

We have Unsafe.SizeOf for that although I will concede that we shouldn't have to jump through hoops to get the same information.

Korporal commented 5 years ago

We have Unsafe.SizeOf for that.

@juliusfriedman - What would you pass for T in the case of a fixed buffer?

Korporal commented 5 years ago

I expect sizeofis a pretty simple thing (under the hood) so I'd expect the compiler guys to be able to do this pretty easily assuming they can be convinced that its not controversial !

juliusfriedman commented 5 years ago

@Korporal, your losing me again... we have Span for that... your losing the info when you decay the pointer.

Korporal commented 5 years ago

Do the compiler/language teams have such a thing as a "low hanging fruit" category? Where criteria for consideration are stuff like:

  1. Is it controversial?
  2. Does it break anything?
  3. Does it limit future enhancements in any way?
  4. Is it relatively easy to implement?

Of course there are other considerations as well (ongoing plans etc) but some changes surely (like this one?) must fall into such a category - is there such a category?

Korporal commented 5 years ago

@Korporal, your losing me again... we have Span for that... your losing the info when you decay the pointer.

@juliusfriedman - Bear mind the impetus for this thread is a related one, which is asking more and seems to be unlikely to get support.

The (my) goal is to make it much easier to write code that delivers inline, fixed capacity, varying length, mutable "strings" in value types without overly relying on various cumbersome mechanisms.

This then lets us pass pointers to structs around (in a host of scenarios, including IPC) and be able to do access string-like members.

We do this now and its fine but we have to generate a large number of types (using T4) like AString_8, AString_16etc etc etc.

The language is certainly rather limiting currently when these kinds of needs are considered.

Korporal commented 5 years ago

@juliusfriedman - Small world, I used to live in Wilmington DE until a few years ago!

YairHalberstadt commented 5 years ago

@Korporal

If this were a purely compiler feature, you could implement it yourself.

However, this requires an addition to the language, so I'm afraid you have to wait for it to be championed and discussed by the LDM. This can be a slow process, and most ideas, even good ones, never end up being championed.

You can expidate the process by writing a specification for the feature, and creating a branch of Roslyn with a prototype of the feature. Even then, you'd be waiting till at least the next release of the language, which may or may not occur in 2019. Jon Skeet is pessimistic, but the LDM is relatively tight lipped on the matter.

Korporal commented 5 years ago

@YairHalberstadt - Thanks, I doubt I'd have the time or energy to open up roslyn and dive in, certainly at the serious level this warrants, but I may take a look.

YairHalberstadt commented 5 years ago

It's still likely not to be implemented, even if you do prototype it, so I would only do so if you are interested in doing it out of fun/learning experience.

Korporal commented 5 years ago

@YairHalberstadt - From what you say here I get the impression that there is little scope for the majority of ideas to get considered which is a shame but a reality, it does however make it somewhat unappealing to even suggest anything either here or the roslyn issues when there is such little prospect of stuff being considered.

Do people (not that I'd consider it) every run purely with their own compiler builds and simply merge updates in from Microsoft as they are released? does any company work like that?

juliusfriedman commented 5 years ago

@Korporal, Tis a small world after all ;)

But.. there is no way sizeof support alone can allow one to pass pointers to structures... right?

There is the unmanaged constraint which is proposed to do that as far as I know but I would suspect in such cases there is little difference from passing a fixed buffer or any other type there when you think about it as the call site will only pass the pointer and the invocation sees only the pointer with no knowledge of size, one would still need to pass the length.

Korporal commented 5 years ago

Hmm GetPinnableReference - seems interesting...

https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/fixed-statement

YairHalberstadt commented 5 years ago

From what you say here I get the impression that there is little scope for the majority of ideas to get considered which is a shame but a reality, it does however make it somewhat unappealing to even suggest anything either here or the roslyn issues when there is such little prospect of stuff being considered.

That is the frustrating reality, but overall I think C# is better for it. It stops C# containing "everything and the kitchen sink".

One thing to think about is that when writing a compiler, or adding a feature to a compiler, you have to consider all the possible ways the various language features can interact. That number is exponential in the number of language features, so even for an apparently 'trivial' language change, it can make improving the compiler hugely more difficult.

I think most people who hang around here, do so because they are interested in language design, and the evolution of C#. Not because they want to see their pet feature implemented. And every suggestion one makes should be taken as just that - a suggestion.

Do people (not that I'd consider it) every run purely with their own compiler builds and simply merge updates in from Microsoft as they are released? does any company work like that

I have no idea. Sounds fun, but a bad idea. Microsoft tends to experiment with flavors of C# in internal research projects, so I would imagine they probably do.

HaloFour commented 5 years ago

I can see there being support for treating a fixed buffer as a span via implicit conversion. That would give you an easier API for working with these buffers (including treating fixed char buffers like strings) as well as easy access to their length.

Korporal commented 5 years ago

@Korporal, Tis a small world after all ;)

But.. there is no way sizeof support alone can allow one to pass pointers to structures... right?

There is the unmanaged constraint which is proposed to do that as far as I know but I would suspect in such cases there is little difference from passing a fixed buffer or any other type there when you think about it as the call site will only pass the pointer and the invocation sees only the pointer with no knowledge of size, one would still need to pass the length.

@juliusfriedman Not sure I understand you Julius. The unmanagedconstraint lets us take a pointer to a structgenerically (prior to it this was not possible) this is a huge step forward (despite a bug I found recently).

juliusfriedman commented 5 years ago

@Korporal, not sure how to further explain it...

@HaloFour, I believe that is a much better approach!

I am also pretty sure there is a proposal for that as well.

CyrusNajmabadi commented 5 years ago

Do the compiler/language teams have such a thing as a "low hanging fruit" category.

Yes. It's called: a LDM member wants to champion this. Meaning, they've assessed all of that and made such a determination.

CyrusNajmabadi commented 5 years ago

I can see there being support for treating a fixed buffer as a span via implicit conversion. That would give you an easier API for working with these buffers (including treating fixed char buffers like strings) as well as easy access to their length.

Agreed. I would definitely support such a proposal if it were created.

tannergooding commented 5 years ago

Agreed. I would definitely support such a proposal if it were created.

Implicit or explicit conversion to Span should probably just be tacked on as part of the support for the already championed/general-purpose fixed-sized-buffer proposal (https://github.com/dotnet/csharplang/blob/master/proposals/fixed-sized-buffers.md).

HaloFour commented 5 years ago

@tannergooding

Implicit or explicit conversion to Span should probably just be tacked on as part of the support for the already championed/general-purpose fixed-sized-buffer proposal

Should a proposal be created at least to track the work to add it to that other proposal? Otherwise we're just all relying on someone remembering to bring it up at some LD meeting? 😁

Korporal commented 5 years ago

Out of curiosity where approx in the Roslyn source is the parsing and validation of sizeof(X) expression? I see too that sizeofis represented as an ILinstruction (wonder why it wasn't just replaced at compile time with a constant).

Just curious about what I would need to change if I were to (experimentally) implement this in a test build of the compiler.

Thx

CyrusNajmabadi commented 5 years ago

Hey @Korporal . Parsing is handled here:

https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Parser/LanguageParser.cs#L9980-L9988

Semantic analysis and binding of that expr happens in:

https://github.com/dotnet/roslyn/blob/3bc2bdea33e1da702a7203afe036d8575e931f89/src/Compilers/CSharp/Portable/Binder/Binder_Expressions.cs#L1139

Actual handling of it in terms of IL seems to happen here:

https://github.com/dotnet/roslyn/blob/3bc2bdea33e1da702a7203afe036d8575e931f89/src/Compilers/CSharp/Portable/CodeGen/EmitExpression.cs#L2964-L2969

I hope this helps!

CyrusNajmabadi commented 5 years ago

(wonder why it wasn't just replaced at compile time with a constant).

Because the compiler doesn't know the size the type is outside of some basic types. There is no relation between the IL you compile against and the IL you run against, so the type may have a totally different size at runtime.

Korporal commented 5 years ago

(wonder why it wasn't just replaced at compile time with a constant).

Because the compiler doesn't know the size the type is outside of some basic types. There is no relation between the IL you compile against and the IL you run against, so the type may have a totally different size at runtime.

@CyrusNajmabadi

OK I understand - to an extent - (platform agnostic). But surely a Byte is universal as is Int16 and so on? I personally don't know of any hardware that has a byte that isn't 8 bits or a 16 bit signed it that isn't 16 bits, am I missing something?

tannergooding commented 5 years ago

But surely a Byte is universal as is Int16 and so on

At least as far as the C# spec is concerned, the various primitives are constant and are handled that way: https://github.com/dotnet/csharplang/blob/master/spec/unsafe-code.md#the-sizeof-operator

However, all other structs are just handled via the sizeof IL instruction; which the JIT will treat as a runtime constant anyways. And which, outside of some specific scenarios which the C# language spec currently doesn't define, can only be a runtime constant due to platform-specific differences (generally layout and padding).

Korporal commented 5 years ago

@tannergooding - Thanks for clarifying!

CyrusNajmabadi commented 5 years ago

OK I understand - to an extent - (platform agnostic). But surely a Byte is universal as is Int16 and so on? I personally don't know of any hardware that has a byte that isn't 8 bits or a 16 bit signed it that isn't 16 bits, am I missing something?

As mentioned: Because the compiler doesn't know the size the type is outside of some basic types

'byte' would be one of those basic types.

CyrusNajmabadi commented 5 years ago

Weird... why did my post show up so much later...

GSPP commented 5 years ago

I just had a real-world case where this would have been useful:

        const int MAX_PATH = 260;

        [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
        unsafe struct WIN32_FIND_DATA
        {
            //...
            fixed char _cFileName[MAX_PATH];

            public string GetFileName()
            {
                fixed (char* ptr = _cFileName)
                {
                    var length = 0;
                    for (int i = 0; i < MAX_PATH; i++)
                    {
                        if (ptr[i] == '\0')
                        {
                            length = i;
                            break;
                        }
                    }

                    return new string(ptr, 0, length);
                }
            }
        }

Here, MAX_PATH appears twice. I'd rather have written:

for (int i = 0; i < sizeof(_cFileName) / sizeof(char); i++)

The intention of this for loop is not to iterate MAX_PATH times. The intention is to iterate over the whole buffer regardless of its size. If I could have used sizeof, the intent would have been more clear.

Also, any changes to the size of the buffer would no longer require changes in two places.