dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.15k stars 4.71k forks source link

Memory and ReadOnlyMemory validation errors not matching #23670

Closed Drawaes closed 4 years ago

Drawaes commented 7 years ago

Updated as per the suggestion of a create method Updated with design from @stephentoub which allows all cases to be covered Updated with design change from @benaadams to allow type inference for T Put in namespace and removed the Create to leave only the Dangerous Create Added question around moving current Dangerous Create Method

Rationale

A major use case of [ReadOnly]Span/Memory is to replace handing around array buffers and their offsets and count.

One of the major benifits of the design as I see it as it moves bounds checks out to where the buffer is created which is excellent. However when upgrading legacy code there seems to be a blocker in that stream defines the triplet to be

public int SomeMethod(byte[] buffer, int offset, int count);

This is normally then teamed up with checks on

  1. "is buffer null" == null argument exception
  2. "is offset negative or past the end of the buffer?" == argument out of range exception with offset as field referenced
  3. "is count > than buffer.length, or < 0 or count +offset > buffer.length" == argument out of range exception

The issue with the way it currently is, that for anything that takes that triplet you have to manually do validation on the inputs before creating a Memory or risk having exceptions with names that don't match.

This causes double validation to take place, once in the legacy code and once in the Memory creation. As Memory is often used on the "hot paths" of code it is a penalty for using memory.

Proposal

Add "unsafe" methods to create memory and readonly memory that avoid the extra checks. Then the checks can be maintained in the code with the same error messages and the double check penalty doesn't have to be paid.

namespace System.Runtime.InteropServices
{
    public static class Span
    {
        public static Span<T> DangerousCreate(T[] array, int start, int length);
    ...
    }
    // ... same for ReadOnlySpan<T>

    public static class Memory
    {
        public static Memory<T> DangerousCreate(T[] array, int start, int length);
    }

    // ... same for ReadOnlyMemory<T>
}

Usage

byte[] buffer;

var span = Span.DangerousCreate(buffer, offset, count);
// vs
var span = Span<byte>.DangerousCreate(buffer, offset, count);

Outstanding Questions

Should the existing method

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[EditorBrowsable(EditorBrowsableState.Never)]
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)

Be moved to the new types and should the IDE hiding be removed?

References

dotnet/corefx#24295 The Code this came up in (It means an extra validation path without it)

Below no longer matters as the legacy code can maintain it's own named validation.



|Class|ArrayName|Start|Length|
|---|---|---|---|
|FileStream|buffer|offset|count|
|NetworkStream|buffer|offset|count|
|BufferedStream|buffer|offset|count|
|MemoryStream|buffer|offset|count|
|StreamReader|buffer|index|count|
|StreamWriter|buffer|index|count|
|Stream|buffer|offset|count|
|SslStream|buffer|offset|count|
benaadams commented 7 years ago

Currently its

 ReadOnlyMemory(T[] array, int start, int length)

Change to

 ReadOnlyMemory(T[] buffer, int offset, int count)

Seems good

cc: @ahsonkhan, @KrzysztofCwalina

ahsonkhan commented 7 years ago

Would the same apply to Span/ReadOnlySpan? https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Span.cs#L71-L78

This should go through API review to make sure we want this change and apply it in other places too (to remain consistent).

benaadams commented 7 years ago

Would the same apply to Span/ReadOnlySpan?

/cc @stephentoub

Drawaes commented 7 years ago

Yeah it looks from a quick tour around that it is the same for the non async methods that take span.

@ahsonkhan If you want I can update the top comment to be in proper API review form?

ahsonkhan commented 7 years ago

If you want I can update the top comment to be in proper API review form?

Yes please. Tyvm. We have other APIs in Span that use start and length too, like Slice:

public Span<T> Slice(int start, int length)
Drawaes commented 7 years ago

Updated

KrzysztofCwalina commented 7 years ago

We cannot rename Span.Length to Span.Count. Span is like an array and arrays have Length property.

I also don't want the ctor parameter to be called "count" but the property it initialized being called "Length"

To solve the problem outlined in this issue, could we add a helper method that would create Span out of T[], offset, count while validating the three parameters and throwing the right exception?

Drawaes commented 7 years ago

That seems like a reasonable idea to me.

stephentoub commented 7 years ago

Is the suggestion to have two different APIs for constructing a {ReadOnly}Span/Memory<T> from a T[]/int/int triplet, just with different parameter names and thus different exceptions that get thrown?

benaadams commented 7 years ago

To solve the problem outlined in this issue, could we add a helper method that would create Span out of T[], offset, count while validating the three parameters and throwing the right exception?

Precedent is the (Value)Tuple Create methods, so you don't always have to specify type

var t = ValueTuple.Create(array);

rather than

var t = new ValueTuple<int[]>(array);

So

public static class Span
{
    public static Span<T> Create<T>(T[] buffer, int offset, int count) 
    {
        // Validate
        return Span<T>.DangerousCreate(buffer, offset, count);
    }
}

For

var span = Span.Create(array, offset, count);

Rather than

var span = new Span<int>(array, offset, count);

Obv Tuple is more annoying as it has many type params rather than just one

Drawaes commented 7 years ago

Removed my comment as it was out of order due to my phone ;) Added @benaadams / @KrzysztofCwalina 's idea as the main API request above.

benaadams commented 7 years ago

Would need to be in a new non-generic static class; else you'd have to specify the type anyway and it would loose some advantages; so would be a bit of a weird addition other than the change in names of params

e.g.

Span<int>.Create(array, offset, count);

rather than

Span.Create(array, offset, count);
Drawaes commented 7 years ago

Would it work to add an unsafe unchecked internal method/constructor in coreclr, then add the same to the typeforwarded types in corefx. Finally then a helper method could be added to CoreFX.

This would only need to be internal as most streams are in corefx very few external custom ones I imagine. That way there are no external API additions or changes?

benaadams commented 7 years ago

DangerousCreate is there for Span anyway, though its checking length (but not other params) https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Span.cs#L108-L113

Drawaes commented 7 years ago

So would just need something like that for memory. Would pay for a double length check but it would be less than a full check on all the params.

stephentoub commented 7 years ago

could we add a helper method that would create Span out of T[], offset, count while validating the three parameters and throwing the right exception?

How about just adding a Dangerous/Unsafe factory for {ReadOnly}Memory<T> that doesn't do any argument validation at all, leaving it up to the call site? Then I can do my own argument validation if needed, and places where I know the arguments are valid I can skip the checks entirely. There are a bunch of places I'm seeing where, as I'm switching over from using array-based to memory-based calls, I'm having to construct {ReadOnly}Memory<T> instances from T[]/offset/count triplets that I know are all valid.

Drawaes commented 7 years ago

Seems like that is going to be the most flexible option.

Drawaes commented 7 years ago

@stephentoub do you have a shape in mind for this factory (don't call it factory please :P ) I will happily update the top comment with the idea.

stephentoub commented 7 years ago

do you have a shape in mind for this factory

public ref struct Span<T>
{
    public static Span<T> DangerousCreate(T[] array, int start, int length);
    ...
}
// ... same for ReadOnlySpan<T>

public struct Memory<T>
{
    public static Memory<T> DangerousCreate(T[] array, int start, int length);
    ...
}
// ... same for ReadOnlyMemory<T>
Drawaes commented 7 years ago

Updated top issue.

benaadams commented 7 years ago

Push it out to a static class and can get type inference and throw in the Create with renamed parameter validation?

public static class Span
{
    public static Span<T> Create(T[] buffer, int offset, int count)
    {
         // validate params
        return DangerousCreate(buffer, offset, count);
    }

    public static Span<T> DangerousCreate(T[] buffer, int offset, int count);
    ...
}
// ... same for ReadOnlySpan<T>

public static class Memory
{
    public static Memory<T> Create(T[] buffer, int offset, int count)
    {
         // validate params
        return DangerousCreate(buffer, offset, count);
    }

    public static Memory<T> DangerousCreate(T[] buffer, int offset, int count);
    ...
}
// ... same for ReadOnlyMemory<T>
byte[] array;

var span = Span.DangerousCreate(array, start, length);
// vs
var span = Span<byte>.DangerousCreate(array, start, length);

// drop in validation Exception matching
var memory = Memory.Create(array, start, length);

Also then the Throws can be in non-generic class

Drawaes commented 7 years ago

I prefer that, but @stephentoub your the one that gets to be in the API reviews, whats your thinking? (Apart from the fact we could never use the var in corefx ;) )

stephentoub commented 7 years ago

Seems reasonable. @KrzysztofCwalina, @ahsonkhan, was there strong reasoning for the ctor-based span/memory creation vs factory-based span/memory creation?

Drawaes commented 7 years ago

@benaadams @stephentoub updated top issue to match Bens design

KrzysztofCwalina commented 7 years ago

I am not sure it's worth adding two types to System root namespace just so we can have the factory methods. Also, I don't think we should have Dangerous APIs on such types. We decided to move all dangerous APIs to Marshal-like type in System.Runtime.InteropServices.

benaadams commented 7 years ago

Move the entire classes into System.Runtime.InteropServices the argument rename is for interop also :)

namespace System.Runtime.InteropServices
{
    public static class Span
    {
        public static Span<T> Create(T[] buffer, int offset, int count);
        public static Span<T> DangerousCreate(T[] buffer, int offset, int count);
Drawaes commented 7 years ago

I am easy on location, and on naming, the Type Inference is more "friendly" if you want it moved @KrzysztofCwalina what would you call the class if it went to CompilierServices?

public static class Unsafe
{
    public static Span<T> DangerousCreateSpan<T>(T[] buffer, int offset, int count);
    public static ReadonlySpan<T> DangeriousCreateReadonlySpan(T[] buffer, int offset, int count);
    public static Memory<T> DangerousCreateMemory<T>(T[] buffer, int offset, int count);
    public static ReadOnlyMemory<T> DangerousCreateReadonlyMemory<T>(T[] buffer, int offset, int count);
}

Means no new type, and they all show up in the IDE side by side. You also have made it super clear what the story is here?

ahsonkhan commented 6 years ago

Move the entire classes into System.Runtime.InteropServices the argument rename is for interop also :)

@Drawaes, can you please update the original post with the namespace specified?

the Type Inference is more "friendly"

Don't we still get the type inference regardless of whether it lives in System or System.Runtime.InteropServices?

Do we need the Create methods? For the original concern about error matching, isn't DangerousCreate sufficient?

Drawaes commented 6 years ago

Does that look better?

  1. We have the namespace moved to System.Runtime.InteropServices
  2. We get type Inference
  3. Dropped the Create methods, not sure why there were still there as you are correct the DangerousCreate seems to cover all of issues I had.

:shipit:

ahsonkhan commented 6 years ago

Do the argument names in DangerousCreate have to be different than what we use in the Span constructor or can we continue to use index/length?

Drawaes commented 6 years ago

you are correct Sir they should match the Span names, as it doesn't matter what they are called for my purpose now, and .. consistency

I fixed it ( I left the example code as Array, offset to make the point about what they are likely to be ;) )

ahsonkhan commented 6 years ago

Sorry for the mistake. I meant start/length.

https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L46

public Span(T[] array, int start, int length)

Edit: Also, should we also add an overload that only takes an array (which skips the null and co-variance checks)?

Drawaes commented 6 years ago

No problem, I was being lazy and didn't check as well 🤣 (fixed)

benaadams commented 6 years ago

Question would be, should the existing Span overload

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[EditorBrowsable(EditorBrowsableState.Never)]
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)

Also move to System.Runtime.InteropServices? T is specified by ref T objectData

Drawaes commented 6 years ago

Seems reasonable, the only query would be around shipping status and back compat.

benaadams commented 6 years ago

Seems reasonable, the only query would be around shipping status and back compat.

Question for the api reviewers :)

Drawaes commented 6 years ago

Above our pay grades... :lol:

terrajobst commented 6 years ago

Video

We went back and forth, but it seems we're in agreement that these can be useful. We shouldn't expose static types Span and Memory in InteropServices. Instead, we should add them to the new MemoryMarshal type.

Drawaes commented 6 years ago

Started a PR in coreclr https://github.com/dotnet/coreclr/pull/14833

jkotas commented 6 years ago

I do not think these APIs are good addition.

Drawaes commented 6 years ago

The reason mainly is

Most of the stream types run the validation checks already. I couldn't remove these checks when updating to take memory from an array (or span for that matter) because it would change the exception message for the field that fails, making it a breaking change.

That means there are plenty of times that instead of the 5 checks (null, index < length & index > 0 using overflow for 1 check, index + count < length, length > 0) you end up with 10 checks everytime you call.

This might seem like a "slow" operation but often these async methods can return sync and with ValueTask there is no allocations. So say SslStream reads 16k frames, and then the client "sips" 2k from it, then the next call for 2k will be a sync very quick return and the 5 extra checks start to show.

ahsonkhan commented 6 years ago

I do not think these APIs are good addition.

The reason mainly is

Would having these methods as internal only solve both concerns?

jkotas commented 6 years ago

the 5 extra checks start to show

Memory(T[], int, int) has these precondinditions:

            if (array == null)
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
            // This one should be completely optimized out for `Memory<byte>`.
            if (default(T) == null && array.GetType() != typeof(T[]))
                ThrowHelper.ThrowArrayTypeMismatchException();
            if ((uint)start > (uint)array.Length || (uint)length > (uint)(array.Length - start))
                ThrowHelper.ThrowArgumentOutOfRangeException();

So I see 3 extra checks - they will be like 7 well predicted instructions (and JIT may optimize some of it out). They should cost nothing compared to other overhead associated with Memory<T>.

Could you please share an example of realworld code that shows the problem?

omariom commented 6 years ago

This is how it looks on .NET Framework x86

jkotas commented 6 years ago

I meant to e.g. have some measurement that shows how much faster a real world async Stream Read method will get if it has this special constructor. (Also, this is not what the code will look like because of the actual code is written using non-inlineable ThrowHelper.)

For better or worse, the framework APIs are designed with explicit argument checks. Yes, these argument checks add extra code everywhere. But the execution cost of these extra argument checks is relatively small. We have done experiments in the past with "fast and crash" modes where we removed things like bounds checking everywhere - the performance benefits of such modes were surprisingly small because of these checks tend to execute pretty fast. I am wondering whether we know what we are really getting out of these duplicated APIs.

I guess I should wait for the link to the API discussion to be posted to see what arguments were used to justify these APIs.

ahsonkhan commented 6 years ago

I guess I should wait for the link to the API discussion to be posted to see what arguments were used to justify these APIs.

https://youtu.be/bHwCPVNQLwo?t=1h16m53s

omariom commented 6 years ago

@jkotas I meant to illustrate that the comparison, as you said, is a few instructions with a couple of well predicted branches.

    mov esi, [ebp+0xc]
    test edx, edx
    jz L001e
    mov eax, [edx+0x4]
    cmp eax, esi
    jb L0049
    sub eax, esi
    cmp eax, [ebp+0x8]
    jb L0049
jkotas commented 6 years ago

I have listened to the discussion.

My take is that if we are worried about redunant checks, we should tech the JIT to eliminate them. The JIT does elimate some of the duplicate checks already, e.g. null checks. For example, the code for this method:

static int FetchElementUsingSpan(int[] x)
{
    if (x == null)
        throw new ArgumentNullException();
    ReadOnlySpan<int> s = new ReadOnlySpan<int>(x);
    return s[0];
}

Is this:

test!My.FetchElementUsingSpan(Int32[]):
00007ff9`ea1108d0 56              push    rsi
00007ff9`ea1108d1 4883ec20        sub     rsp,20h
00007ff9`ea1108d5 4885c9          test    rcx,rcx
00007ff9`ea1108d8 7414            je      test!My.FetchElementUsingSpan(Int32[])+0x1e (00007ff9`ea1108ee)
00007ff9`ea1108da 488d4110        lea     rax,[rcx+10h]
00007ff9`ea1108de 8b4908          mov     ecx,dword ptr [rcx+8]
00007ff9`ea1108e1 83f900          cmp     ecx,0
00007ff9`ea1108e4 762b            jbe     test!My.FetchElementUsingSpan(Int32[])+0x41 (00007ff9`ea110911)
00007ff9`ea1108e6 8b00            mov     eax,dword ptr [rax]
00007ff9`ea1108e8 4883c420        add     rsp,20h
00007ff9`ea1108ec 5e              pop     rsi
00007ff9`ea1108ed c3              ret
...

Notice that there are two null checks: one in my method and other one in the ReadOnlySpan implementation itself. However, the JIT figured out that the second check is redundant and optimized it out. The JIT is not as good in eliminating the redundant checks for bounds checks today. It is something to fix in the JIT and not optimize API design around.

I do not think we should be adding duplicate APIs that differ just in having vs. not having argument validation. Having two ways to do a thing and making people to think about it is negative value for the framework. As always, I can be convinced that this make sense by data.

Drawaes commented 6 years ago

If the nulls are already removed and there is scope to look at either removal via the JIT then I would be good with removing the API review. It's been through the review process though so others would need to make the final choice.

How about this, I will do some measurements in an end to end scenario, if the measurements are in the margin of error we could remove It? If they aren't then there is something more to discuss.

jkotas commented 6 years ago

@Drawaes Sounds good. Thanks!