dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Proposal/Question: IMemoryOwner decorator & lifetime management #27869

Open grant-d opened 5 years ago

grant-d commented 5 years ago

[Edited - reworded for brevity] This is a question/proposal about IMemoryOwner chain-of-ownership. The pattern may be useful in current API patterns such as those in System.Text.Json.

TLDR;

When renting from MemoryPool, the returned IMemoryOwner may be passed on through various owners. An intermediate owner may populate the buffer, and the final owner may then wish to inspect and work with the populated length (say 85), not the pool-allocated length (say 128).

The proposed extension method would permit the following pattern, in contrast to the caller allocating an owned buffer, retrieving its actual length, and then slicing it client-side. However, I'm not sure whether it is a law-abiding citizen in the land of memory ownership.

using IMemoryOwner<byte> buffer = Serialize(foo);
var actualLength = buffer.Memory.Length; // Buffer- and filled-length are one and the same

Background

Span<T> has made common the idiom of the caller allocating a buffer and then passing it to a method to fill, along with a bool or exception result to indicate success. For example, in a recent JSON API review:

We shouldn't return Span; we should let the caller pass in the buffer to fill. This likely also requires out parameters to indicate how many bytes were written.

var buffer = new Span<byte>(…);
if (TrySerialize(foo, buffer, out int length)) { … }

However, sometimes it's simpler or preferred for the callee to allocate the buffer itself and return an IMemoryOwner. In that case the buffer may be oversized, so the signature typically needs to hand back an actualLength too. Ostensibly:

using IMemoryOwner<byte> buffer = Serialize(foo, out int actualLength);
var x = buffer.Memory.Length; // Callsite typically doesn't care about this value
var y = actualLength; // Gold

This pattern is somewhat leaky; the callsite is aware of the two lengths but should typically use only one.
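To make the leak concrete, here is a minimal sketch of what such a callee could look like. The `TwoLengthsSketch` class and its string-based `Serialize` are hypothetical stand-ins (the thread never shows the real `Serialize`), and UTF-8 encoding is assumed purely for illustration.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class TwoLengthsSketch
{
    // Hypothetical callee: rents a buffer, fills it, and reports the filled length separately.
    public static IMemoryOwner<byte> Serialize(string text, out int actualLength)
    {
        // Worst-case size; the pool may hand back an even larger buffer.
        int max = Encoding.UTF8.GetMaxByteCount(text.Length);
        IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(max);

        // Only the first actualLength bytes of owner.Memory are meaningful.
        actualLength = Encoding.UTF8.GetBytes(text.AsSpan(), owner.Memory.Span);
        return owner;
    }
}
```

The caller now has to keep `actualLength` next to the owner for as long as the buffer lives, because `owner.Memory.Length` only reflects what the pool chose to allocate.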

Proposal

The following api would seem to simplify the latter use-case, but I'm not sure if it respects the chain-of-memory-ownership correctly. Here's the same callsite using the new api.

using IMemoryOwner<byte> buffer = Serialize(foo);
var actualLength = buffer.Memory.Length; // Buffer- and filled-length are one and the same

The Serialize method would have internally allocated an owned buffer itself, filled it and then sliced it using the extension below, ultimately handing back an exact-sized IMemoryOwner to the caller:

    public static class IMemoryOwnerExtensions
    {
        public static IMemoryOwner<T> WrapSlice<T>(this IMemoryOwner<T> owner, int start, int length)
            => new SliceOwner<T>(owner, start, length); // Guard clauses elided

        public static IMemoryOwner<T> WrapSlice<T>(this IMemoryOwner<T> owner, int start)
            => new SliceOwner<T>(owner, start);

        // This is the meat of it
        private struct SliceOwner<T> : IMemoryOwner<T> // or class, whatever
        {
            private IMemoryOwner<T> _owner;
            public Memory<T> Memory { get; private set; }

            public SliceOwner(IMemoryOwner<T> owner, int start, int length)
            {
                _owner = owner;
                Memory = _owner.Memory.Slice(start, length);
            }

            public SliceOwner(IMemoryOwner<T> owner, int start)
            {
                _owner = owner;
                Memory = _owner.Memory.Slice(start);
            }

            public void Dispose()
            {
                if (_owner != null)
                {
                    _owner.Dispose();
                    _owner = null;
                }

                Memory = default;
            }
        }
    }

Notes

While such an API might be useful (assuming it has the correct semantics), it may be abused by unwary users. For example, they may try to take two disparate slices of the same original owner at once (e.g. Slice(owner1, 0, 10) and Slice(owner1, 10, 20)), then be surprised when one of them throws an object-disposed exception.
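The hazard can be sketched as follows. `CountingOwner` and `SliceWrapper` are hypothetical types written only for this illustration: the first stands in for a pooled buffer and counts `Dispose` calls, the second is a cut-down version of the slice wrapper. Whether a real owner throws or silently keeps exposing returned memory depends on its implementation, so treat this as one possible outcome.

```csharp
using System;
using System.Buffers;

// Hypothetical owner that counts Dispose calls, standing in for a pooled buffer.
public sealed class CountingOwner : IMemoryOwner<byte>
{
    private byte[] _array;
    public int DisposeCount { get; private set; }
    public bool IsDisposed => _array == null;

    public CountingOwner(int size) => _array = new byte[size];

    public Memory<byte> Memory =>
        _array ?? throw new ObjectDisposedException(nameof(CountingOwner));

    public void Dispose()
    {
        DisposeCount++;
        _array = null;
    }
}

// Cut-down slice wrapper: disposing it disposes the wrapped owner.
public sealed class SliceWrapper : IMemoryOwner<byte>
{
    private IMemoryOwner<byte> _owner;
    public Memory<byte> Memory { get; private set; }

    public SliceWrapper(IMemoryOwner<byte> owner, int start, int length)
    {
        _owner = owner;
        Memory = owner.Memory.Slice(start, length);
    }

    public void Dispose()
    {
        _owner?.Dispose();
        _owner = null;
        Memory = default;
    }
}

public static class DoubleSliceHazard
{
    public static (int DisposeCount, bool RootGoneAfterFirst, int SecondLength) Run()
    {
        var root = new CountingOwner(20);
        var first = new SliceWrapper(root, 0, 10);
        var second = new SliceWrapper(root, 10, 10);

        first.Dispose();                    // releases the shared root buffer
        bool rootGone = root.IsDisposed;

        // 'second' still exposes its cached slice even though the root was released.
        int secondLength = second.Memory.Length;

        second.Dispose();                   // disposes the root a second time
        return (root.DisposeCount, rootGone, secondLength);
    }
}
```

In this sketch the second wrapper does not throw; it keeps handing out a slice of a buffer that has already been released, which is arguably worse than an exception.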

cc @GrabYourPitchforks

grant-d commented 5 years ago

Another alternative, which I implemented, is a custom MemoryPool that internally tracks rented instances in a private readonly List<IMemoryOwner<T>>. When the pool is disposed, it disposes the members of the list too, much like an arena allocator. The Rent method then returns the sliced Memory directly, not the IMemoryOwner. But this seems like a lot of ceremony for a simple problem.
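A rough sketch of that arena idea, under assumptions: the name `ArenaPool<T>` is made up here, it wraps an existing `MemoryPool<T>` rather than deriving from it, and it is not thread-safe. The actual implementation was not posted in the thread.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical arena: hands out exact-length Memory<T> and releases every rental on Dispose.
public sealed class ArenaPool<T> : IDisposable
{
    private readonly MemoryPool<T> _pool;
    private readonly List<IMemoryOwner<T>> _rented = new List<IMemoryOwner<T>>();
    private bool _disposed;

    public ArenaPool(MemoryPool<T> pool = null) => _pool = pool ?? MemoryPool<T>.Shared;

    // Number of rentals the arena is currently tracking.
    public int RentedCount => _rented.Count;

    public Memory<T> Rent(int exactLength)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(ArenaPool<T>));
        if (exactLength < 0) throw new ArgumentOutOfRangeException(nameof(exactLength));

        // The arena keeps the owner; the caller only ever sees the exact-length slice.
        IMemoryOwner<T> owner = _pool.Rent(exactLength);
        _rented.Add(owner);
        return owner.Memory.Slice(0, exactLength);
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        foreach (IMemoryOwner<T> owner in _rented) owner.Dispose();
        _rented.Clear();
    }
}
```

The trade-off is that individual buffers cannot be returned early: everything lives until the arena itself is disposed, and any `Memory<T>` handed out must not be used after that point.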

grant-d commented 5 years ago

Updated the question to be more concise

schungx commented 5 years ago

I second this proposal!

Currently, when using MemoryPool<T>, it is frustrating to always have to return a tuple containing an IMemoryOwner<T> plus the length of the data initially requested (the IMemoryOwner<T>.Memory returned may be larger than the requested length and so contain garbage at the end).

Moreover IMemoryOwner<T> cannot be sliced, thus always requiring the actual length of the data inside the buffer to be carried around externally.

Actually, IMemoryOwner<T> could be made to carry an additional property, Used, which is initially the length requested (instead of the size of the buffer in Memory).

IMemoryOwner<T>.Used can then be updated as the buffer is filled up. Then it is concise and easy to pass around.

This new IMemoryOwner<T> can then be used anywhere that takes a Memory<T>: simply do IMemoryOwner<T>.Memory.Slice(0, IMemoryOwner<T>.Used). A new property that returns this sliced Memory<T> would be even better.
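As a sketch of how that suggestion might look in user code today, the wrapper below adds `Used` and a sliced view on top of an ordinary rental. `UsedMemoryOwner<T>` and `UsedMemory` are hypothetical names; nothing like them exists on `IMemoryOwner<T>` itself.

```csharp
using System;
using System.Buffers;

// Hypothetical wrapper illustrating the suggested 'Used' member; not an existing .NET API.
public sealed class UsedMemoryOwner<T> : IMemoryOwner<T>
{
    private IMemoryOwner<T> _owner;
    private int _used;

    public UsedMemoryOwner(MemoryPool<T> pool, int requestedLength)
    {
        if (pool == null) throw new ArgumentNullException(nameof(pool));
        if (requestedLength < 0) throw new ArgumentOutOfRangeException(nameof(requestedLength));

        _owner = pool.Rent(requestedLength);
        _used = requestedLength; // starts at the requested length, not the pool's size
    }

    private IMemoryOwner<T> Owner =>
        _owner ?? throw new ObjectDisposedException(nameof(UsedMemoryOwner<T>));

    // Full pool-allocated buffer, possibly larger than requested.
    public Memory<T> Memory => Owner.Memory;

    // Number of meaningful elements; may be adjusted once the buffer is filled.
    public int Used
    {
        get => _used;
        set
        {
            if ((uint)value > (uint)Owner.Memory.Length)
                throw new ArgumentOutOfRangeException(nameof(value));
            _used = value;
        }
    }

    // The sliced view suggested above: Memory.Slice(0, Used).
    public Memory<T> UsedMemory => Owner.Memory.Slice(0, _used);

    public void Dispose()
    {
        _owner?.Dispose();
        _owner = null;
    }
}
```

A caller would rent with the length it asked for, lower `Used` after filling (for example from 100 to 85), and pass `UsedMemory` to anything that takes a `Memory<T>`.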

grant-d commented 5 years ago

@schungx here's the helper I am using, it introduces a Slice method as well as a RentExact variation of Rent.

Per my thread above though, I am not sure it respects the chain of ownership properly. I believe it does, but we'd need confirmation from the experts.

    using System;
    using System.Buffers;

    public static class IMemoryOwnerExtensions
    {
        /// <summary>
        /// Rent a buffer from a pool with an exact length.
        /// </summary>
        /// <param name="pool">The <see cref="MemoryPool{T}"/> instance.</param>
        /// <param name="exactBufferSize">The exact size of the buffer.</param>
        public static IMemoryOwner<T> RentExact<T>(this MemoryPool<T> pool, int exactBufferSize)
        {
            if (pool == null) throw new ArgumentNullException(nameof(pool));

            IMemoryOwner<T> rented = pool.Rent(exactBufferSize);

            if (exactBufferSize == rented.Memory.Length)
                return rented;

            return new SliceOwner<T>(rented, 0, exactBufferSize);
        }

        /// <summary>
        /// Wrap an existing <see cref="IMemoryOwner{T}"/> instance in a lightweight manner, but allow
        /// the <see cref="IMemoryOwner{T}.Memory"/> member to have a different length.
        /// </summary>
        /// <param name="owner">The original instance.</param>
        /// <param name="start">The starting offset of the slice.</param>
        /// <param name="length">The length of the slice.</param>
        public static IMemoryOwner<T> Slice<T>(this IMemoryOwner<T> owner, int start, int length)
        {
            if (owner == null) throw new ArgumentNullException(nameof(owner));

            if (start == 0 && length == owner.Memory.Length)
                return owner;

            if ((uint)start >= (uint)owner.Memory.Length) throw new ArgumentOutOfRangeException(nameof(start));
            if ((uint)length > (uint)(owner.Memory.Length - start)) throw new ArgumentOutOfRangeException(nameof(length));

            return new SliceOwner<T>(owner, start, length);
        }

        /// <summary>
        /// Wrap an existing <see cref="IMemoryOwner{T}"/> instance in a lightweight manner, but allow
        /// the <see cref="IMemoryOwner{T}.Memory"/> member to have a different length.
        /// </summary>
        /// <param name="owner">The original instance.</param>
        /// <param name="start">The starting offset of the slice.</param>
        public static IMemoryOwner<T> Slice<T>(this IMemoryOwner<T> owner, int start)
        {
            if (owner == null) throw new ArgumentNullException(nameof(owner));

            if (start == 0)
                return owner;

            if ((uint)start >= (uint)owner.Memory.Length) throw new ArgumentOutOfRangeException(nameof(start));

            return new SliceOwner<T>(owner, start);
        }

        private sealed class SliceOwner<T> : IMemoryOwner<T>
        {
            private IMemoryOwner<T> _owner;
            public Memory<T> Memory { get; private set; }

            public SliceOwner(IMemoryOwner<T> owner, int start, int length)
            {
                _owner = owner;
                Memory = _owner.Memory.Slice(start, length);
            }

            public SliceOwner(IMemoryOwner<T> owner, int start)
            {
                _owner = owner;
                Memory = _owner.Memory.Slice(start);
            }

            public void Dispose()
            {
                if (_owner != null)
                {
                    _owner.Dispose();
                    _owner = null;
                }

                Memory = default;
            }
        }
    }

schungx commented 5 years ago

Well, I ended up not using IMemoryOwner<T> at all but rolled my own fake implementations of Memory<T> based on byte[] returned from ArrayPool<byte>.Shared.

The reason I did that is that I need access to the underlying byte[] buffer for the whole mass of .NET Framework APIs that have not yet been converted to spans.

schungx commented 5 years ago

using System.Buffers;

namespace System
{
    public struct PooledMemory<T> : IDisposable
    {
        private T[] m_Data;
        private int m_Length;

        public int Length => m_Length;

        public PooledMemory (int length)
        {
            m_Data = (length <= 0) ? null : ArrayPool<T>.Shared.Rent(length);
            m_Length = 0;
            SetLength(length);
        }

        internal PooledMemory (T[] data, int length)
        {
            m_Data = data ?? throw new ArgumentNullException(nameof(data));
            m_Length = 0;
            SetLength(length);
        }

        public void SetLength (int length)
        {
            if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

            if (m_Data == null) {
                if (length > 0) throw new ArgumentOutOfRangeException(nameof(length), "Buffer not long enough.");
            } else {
                if (length > m_Data.Length) throw new ArgumentOutOfRangeException(nameof(length), "Buffer not long enough.");
            }

            m_Length = length;
        }

        public T[] GetBuffer () => m_Data ?? throw new ArgumentNullException(nameof(m_Data));

        public Span<T> Span => m_Data.AsSpan(0, Length);
        public bool IsEmpty => m_Data == null || Length <= 0;

        public void CopyTo (Span<T> buf) => m_Data.AsSpan(0, Length).CopyTo(buf);
        public void CopyTo (Memory<T> buf) => m_Data.AsMemory(0, Length).CopyTo(buf);

        public void Dispose ()
        {
            if (m_Data != null && m_Data.Length > 0) ArrayPool<T>.Shared.Return(m_Data);
            m_Data = null;
        }

        public static readonly PooledMemory<T> Empty = new PooledMemory<T>();

        public static implicit operator Memory<T>(PooledMemory<T> d) => d.m_Data.AsMemory(0, d.Length);
        public static implicit operator ReadOnlyMemory<T>(PooledMemory<T> d) => d.m_Data.AsMemory(0, d.Length);
        public static implicit operator ReadOnlyPooledMemory<T>(PooledMemory<T> d)
            => new ReadOnlyPooledMemory<T>(d.m_Data, d.Length);
    }

    public struct ReadOnlyPooledMemory<T> : IDisposable
    {
        private T[] m_Data;

        public int Length { get; }

        internal ReadOnlyPooledMemory (T[] data, int length)
        {
            m_Data = data ?? throw new ArgumentNullException(nameof(data));
            Length = (length <= data.Length) ? length : throw new ArgumentOutOfRangeException(nameof(length), "Buffer not long enough.");
        }

        public void Dispose ()
        {
            if (m_Data != null && m_Data.Length > 0) ArrayPool<T>.Shared.Return(m_Data);
            m_Data = null;
        }

        public T[] GetBuffer () => m_Data ?? throw new ArgumentNullException(nameof(m_Data));

        public ReadOnlySpan<T> Span => m_Data.AsSpan(0, Length);
        public bool IsEmpty => m_Data == null || Length <= 0;

        public void CopyTo (Span<T> buf) => m_Data.AsSpan(0, Length).CopyTo(buf);
        public void CopyTo (Memory<T> buf) => m_Data.AsMemory(0, Length).CopyTo(buf);

        public static readonly ReadOnlyPooledMemory<T> Empty = new ReadOnlyPooledMemory<T>();

        public static implicit operator ReadOnlyMemory<T>(ReadOnlyPooledMemory<T> d) => d.m_Data.AsMemory(0, d.Length);
    }
}

schungx commented 5 years ago

Usage:

using (PooledMemory<byte> mem = new PooledMemory<byte>(100)) {
    int len = Encoding.UTF8.GetBytes(text, 0, text.Length, mem.GetBuffer(), 0);
    mem.SetLength(len);
    // ... do some processing with mem.Span, call Span-aware API's...
}

It is designed to be easily refactored to use regular Memory<T> or Span<T> when the APIs eventually catch up (whenever that will be...), which should look like the following:

Span<byte> mem = stackalloc byte[100];
// Call the Span-aware Encoding.GetBytes overload
int len = Encoding.UTF8.GetBytes(text.AsSpan(), mem);
mem = mem.Slice(0, len); // keep only the bytes that were written
// ... do some processing with mem, call Span-aware API's...