dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

[API Proposal]: Provide control over growth rate of AsnWriter #102031

Open edwardneal opened 4 months ago

edwardneal commented 4 months ago

Background and motivation

We can use AsnWriter to generate BER, CER & DER requests, and #69573 allowed the initial capacity of the buffer to be specified. If the initial capacity is exceeded, the buffer is expanded in blocks of 1KB within AsnWriter.EnsureWriteCapacity.

In some scenarios (such as the generation of SNMP requests/responses) a 1KB rate of expansion can be overkill. I'd like to be able to control this - either by explicitly setting the growth rate, or by changing the growth rate to a factor of the initial size.

API Proposal

namespace System.Formats.Asn1;

public class AsnWriter
{
    public AsnWriter(AsnEncodingRules ruleSet);
    public AsnWriter(AsnEncodingRules ruleSet, int initialCapacity);
+   // growthRate is measured in bytes.
+   public AsnWriter(AsnEncodingRules ruleSet, int initialCapacity, int growthRate);
}

API Usage

string[] oids = ["1.3.6.1.2.1.1.4.0", "1.3.6.1.2.1.1.5.0" /*, ... */];
var writer = new AsnWriter(AsnEncodingRules.BER, initialSize: 64, growthRate: 32);
using (var snmpSequence = writer.PushSequence())
{
    writer.WriteInteger(1);
    writer.WriteOctetString(System.Text.Encoding.UTF8.GetBytes("public"));

    using (var pduSequence = writer.PushSequence(new Asn1Tag(TagClass.ContextSpecific, 0)))
    {
        writer.WriteInteger(1);
        writer.WriteInteger(0);
        writer.WriteInteger(0);

        using (var varBindSequence = writer.PushSequence())
        {
            foreach (var oid in oids)
            {
                using (var oidSequence = writer.PushSequence())
                {
                    writer.WriteObjectIdentifier(oid);
                    writer.WriteNull();
                }
            }
        }
    }
}

var encodedPdu = writer.Encode();

Alternative Designs

Users of the library could calculate the expected size of the resultant payload and set the AsnWriter's initial size. In the example of an SNMP GET, this is fairly trivial to do. It means that we've got a slightly leakier abstraction (because callers now need to concern themselves with the details of ASN.1 encoding) but perhaps this library is low-level enough that callers should reasonably be expected to be able to understand those details anyway.

The existing AsnWriter(AsnEncodingRules, int) constructor could also be changed to set the growth rate to a factor of the initial capacity. This could be a reasonable thing to do at first glance, but I don't use the library in enough scenarios to suggest a factor - 50% might be a generic starting point, but the behaviour seems fairly brittle between use cases if it's the sole change made.

I'd appreciate a sanity-check on the name of the growthRate parameter. I avoided blockSize so that callers don't need to think about what a block entails with respect to AsnWriter. I've got some reservations about calling it growthRate because it sounds a little like a growth factor though.

Risks

No response

dotnet-policy-service[bot] commented 4 months ago

Tagging subscribers to this area: @dotnet/area-system-formats-asn1, @bartonjs, @vcsjones See info in area-owners.md if you want to be subscribed.

vcsjones commented 4 months ago

I wonder if just giving total control over allocating the underlying buffer would help with most scenarios where advanced scenarios would make it easier to give control over the growth strategy. A strawperson idea:

public class AsnWriter {
    public AsnWriter(
        AsnEncodingRules ruleSet,
        Func<int, int, byte[]> allocator,
        Action<byte[]> deallocator = null);
}

And the usage would be something like this:

static byte[] Allocator(int currentSize, int minimumSize) {
    if (currentSize == 0) {
        return Math.Max(minimumSize * 2, 1024);
    }

    return new byte[minimumSize * 2]; // Doubling strategy
}

AsnWriter writer = new(AsnEncodingRules.DER, Allocator);

The Deallocator could be used for people that need to do something with the byte[] after the AsnWriter has copied from the old buffer to the new buffer. This could be clearing the array, or allow the allocator and deallocator to use some memory pooling strategy instead of creating a new byte array every time.

The upshot to this is it gives you total control over the allocation and growth strategy. The downside is that it is comes with some amount of risk. The Allocator would need to be documented that AsnWriter owns the byte array it returns, until it is given back in the Deallocator. Having the allocator may promote misuse, such as using it to get the encoded data instead of using the Encode API.


Another possible alternative is to implement an AsnEncoder static class that encodes ASN.1 primitives. That way someone could implement their own AsnWriter from scratch and have even more control. This is even more risky in my opinion, but the risk is obvious, whereas with an allocator pattern the risk is more subtle.

edwardneal commented 4 months ago

I see your point, but think that it might mix concerns a little. I could see the buffer being allocated from a few different places:

The third point is probably the most important one for cases where we're encoding private keys, but being able to write directly into the stackalloc'd Span would be ideal (if it becomes possible at some stage.)

Separately to the buffer's allocator, there's value in controlling the allocation size. I think this would be overkill for the general case, but could understand why somebody might want more granular control of the growth rate to avoid continuously allocating new arrays - perhaps growing by 0.2 * initialCount or 32 bytes for the first X instances, then 0.5 * initialCount or 64 bytes for the next Y instances, then giving up and allocating in Math.Max(initialCount, 1024) blocks.

To doodle an idea, we could have something similar to:

public sealed class AsnWriter
{
    AsnWriter(AsnEncodingRules ruleSet,
        AsnWriter.BufferAllocationOptions bufferAllocationOptions);

    public readonly struct BufferAllocationOptions
    {
        // null if unpooled.
        public ArrayPool<byte>? SourcePool { get; }

        public bool ClearBeforeDeallocation { get; }

        // (initialSize, currentSize, requestedBytes)
        public Func<int, int, int, int> DefaultAllocationSizeController { get; }

        // Pooled with ArrayPool<byte>.Shared. Array not cleared after return.
        public static BufferAllocationOptions Default { get; }

        // Uses a separate ArrayPool<byte> and clears array after return.
        public static BufferAllocationOptions Confidential { get; }

        // Returns a new byte array every time, and performs a simple Array.Resize.
        public static BufferAllocationOptions Unpooled { get; }

        public static Func<int, int, int, int> ExpandBufferByFixedSize(int blockSize = 1024);
    }
}
PaulusParssinen commented 4 months ago

Another possible alternative is to implement an AsnEncoder static class that encodes ASN.1 primitives. That way someone could implement their own AsnWriter from scratch and have even more control. This is even more risky in my opinion, but the risk is obvious, whereas with an allocator pattern the risk is more subtle.

+1 for static encoder API to mirror the existing AsnDecoder. This would allow much more control and seems to have already be considered as future improvement (by quick search https://github.com/dotnet/runtime/issues/69573#issuecomment-1133184185)

I would allow the parts of the existing stateful writer to be implemented on top of it and it would have probably been useful in this DirectoryControl PR's encoding logic.

And I have branchless BER length calculation tricks which would work well with static APIs 😋

edwardneal commented 4 months ago

When I wrote that particular PR my first intuition was to reach for AsnDecoder and AsnEncoder classes - I ideally wanted to avoid allocating anything other than the resultant byte array, and allocating an instance of AsnWriter wasn't ideal.

How would we model the encoding of SEQUENCEs and SETs? The tag's length isn't currently written until the contents have been encoded (unless we take a leaf out of WinLDAP's book and hardcode the length field's length to 4-5 bytes.) When I try to model it in a way which is useful, I create a ref struct which functions a little like AsnWriter.Scope and implement WriteX methods on it. At that stage, it feels more like a slim version of AsnWriter than a static class though, similar to the below.

public ref struct Asn1Scope
{
    public Asn1Scope(Span<byte> backingSpan);
    public Asn1Scope(byte[] backingArray, bool allowResize);
    public Asn1Scope(ArrayPool<byte> backingPool);

    public readonly bool CanResizeBackingStore { get; }
    public readonly int OpenChildScopes { get; }

    public Asn1Scope OpenSequence(Asn1Tag? tag);

    public void WriteInteger(int value, Asn1Tag? tag = null);

    public void Dispose();

    public readonly ReadOnlySpan<byte> EncodedValue { get; }
}

Span<byte> backingSpan = stackalloc byte[8];
Asn1Scope rootScope = new(backingSpan); // Passed a Span<byte>, throws exception if it'd need to expand.

using (Asn1Scope sequence = rootScope.OpenSequence(tag: null))
{
    Debug.Assert(rootScope.OpenChildScopes == 1);
    sequence.WriteInteger(0xFFFF_FFFF);
} // Parent's OpenChildScopes is decremented upon child disposal

Debug.Assert(rootScope.OpenChildScopes == 0);
ReadOnlySpan<byte> results = rootScope.EncodedValue; // Throws exception if OpenChildScopes > 0