edgedb / edgedb-net

The official .NET client library for EdgeDB
https://edgedb.com
Apache License 2.0
82 stars 9 forks source link

Fix buffer overrun and multi link properties #36

Closed quinchs closed 1 year ago

quinchs commented 1 year ago

Summary

This PR fixes an issue with multi link property deserialization as well as a obscure buffer overrun bug with sets of strings.

Closes #35

Multi link property issue

When a dynamic object is passed into one of the Query* methods, the steps for building the result differ from when a generic is passed in. One of these steps - specifically the part that initializes object codecs - has different logic for generics vs non generic type information. The issue arises from a check for templated codecs (also known as CompilableWrappingCodecs in the code) that determines the property type within an objects property.

This check determines whether the property's type is derived from the templates inner type (templates contain a wrapping type ex: List<>, array, Set<>, etc; and an inner type ex: str, int, etc.) or to extract the type argument from a generic like List<string>, Range<int64>, etc. This check should only be performed when no type information is present for the property, but the code didn't take into account the property information.

https://github.com/edgedb/edgedb-net/blob/1be8eaf73dc4d806aee317c2c37df909cfdb8486/src/EdgeDB.Net.Driver/Binary/Codecs/Visitors/TypeVisitor.cs#L70-L76

The fix was to only do the check if the property info isn't found, implemented as shown:

https://github.com/edgedb/edgedb-net/blob/35bb296211751a42ce9c69ae87ff104494a57f98/src/EdgeDB.Net.Driver/Binary/Codecs/Visitors/TypeVisitor.cs#L75-L83

Buffer overrun with sets of strings

This issue comes from how EdgeDB.Net reads the values defined in the EdgeDB Binary Protocol and turns them into their .NET form. Let's look at how sets are deserialized as well as strings.

In the binary protocol, EdgeDB encodes sets as the following structure

struct SetOrArrayValue {
    // Number of dimensions, currently must
    // always be 0 or 1. 0 indicates an empty set or array.
    int32       ndims;

    // Reserved.
    int32       reserved0;

    // Reserved.
    int32       reserved1;

    // Dimension data.
    Dimension   dimensions[ndims];

    // Element data, the number of elements
    // in this array is the sum of dimension sizes:
    // sum((d.upper - d.lower + 1) for d in dimensions)
    Element     elements[];
};

struct Dimension {
    // Upper dimension bound, inclusive,
    // number of elements in the dimension
    // relative to the lower bound.
    int32       upper;

    // Lower dimension bound, always 1.
    int32       lower;
};

struct Element {
    // Encoded element data length in bytes.
    int32       length;

    // Element data.
    uint8       data[length];
};

In the binding, we read the set as the following:

https://github.com/edgedb/edgedb-net/blob/2248ae0a41199280ea97584e7ae8c4482c500d8d/src/EdgeDB.Net.Driver/Binary/Codecs/SetCodec.cs#L68-L102

Lets now look at how strings are decoded: https://github.com/edgedb/edgedb-net/blob/2248ae0a41199280ea97584e7ae8c4482c500d8d/src/EdgeDB.Net.Driver/Binary/Codecs/TextCodec.cs#L9-L12

The ConsumeString method reads all remaining data in the underlying buffer as UTF8 encoded text: https://github.com/edgedb/edgedb-net/blob/2248ae0a41199280ea97584e7ae8c4482c500d8d/src/EdgeDB.Net.Driver/Binary/PacketReader.cs#L86-L89

The issue is how there's no mutation of the reader when we pass it to the inner codec of the set codec, it's simply consumed and spread across all set elements. To fix this, the reader was given a configurable limit property, which controls how many bytes can be read from the reader. This is used in the set codec as shown:

https://github.com/edgedb/edgedb-net/blob/35bb296211751a42ce9c69ae87ff104494a57f98/src/EdgeDB.Net.Driver/Binary/Codecs/SetCodec.cs#L92-L104

With this, additional checks were added to PacketReader which prevent performing reads when exceeding the set limit of the reader.