This PR fixes an issue with multi link property deserialization as well as a obscure buffer overrun bug with sets of strings.
Closes #35
Multi link property issue
When a dynamic object is passed into one of the Query* methods, the steps for building the result differ from when a generic is passed in. One of these steps - specifically the part that initializes object codecs - has different logic for generics vs non generic type information. The issue arises from a check for templated codecs (also known as CompilableWrappingCodecs in the code) that determines the property type within an objects property.
This check determines whether the property's type is derived from the templates inner type (templates contain a wrapping type ex: List<>, array, Set<>, etc; and an inner type ex: str, int, etc.) or to extract the type argument from a generic like List<string>, Range<int64>, etc. This check should only be performed when no type information is present for the property, but the code didn't take into account the property information.
This issue comes from how EdgeDB.Net reads the values defined in the EdgeDB Binary Protocol and turns them into their .NET form. Let's look at how sets are deserialized as well as strings.
In the binary protocol, EdgeDB encodes sets as the following structure
struct SetOrArrayValue {
// Number of dimensions, currently must
// always be 0 or 1. 0 indicates an empty set or array.
int32 ndims;
// Reserved.
int32 reserved0;
// Reserved.
int32 reserved1;
// Dimension data.
Dimension dimensions[ndims];
// Element data, the number of elements
// in this array is the sum of dimension sizes:
// sum((d.upper - d.lower + 1) for d in dimensions)
Element elements[];
};
struct Dimension {
// Upper dimension bound, inclusive,
// number of elements in the dimension
// relative to the lower bound.
int32 upper;
// Lower dimension bound, always 1.
int32 lower;
};
struct Element {
// Encoded element data length in bytes.
int32 length;
// Element data.
uint8 data[length];
};
The issue is how there's no mutation of the reader when we pass it to the inner codec of the set codec, it's simply consumed and spread across all set elements. To fix this, the reader was given a configurable limit property, which controls how many bytes can be read from the reader. This is used in the set codec as shown:
Summary
This PR fixes an issue with multi link property deserialization as well as a obscure buffer overrun bug with sets of strings.
Closes #35
Multi link property issue
When a dynamic object is passed into one of the
Query*
methods, the steps for building the result differ from when a generic is passed in. One of these steps - specifically the part that initializes object codecs - has different logic for generics vs non generic type information. The issue arises from a check for templated codecs (also known asCompilableWrappingCodec
s in the code) that determines the property type within an objects property.This check determines whether the property's type is derived from the templates inner type (templates contain a wrapping type ex:
List<>
,array
,Set<>
, etc; and an inner type ex:str
,int
, etc.) or to extract the type argument from a generic likeList<string>
,Range<int64>
, etc. This check should only be performed when no type information is present for the property, but the code didn't take into account the property information.https://github.com/edgedb/edgedb-net/blob/1be8eaf73dc4d806aee317c2c37df909cfdb8486/src/EdgeDB.Net.Driver/Binary/Codecs/Visitors/TypeVisitor.cs#L70-L76
The fix was to only do the check if the property info isn't found, implemented as shown:
https://github.com/edgedb/edgedb-net/blob/35bb296211751a42ce9c69ae87ff104494a57f98/src/EdgeDB.Net.Driver/Binary/Codecs/Visitors/TypeVisitor.cs#L75-L83
Buffer overrun with sets of strings
This issue comes from how EdgeDB.Net reads the values defined in the EdgeDB Binary Protocol and turns them into their .NET form. Let's look at how sets are deserialized as well as strings.
In the binary protocol, EdgeDB encodes sets as the following structure
In the binding, we read the set as the following:
https://github.com/edgedb/edgedb-net/blob/2248ae0a41199280ea97584e7ae8c4482c500d8d/src/EdgeDB.Net.Driver/Binary/Codecs/SetCodec.cs#L68-L102
Lets now look at how strings are decoded: https://github.com/edgedb/edgedb-net/blob/2248ae0a41199280ea97584e7ae8c4482c500d8d/src/EdgeDB.Net.Driver/Binary/Codecs/TextCodec.cs#L9-L12
The
ConsumeString
method reads all remaining data in the underlying buffer as UTF8 encoded text: https://github.com/edgedb/edgedb-net/blob/2248ae0a41199280ea97584e7ae8c4482c500d8d/src/EdgeDB.Net.Driver/Binary/PacketReader.cs#L86-L89The issue is how there's no mutation of the reader when we pass it to the inner codec of the set codec, it's simply consumed and spread across all set elements. To fix this, the reader was given a configurable limit property, which controls how many bytes can be read from the reader. This is used in the set codec as shown:
https://github.com/edgedb/edgedb-net/blob/35bb296211751a42ce9c69ae87ff104494a57f98/src/EdgeDB.Net.Driver/Binary/Codecs/SetCodec.cs#L92-L104
With this, additional checks were added to
PacketReader
which prevent performing reads when exceeding the set limit of the reader.