Open wdcossey opened 9 months ago
Thank you for your contribution, I will do my best to review it today. Looks pretty nice at first glance!
Pushed a bug-fix and some enhancements.
Very nice PR, thank you. Just two minor comments from my side. When you address them, I will merge the PR, write a short doc, and create the next release.
@AdrianStrugala @wdcossey what's the decision on this PR? In general it looks like a deviation from the `AvroConvert` API, which, in my impression, follows the `Newtonsoft.Json.JsonConvert` API approach: e.g. `JsonConvert.SerializeObject` takes care of serializing all object types, and there are no specialized methods for specific types. It would also mean all existing clients would need to make code changes to benefit from this. We could change the `Serialize` method to apply this approach when the passed object is a collection, e.g.
/// <summary>
/// Serializes given object into Avro format (including header with metadata)
/// Choosing <paramref name="codecType"/> reduces output object size
/// </summary>
public static byte[] Serialize(object obj, CodecType codecType)
{
    var schema = Schema.Create(obj);
    if (schema is ArraySchema && !obj.GetType().IsDictionary())
    {
        var items = (IEnumerable)obj;

        // Peek at the first item to derive the item schema.
        // Assumes the collection is non-empty.
        var enumerator = items.GetEnumerator();
        enumerator.MoveNext();
        var itemSchema = Schema.Create(enumerator.Current);

        using (MemoryStream resultStream = new MemoryStream())
        {
            // Append every item separately instead of the collection as one object.
            // A fresh enumeration replaces enumerator.Reset(), which iterator-based
            // sequences do not support.
            using (var writer = new Encoder(itemSchema, resultStream, codecType))
            {
                foreach (var item in items)
                {
                    writer.Append(item);
                }
            }
            byte[] result = resultStream.ToArray();
            return result;
        }
    }
    else
    {
        // Any other object is written as a single datum.
        using (MemoryStream resultStream = new MemoryStream())
        {
            using (var writer = new Encoder(schema, resultStream, codecType))
            {
                writer.Append(obj);
            }
            byte[] result = resultStream.ToArray();
            return result;
        }
    }
}
On the other hand, this is potentially a breaking change. While `AvroConvert.Deserialize` can successfully deserialize `.avro` files generated this way, the byte content of files generated before and after this change is not the same.
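To illustrate why the bytes differ, here is a minimal sketch of the binary encoding described in the Avro specification, using an array of longs. It is not AvroConvert's code: `AvroEncodingSketch` and its methods are hypothetical names, and the sketch ignores the container header, block framing, sync markers and compression. It only shows that a collection written as one array datum carries a count prefix and a zero terminator, while items written as separate data do not.

```csharp
using System;
using System.Collections.Generic;

public static class AvroEncodingSketch
{
    // Avro writes a long as a zig-zag value in variable-length (7 bits per byte) form.
    public static void WriteLong(List<byte> output, long value)
    {
        ulong zigzag = (ulong)((value << 1) ^ (value >> 63));
        while ((zigzag & ~0x7FUL) != 0)
        {
            output.Add((byte)((zigzag & 0x7F) | 0x80));
            zigzag >>= 7;
        }
        output.Add((byte)zigzag);
    }

    // One datum of type array<long>: item count, the items, then a zero count as terminator.
    public static byte[] EncodeAsSingleArrayDatum(long[] items)
    {
        var output = new List<byte>();
        if (items.Length > 0)
        {
            WriteLong(output, items.Length);
            foreach (var item in items) WriteLong(output, item);
        }
        WriteLong(output, 0);
        return output.ToArray();
    }

    // One datum of type long per item, written back to back.
    public static byte[] EncodeAsSeparateData(long[] items)
    {
        var output = new List<byte>();
        foreach (var item in items) WriteLong(output, item);
        return output.ToArray();
    }
}
```

For `new long[] { 1, 2, 3 }`, `EncodeAsSingleArrayDatum` yields the bytes `06 02 04 06 00`, while `EncodeAsSeparateData` yields `02 04 06`. In a real container file the schema stored in the header and the object count of each block would differ as well.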
I would suggest making a decision and implementing this change in the library, as there are big perf improvements:
UserCount | Original Mean (ms) | Improved Mean (ms) | Mean Improvement (%) | Original Allocated (MB) | Improved Allocated (MB) | Allocation Improvement (%) |
---|---|---|---|---|---|---|
100 | 2.932 | 0.9109 | 68.9% | 2.15 | 1.57 | 27.0% |
1000 | 12.314 | 7.8920 | 35.9% | 19.46 | 12.79 | 34.3% |
10000 | 123.433 | 103.5033 | 16.1% | 217.91 | 151.68 | 30.4% |
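For reference, the improvement columns appear to be the relative reduction against the original value. A small sketch of that arithmetic, assuming this is how the percentages were derived (`BenchmarkMath` is a hypothetical helper, not part of the benchmark):

```csharp
using System;

public static class BenchmarkMath
{
    // Relative improvement of `improved` over `original`, in percent.
    public static double ImprovementPercent(double original, double improved)
        => (original - improved) / original * 100.0;
}
```

For the first row, `BenchmarkMath.ImprovementPercent(2.932, 0.9109)` is about 68.9 for the mean and `BenchmarkMath.ImprovementPercent(2.15, 1.57)` is about 27.0 for the allocations, matching the table.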
Benchmark used to compare the NuGet package AvroConvert v3.4.0 against `AvroConvert.Serialize` with support for serializing array items into separate blocks:
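using System.Linq;
using AutoFixture;
using BenchmarkDotNet.Attributes;
using SolTechnology.Avro;

// User and Offering are the benchmark's model types (not shown here).
[MemoryDiagnoser]
public class AvroConvertSerializeArray
{
    // Number of users serialized per benchmark run.
    [Params(100, 1_000, 10_000)]
    public int UserCount;

    private User[] _data;

    [GlobalSetup]
    public void Setup()
    {
        Fixture fixture = new Fixture();
        _data = fixture
            .Build<User>()
            // Give every user 21 generated offerings.
            .With(u => u.Offerings, fixture.CreateMany<Offering>(21).ToList)
            .CreateMany(UserCount)
            .ToArray();
    }

    [Benchmark]
    public byte[] Serialize() => AvroConvert.Serialize(_data);
}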
Hey, I am going to implement this in a way similar to what you've suggested, Manvel. The point is that this is in fact a breaking change, so I would make it part of the v4 release. Adrian
Avro Container
- `AvroConvert.DeserializeContainer()` for deserialization (of `IEnumerable<>`).
- `AvroConvert.SerializeContainer()` for serialization (of `IEnumerable<>`).
- `IsEnumerable()` to `Type` extensions.
- `Resolver.ResolveArray()`
*** I haven't had time to test everything as I only needed serialization for Azure Data Explorer