AdrianStrugala / AvroConvert

Rapid Avro serializer for C# .NET

Refactor to improve performance when deserializing #111

Closed: jacksunny2020 closed this issue 1 year ago

jacksunny2020 commented 1 year ago

Hi Adrian,

Here are my findings on a performance issue in our project, which uses AvroConvert. I forked the project and made changes locally; all the unit tests passed both before and after the change. Could you please review the change and assess whether it is acceptable for this purpose and does not break existing behavior? Thank you.

The bottleneck in our project is the Resolve method. [Image: AvroConvertPerformanceBottleneck]

The suggested refactoring is based on an analysis of the source code. [Image: AvroConvertPerformanceBottleneckSuggestChange]

Here are the unit test results after this change. [Image: TestResultAfterRefactor]

Best Regards, Jack

AdrianStrugala commented 1 year ago

Hello Jack,

Thank you for the PR! Unfortunately, it contains a breaking change. Removing the FindBranch() invocation for mismatching Union schemas would make common scenarios fail. Example: [Image]
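To illustrate why a branch lookup matters when Union schemas differ, here is a minimal sketch. It is not AvroConvert's actual code: the class `UnionResolutionSketch`, the method `FindBranchIndex`, and the representation of a union as a list of type names are all assumptions made for illustration. It relies only on the Avro rule that a union value is written with the index of its branch in the writer's schema, so a reader with a differently ordered union has to map that branch to its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of union resolution; NOT AvroConvert's implementation.
public static class UnionResolutionSketch
{
    // Returns the index of the first reader branch whose name matches the
    // writer branch, or -1 when no reader branch matches.
    public static int FindBranchIndex(IReadOnlyList<string> readerBranches, string writerBranch)
    {
        for (int i = 0; i < readerBranches.Count; i++)
        {
            if (readerBranches[i] == writerBranch)
            {
                return i;
            }
        }
        return -1;
    }

    public static void Main()
    {
        // Assumed scenario: same branches, different order in writer and reader.
        var writerUnion = new[] { "null", "string" };
        var readerUnion = new[] { "string", "null" };

        // The data records that the value was written with writer branch 1 ("string").
        int writerIndex = 1;

        // Without a lookup, the writer's index is applied to the reader's union.
        string naive = readerUnion[writerIndex];

        // With a lookup, the writer's branch is matched against the reader's branches.
        int resolved = FindBranchIndex(readerUnion, writerUnion[writerIndex]);

        Console.WriteLine($"naive: {naive}, resolved: {readerUnion[resolved]}");
        // prints "naive: null, resolved: string"
    }
}
```

In this assumed scenario, skipping the lookup would read a string value as null, which is the kind of failure a mismatching-union case can produce.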

The Resolve from your screenshot is the main class used for deserialization. It calls all of the internal Resolvers, which is why it accounts for nearly all of the deserialization time. Notice the long call tree below it: it calls the methods used for object and property resolution.

The FindBranch() call under the if statement takes a negligible amount of time compared to the whole deserialization process. When you deserialize with the same model that was used for serialization, it is nearly 0.

The most promising areas for improvement are the most frequently called code paths. In your case, those would be the ResolveRecord(), ResolveArray(), and ResolveDictionary() methods. You might want to take a look at them for possible time savings.
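Before changing any of those methods, it may help to measure a baseline so that a gain can be confirmed. The following is a rough sketch of a timing helper, not part of AvroConvert; the names `TimingSketch` and `AverageMilliseconds` are hypothetical, and the sorting workload is only a placeholder for a real deserialization call.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; NOT part of AvroConvert.
public static class TimingSketch
{
    // Runs the action once as a warm-up (so JIT compilation is not measured),
    // then returns the average elapsed milliseconds per call.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }

        action(); // warm-up call, excluded from the measurement

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Main()
    {
        // Placeholder workload standing in for a deserialization call.
        var data = new int[100_000];
        double ms = AverageMilliseconds(() => Array.Sort((int[])data.Clone()), 50);
        Console.WriteLine($"average: {ms:F3} ms per call");
    }
}
```

To time deserialization, the placeholder lambda could be swapped for a call such as `AvroConvert.Deserialize<T>(bytes)` on a payload from your project. A profiler or a dedicated benchmarking library would give more reliable numbers than a simple stopwatch loop.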

Thank you for your input! Adrian