elodina / go-avro

Apache Avro for Golang
http://elodina.github.io/go-avro/
Apache License 2.0
129 stars 55 forks source link

Optimize findField operation heavily using a type map #45

Closed crast closed 9 years ago

crast commented 9 years ago

The existing findField operation was called for every element in the schema; and iterated every field in the struct; effectively it does m*n operations where m and n are typically close in number; or in other words O(n^2).

Optimize findField by caching the field information the first time we see a new reflect.Type. Not only does this make the operation run in O(n) time, there's less string comparisons too now.

I made a benchmark branch where I left both functions side-by-side and swapped them out for each other so that I could test various permutations and the growth rate.

The results were as follows (note this is benchmarking an entire SpecificDatumReader.Read, not the findField operation itself, just to show how much findField was overwhelming the decoder time):

BenchmarkSpecificDatumReader_complex_orig-4           100000         12266 ns/op
BenchmarkSpecificDatumReader_complex_newFindField-4   500000          3865 ns/op
BenchmarkSpecificDatumReader_hugeval_orig-4            50000         23620 ns/op
BenchmarkSpecificDatumReader_hugeval_newFindField-4   500000          3736 ns/op

Something to note - even though hugeval is a much bigger struct than complex, the newFindField variant ran in roughly the same time as 'complex'; because the time is now linear to the size of the schema, not really much related to the size of the target struct.

Also, I was able to confirm in another test where I added fields one at a time that the growth is indeed exponential on the original function. Even with small structs; this results in around a 4x speedup.

This is actually only step 1 of optimizations I would like to do; storing the interpretation of a struct paves the way for:

serejja commented 9 years ago

Hi @crast, thanks for submitting this PR! Let me know if you need to discuss some further optimizations or need some help with them please. I'll be willing to merge PRs like that. Thanks!