Implement prepared schemas for faster decoding

crast commented 8 years ago

Preparing a schema returns an immutable schema, so we can implement speedups that rely on the schema never changing.

Presently, only decoding speedups are implemented, but this leaves plenty of room for more speedups in the future.

This is the first step of what I was talking about in this comment

Decoding speedups come primarily from further memoizing the struct field lookups (and enum lookups, too) in a lock-free manner. The very first decode to a given struct type takes double the time because the type is being analyzed, but repeated decoding yields noticeable speedups: roughly 33% on a single-threaded benchmark and an even more noticeable 40% or so on a multi-threaded one (x86-64, darwin).

single-thread:

BenchmarkSpecificDatumReader_complex                  500000          3008 ns/op         304 B/op         12 allocs/op
BenchmarkSpecificDatumReader_complex_prepared        1000000          1989 ns/op         304 B/op         12 allocs/op
BenchmarkSpecificDatumReader_hugeval                  500000          3173 ns/op         304 B/op         12 allocs/op
BenchmarkSpecificDatumReader_hugeval_prepared        1000000          2055 ns/op         304 B/op         12 allocs/op

parallel:

BenchmarkSpecificDatumReader_complex-4               1000000          1807 ns/op         304 B/op         12 allocs/op
BenchmarkSpecificDatumReader_complex_prepared-4      1000000          1067 ns/op         304 B/op         12 allocs/op
BenchmarkSpecificDatumReader_hugeval-4               1000000          1710 ns/op         304 B/op         12 allocs/op
BenchmarkSpecificDatumReader_hugeval_prepared-4      1000000          1092 ns/op         304 B/op         12 allocs/op

crast commented 8 years ago

@serejja Do you still want support for Go 1.1 and 1.2? If you do, it's going to take more code, because those versions don't have sync.Pool.
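
For context on why `sync.Pool` matters here, this is a small sketch of the usual pooling pattern. The thread does not say what the PR actually pools, so `bufPool` and `encodeGreeting` are hypothetical names chosen only to show the pattern of reusing scratch objects across calls instead of allocating each time.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. sync.Pool is safe for concurrent use
// and may drop idle objects at any time, so New must always be able to
// produce a fresh one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeGreeting borrows a buffer, uses it, and returns it to the pool.
func encodeGreeting(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous user
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(encodeGreeting("avro"))
}
```

Without `sync.Pool`, the same effect needs a hand-rolled free list (for example a buffered channel of objects), which is the extra code referred to above.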

serejja commented 8 years ago

Thanks @crast! I'm going to drop Go 1.1 and 1.2 support in CI now (and also add 1.6). Nobody uses 1.1 and 1.2 now anyway.

I'm also working on one other thing that is somewhat intertwined with the changes you made, which is in this branch. Basically, go-avro codegen will support generating Write() and Read() functions to avoid using reflection at all. A similar approach is used in the encoding/json package (e.g. MarshalJSON, UnmarshalJSON), and it could be an option to speed things up in a lot of cases too.

Any comments and thoughts appreciated. Thanks for your contributions!

crast commented 8 years ago

@serejja Awesome. Yeah, that will produce noticeable speedups.

crast commented 8 years ago

I daresay that the codegen will make it an order of magnitude faster, because at this point the majority of the time is spent in reflect. I remember when single-threaded decoding of hugeval took 23000 ns on my machine; now it's down to 2000 ns. I'll be curious to see what you get out of the codegen approach.

elodina / go-avro

Implement prepared schemas for faster decoding #72