georgysavva / scany

Library for scanning data from a database into Go structs and more
MIT License

Feature: use a pool of scan slices #24

Closed · kovalromank closed this issue 3 years ago

kovalromank commented 3 years ago

Thanks for the great library.

While looking through the library I noticed that each call to RowScanner.Scan creates a new []interface{} slice.

Since the row scanner caches column names and field indexes, I wanted to see if there could be a benefit to using a pool of slices rather than allocating a new one on each scan. A rough sketch of the pattern I have in mind is below.
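For illustration only, here is a minimal sketch of the pooling pattern, assuming a fixed column count; scanDestPool, numColumns, and scanRow are hypothetical names, not scany's API:

```go
package scanpool

import "sync"

// numColumns stands in for the column count the row scanner has cached;
// it is fixed for the lifetime of a scanner, so every pooled slice matches.
const numColumns = 1024

var scanDestPool = sync.Pool{
	New: func() interface{} {
		return make([]interface{}, numColumns)
	},
}

// scanRow shows only the allocation pattern; rows stands in for a
// *sql.Rows or pgx.Rows value.
func scanRow(rows interface{ Scan(dest ...interface{}) error }) error {
	dest := scanDestPool.Get().([]interface{})
	defer func() {
		// Clear references so pooled memory doesn't pin scanned values,
		// then return the slice for reuse by the next Scan call.
		for i := range dest {
			dest[i] = nil
		}
		scanDestPool.Put(dest)
	}()
	// ... point each dest element at the matching struct field, then:
	return rows.Scan(dest...)
}
```

(Storing a *[]interface{} in the pool would also avoid the slice-header allocation on Put, but the simple version shows the idea.)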

I created a data struct with 1024 columns and added some quick benchmarks to my fork of scany. The benchmark data struct and the benchmarks are in two new files in my fork, bench_data_test.go and bench_test.go, if anyone wants to run them.

Results of benchmarks:

goos: darwin
goarch: amd64
pkg: github.com/georgysavva/scany
BenchmarkStructPool
BenchmarkStructPool-8          16312         84675 ns/op          44 B/op          1 allocs/op
BenchmarkStruct
BenchmarkStruct-8              13929         81237 ns/op       16397 B/op          1 allocs/op
BenchmarkMapPool
BenchmarkMapPool-8              5966        171132 ns/op       57429 B/op       2050 allocs/op
BenchmarkMap
BenchmarkMap-8                  6478        171839 ns/op       73760 B/op       2050 allocs/op
PASS

Using a pool of slices reduces memory usage by over 16,000 B/op when scanning into either a struct or a map. For a struct specifically, the bytes allocated stay constant even though there are 1024 different columns.

This is a great fit for sync.Pool: thanks to RowScanner's caching, the allocated slices are of the same length each time Scan is called. I think it would be useful for RowScanner to provide an option to use a pool instead of allocating a new slice; a rough sketch of what that could look like is below.
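Purely hypothetical shape for such an option (none of these names exist in scany); the point is that keeping the pool on the scanner lets the cached column count guarantee uniform slice lengths:

```go
package pooldemo

import "sync"

// rowScanner mirrors only the relevant pieces: a cached column count and
// an optional pool of destination slices.
type rowScanner struct {
	columnCount int
	usePool     bool
	pool        sync.Pool
}

func newRowScanner(columnCount int, usePool bool) *rowScanner {
	rs := &rowScanner{columnCount: columnCount, usePool: usePool}
	rs.pool.New = func() interface{} {
		// Every slice has the same length because the column set is cached.
		return make([]interface{}, columnCount)
	}
	return rs
}

// scanDest returns a destination slice, pooled or freshly allocated.
func (rs *rowScanner) scanDest() []interface{} {
	if rs.usePool {
		return rs.pool.Get().([]interface{})
	}
	return make([]interface{}, rs.columnCount)
}

// releaseDest hands a slice back to the pool after Scan has consumed it.
func (rs *rowScanner) releaseDest(dest []interface{}) {
	if !rs.usePool {
		return
	}
	for i := range dest {
		dest[i] = nil // drop references so pooled memory doesn't pin values
	}
	rs.pool.Put(dest)
}
```

A package-level pool keyed by slice length could serve multiple scanners, but a per-scanner pool keeps the same-length invariant trivially true.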

georgysavva commented 3 years ago

Hi. Any reason for closing?

kovalromank commented 3 years ago

I ran some different benchmarks and noticed that caching the column-to-field-index maps would help a lot more when a new row scanner is created on each iteration, instead of reusing one as I'm doing here. I created a new issue, #25, explaining the cache.

I think creating a new row scanner is a lot more common because of functions like ScanAll/ScanOne; a rough sketch of the cache idea follows.
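For context, the cache from #25 could look roughly like this; getColumnToFieldIndexMap and the package name are hypothetical, and a real implementation would map snake_case column names and handle embedded structs and tags:

```go
package fieldcache

import (
	"reflect"
	"sync"
)

// columnToFieldIndexCache memoizes the column-name -> field-index mapping
// per destination type, so a freshly created row scanner (as in
// ScanAll/ScanOne) doesn't redo the reflection walk on every query.
// It maps reflect.Type -> map[string][]int.
var columnToFieldIndexCache sync.Map

func getColumnToFieldIndexMap(structType reflect.Type) map[string][]int {
	if cached, ok := columnToFieldIndexCache.Load(structType); ok {
		return cached.(map[string][]int)
	}
	m := buildColumnToFieldIndexMap(structType)
	actual, _ := columnToFieldIndexCache.LoadOrStore(structType, m)
	return actual.(map[string][]int)
}

func buildColumnToFieldIndexMap(structType reflect.Type) map[string][]int {
	m := make(map[string][]int, structType.NumField())
	for i := 0; i < structType.NumField(); i++ {
		f := structType.Field(i)
		// Simplified: keys on the Go field name; real code would derive
		// the column name and recurse into embedded structs.
		m[f.Name] = f.Index
	}
	return m
}
```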