Closed by TheZeroSlave 4 years ago
@TheZeroSlave Thanks a lot for the proposal.
It looks good! I'm looking forward to those tests. We can run a benchmark before and after the change so the speedup can be mentioned in the changelog.
You can find the latest benchmark here: https://github.com/anthonynsimon/bild/blob/master/benchmarks.txt
To run it, just do make bench from the project root.
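For readers unfamiliar with the B/op and allocs/op columns in the results below: they come from allocation reporting in Go's testing package. Here is a minimal sketch of a benchmark that reports them. It is an illustration only: allocImage is a hypothetical stand-in rather than a bild function, and exactly what make bench runs is not shown in this thread.

```go
package main

import (
	"fmt"
	"image"
	"testing"
)

// allocImage is a hypothetical stand-in for an image operation under test.
// It allocates one 256x256 RGBA image (4 bytes per pixel).
func allocImage() *image.RGBA {
	return image.NewRGBA(image.Rect(0, 0, 256, 256))
}

// sink keeps the result reachable so the allocation is not optimized away.
var sink *image.RGBA

// benchmarkAllocImage has the usual shape of a Go benchmark function.
// ReportAllocs turns on the B/op and allocs/op columns for this benchmark.
func benchmarkAllocImage(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = allocImage()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	r := testing.Benchmark(benchmarkAllocImage)
	fmt.Println(r.String(), r.MemString())
}
```

In a real project the function would normally be named BenchmarkXxx, live in a _test.go file, and be run through go test with the -bench flag.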
@anthonynsimon hi, added unit-tests and ran benchmarks.
with shallow copy
BenchmarkResizeTenth-4 25 217153552 ns/op 7373627 B/op 9 allocs/op
BenchmarkResizeQuarter-4 24 250252750 ns/op 20972372 B/op 9 allocs/op
BenchmarkResizeHalf-4 15 356039956 ns/op 50332153 B/op 8 allocs/op
BenchmarkResize1x-4 130 48361860 ns/op 8389022 B/op 8 allocs/op
BenchmarkResize2x-4 43 117064701 ns/op 25166176 B/op 8 allocs/op
BenchmarkResize4x-4 13 397145509 ns/op 83886432 B/op 8 allocs/op
BenchmarkResize8x-4 3 1763986385 ns/op 301990240 B/op 8 allocs/op
BenchmarkResize16x-4 1 6359523333 ns/op 1140851040 B/op 8 allocs/op
BenchmarkRotation256-4 1033 6062785 ns/op 1310933 B/op 196613 allocs/op
BenchmarkRotation512-4 164 48916707 ns/op 5243178 B/op 786437 allocs/op
BenchmarkRotation1024-4 34 180321112 ns/op 20971829 B/op 3145733 allocs/op
BenchmarkRotation2048-4 8 848386378 ns/op 83886412 B/op 12582917 allocs/op
BenchmarkRotation4096-4 2 3163275642 ns/op 335544864 B/op 50331653 allocs/op
BenchmarkRotation8192-4 1 13777945692 ns/op 1342178616 B/op 201326599 allocs/op
BenchmarkTranslate-4 3328 1701351 ns/op 4194454 B/op 4 allocs/op
without shallow copy
BenchmarkResizeTenth-4 24 230407388 ns/op 74482650 B/op 11 allocs/op
BenchmarkResizeQuarter-4 18 280340924 ns/op 88081424 B/op 11 allocs/op
BenchmarkResizeHalf-4 14 511973574 ns/op 117441078 B/op 10 allocs/op
BenchmarkResize1x-4 73 73447500 ns/op 12583435 B/op 10 allocs/op
BenchmarkResize2x-4 46 128738237 ns/op 29360571 B/op 10 allocs/op
BenchmarkResize4x-4 12 924230733 ns/op 88080816 B/op 10 allocs/op
BenchmarkResize8x-4 2 3668632780 ns/op 306184608 B/op 10 allocs/op
BenchmarkResize16x-4 1 19528657506 ns/op 1145045408 B/op 10 allocs/op
BenchmarkRotation256-4 634 11098297 ns/op 1573149 B/op 196615 allocs/op
BenchmarkRotation512-4 160 37562457 ns/op 6291797 B/op 786439 allocs/op
BenchmarkRotation1024-4 54 224589252 ns/op 25166181 B/op 3145735 allocs/op
BenchmarkRotation2048-4 13 601157733 ns/op 100663720 B/op 12582919 allocs/op
BenchmarkRotation4096-4 3 1850906885 ns/op 402653800 B/op 50331655 allocs/op
BenchmarkRotation8192-4 1 12174137055 ns/op 1610613656 B/op 201326599 allocs/op
BenchmarkTranslate-4 2006 4267173 ns/op 8388820 B/op 6 allocs/op
@anthonynsimon hi, I updated the PR. How does it look?
Hey thanks a lot for adding the tests and benchmark! I’ll review it next week since I’m away from the laptop, but from a quick glance it looks like the bytes and allocations per operation have been reduced. That’s really promising!
I compared the outputs locally with benchcmp and it looks pretty good.
Up to 20% fewer allocations and 90% fewer bytes allocated. The ns/op metric seems to vary too much to be significant.
benchmark old ns/op new ns/op delta
BenchmarkApply-8 235 246 +4.68%
BenchmarkConvolve3-8 34549582 35741921 +3.45%
BenchmarkConvolve8-8 171680173 190420935 +10.92%
BenchmarkConvolve32-8 2518109520 2767292538 +9.90%
BenchmarkConvolve64-8 10714574546 10873416232 +1.48%
BenchmarkMedian1-8 7040427 7668014 +8.91%
BenchmarkMedian4-8 249213484 255783631 +2.64%
BenchmarkMedian8-8 2811541792 2989137511 +6.32%
BenchmarkUniformMonochrome-8 31336323 31258739 -0.25%
BenchmarkUniformColored-8 92612321 85006676 -8.21%
BenchmarkFloodFill-8 71847721 91157358 +26.88%
BenchmarkResizeTenth-8 130110364 146703389 +12.75%
BenchmarkResizeQuarter-8 138140587 139680923 +1.12%
BenchmarkResizeHalf-8 180228642 175789838 -2.46%
BenchmarkResize1x-8 24427999 26805428 +9.73%
BenchmarkResize2x-8 61679394 68636367 +11.28%
BenchmarkResize4x-8 192366418 229433638 +19.27%
BenchmarkResize8x-8 679526617 705003659 +3.75%
BenchmarkResize16x-8 2575279222 2773490696 +7.70%
BenchmarkRotation256-8 3447317 3352541 -2.75%
BenchmarkRotation512-8 12897281 13250061 +2.74%
BenchmarkRotation1024-8 51453687 51092600 -0.70%
BenchmarkRotation2048-8 198856627 226509420 +13.91%
BenchmarkRotation4096-8 820024709 828146967 +0.99%
BenchmarkRotation8192-8 4562407650 3330166632 -27.01%
BenchmarkTranslate-8 1862565 986292 -47.05%
benchmark old allocs new allocs delta
BenchmarkApply-8 3 3 +0.00%
BenchmarkConvolve3-8 8 8 +0.00%
BenchmarkConvolve8-8 8 8 +0.00%
BenchmarkConvolve32-8 10 8 -20.00%
BenchmarkConvolve64-8 9 8 -11.11%
BenchmarkMedian1-8 65544 65544 +0.00%
BenchmarkMedian4-8 65550 65550 +0.00%
BenchmarkMedian8-8 65574 65579 +0.01%
BenchmarkUniformMonochrome-8 4 4 +0.00%
BenchmarkUniformColored-8 4 5 +25.00%
BenchmarkFloodFill-8 259081 259068 -0.01%
BenchmarkResizeTenth-8 11 13 +18.18%
BenchmarkResizeQuarter-8 10 8 -20.00%
BenchmarkResizeHalf-8 10 8 -20.00%
BenchmarkResize1x-8 10 8 -20.00%
BenchmarkResize2x-8 10 8 -20.00%
BenchmarkResize4x-8 10 8 -20.00%
BenchmarkResize8x-8 10 8 -20.00%
BenchmarkResize16x-8 10 8 -20.00%
BenchmarkRotation256-8 196615 196613 -0.00%
BenchmarkRotation512-8 786439 786437 -0.00%
BenchmarkRotation1024-8 3145735 3145734 -0.00%
BenchmarkRotation2048-8 12582919 12582917 -0.00%
BenchmarkRotation4096-8 50331655 50331653 -0.00%
BenchmarkRotation8192-8 201326599 201326598 -0.00%
BenchmarkTranslate-8 6 4 -33.33%
benchmark old bytes new bytes delta
BenchmarkApply-8 112 112 +0.00%
BenchmarkConvolve3-8 8413595 8413651 +0.00%
BenchmarkConvolve8-8 8462651 8462830 +0.00%
BenchmarkConvolve32-8 8660064 8659296 -0.01%
BenchmarkConvolve64-8 8929592 8929584 -0.00%
BenchmarkMedian1-8 3687122 3687124 +0.00%
BenchmarkMedian4-8 23629325 23629374 +0.00%
BenchmarkMedian8-8 84585400 84586016 +0.00%
BenchmarkUniformMonochrome-8 1048875 1048838 -0.00%
BenchmarkUniformColored-8 1048882 1048785 -0.01%
BenchmarkFloodFill-8 24618908 24620932 +0.01%
BenchmarkResizeTenth-8 74482784 7375268 -90.10%
BenchmarkResizeQuarter-8 88080978 20971918 -76.19%
BenchmarkResizeHalf-8 117440940 50332055 -57.14%
BenchmarkResize1x-8 12583410 8389032 -33.33%
BenchmarkResize2x-8 29360551 25166214 -14.29%
BenchmarkResize4x-8 88080803 83886451 -4.76%
BenchmarkResize8x-8 306184608 301990240 -1.37%
BenchmarkResize16x-8 1145045408 1140851040 -0.37%
BenchmarkRotation256-8 1573182 1310962 -16.67%
BenchmarkRotation512-8 6291828 5243248 -16.67%
BenchmarkRotation1024-8 25166272 20971981 -16.67%
BenchmarkRotation2048-8 100663784 83886575 -16.67%
BenchmarkRotation4096-8 402653793 335545027 -16.67%
BenchmarkRotation8192-8 1610613848 1342178984 -16.67%
BenchmarkTranslate-8 8388818 4194450 -50.00%
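The delta column in the tables above is consistent with a plain relative change, (new - old) / old, expressed in percent. A small sketch that reproduces a few of the rows; the function name percentDelta is made up for illustration and is not part of benchcmp:

```go
package main

import "fmt"

// percentDelta returns the relative change from oldVal to newVal in percent.
// Negative values mean the new measurement is smaller (an improvement for
// ns/op, B/op, and allocs/op).
func percentDelta(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// BenchmarkResizeTenth-8, bytes: 74482784 -> 7375268
	fmt.Printf("%+.2f%%\n", percentDelta(74482784, 7375268)) // -90.10%
	// BenchmarkTranslate-8, bytes: 8388818 -> 4194450
	fmt.Printf("%+.2f%%\n", percentDelta(8388818, 4194450)) // -50.00%
	// BenchmarkTranslate-8, allocs: 6 -> 4
	fmt.Printf("%+.2f%%\n", percentDelta(6, 4)) // -33.33%
}
```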
Hi. I noticed there are many allocations in this library during operations. We use it in our web service to resize images on the fly, so I optimized it a little. This is an unfinished pull request without tests, just to check the idea behind the optimization.