Closed diamondburned closed 4 years ago
Merging #3 into master will decrease coverage by
0.57%
. The diff coverage is62.50%
.
@@ Coverage Diff @@
## master #3 +/- ##
==========================================
- Coverage 81.31% 80.74% -0.58%
==========================================
Files 6 6
Lines 182 187 +5
==========================================
+ Hits 148 151 +3
- Misses 21 22 +1
- Partials 13 14 +1
Impacted Files | Coverage Δ | |
---|---|---|
decode.go | 79.41% <62.50%> (-1.55%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update b1b1c81...9fa9a88. Read the comment docs.
Hey, thanks for the contribution!
The detailed writeup in the issue makes sense to me, as well as the cursory glance over the PR. I'll have a play around with this a little bit in my day tomorrow and merge if all looks good 👍
Sorry, this is taking a little longer than expected. I ran into an issue when testing this where the new interface-based image.Set
is causing a huge increase in allocations for the default Decode method (for NRGBA images).
I can work around it by doing a type check to switch to the faster paths for known image types, but want to be sure I've not screwed anything up in the process of switching between alpha and non-alpha premultiplied colors when doing so.
16:50 $ benchcmp bench-master.txt bench-new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark old ns/op new ns/op delta
BenchmarkDecode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 820101 815481 -0.56%
BenchmarkDecode/LNAdApj[00aymkj[TKay9}ay-Sj[-8 750371 761578 +1.49%
BenchmarkDecode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 816556 823240 +0.82%
BenchmarkDecode/KJG8_@Dgx]_4V?xuyE%NRj-8 687708 699058 +1.65%
benchmark old allocs new allocs delta
BenchmarkDecode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 3 1027 +34133.33%
BenchmarkDecode/LNAdApj[00aymkj[TKay9}ay-Sj[-8 3 1027 +34133.33%
BenchmarkDecode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 3 1027 +34133.33%
BenchmarkDecode/KJG8_@Dgx]_4V?xuyE%NRj-8 3 1027 +34133.33%
benchmark old bytes new bytes delta
BenchmarkDecode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 4448 8544 +92.09%
BenchmarkDecode/LNAdApj[00aymkj[TKay9}ay-Sj[-8 4448 8544 +92.09%
BenchmarkDecode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 4448 8544 +92.09%
BenchmarkDecode/KJG8_@Dgx]_4V?xuyE%NRj-8 4384 8480 +93.43%
That's an interesting observation. I'm guessing it has to do with the type assertion here, which possibly creates a copy of color.NRGBA
.
I think a fast path approach is reasonable. Something like this, I assume:
setter, isNRGBA := dst.(*image.NRGBA)
for .. {
for .. {
if isNRGBA {
setter.SetNRGBA(x, y, color.NRGBA{})
} else {
dst.Set(x, y, color.NRGBA{})
}
}
}
Edit: It seems like inlining color.NRGBA
doesn't have any benefits compiler-wise. Below are 2 Godbolt links that possibly demonstrate this:
As we can see, the Go code and Assembly outputs for both pastes are different. However, what's more important is that both have the same amount of movq
instructions, thus in either cases, they will treat the struct values the same.
Given this, I can rewrite the above example as such:
for .. {
for .. {
var c = color.NRGBA{}
if isNRGBA {
setter.SetNRGBA(x, y, c)
} else {
dst.Set(x, y, c)
}
}
}
I think this is going to end up being just as alloc heavy when using an RGBA image instead of an NRGBA image. I've played around a little and got a fast path low-alloc approach for both cases. Only downside that I can see is the type switch is inside the loop, but the benchmark doesn't show any impact.
What do you think?
type drawImageNRGBA interface {
SetNRGBA(x, y int, c color.NRGBA)
}
type drawImageRGBA interface {
SetRGBA(x, y int, c color.RGBA)
}
for .. {
for .. {
switch d := dst.(type) {
case *image.NRGBA:
d.SetNRGBA(x, y, color.NRGBA{
R: uint8(linearTosRGB(r)),
G: uint8(linearTosRGB(g)),
B: uint8(linearTosRGB(b)),
A: 255,
})
case *image.RGBA:
d.SetRGBA(x, y, color.RGBA{
R: uint8(linearTosRGB(r)),
G: uint8(linearTosRGB(g)),
B: uint8(linearTosRGB(b)),
A: 255,
})
default:
d.Set(x, y, color.NRGBA{
R: uint8(linearTosRGB(r)),
G: uint8(linearTosRGB(g)),
B: uint8(linearTosRGB(b)),
A: 255,
})
}
22:01 $ benchstat orig.txt pr.txt pr-switch2.txt
name \ time/op orig.txt pr.txt pr-switch2.txt
Decode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 874µs ± 7% 825µs ± 1% 795µs ± 1%
Decode/LNAdApj[00aymkj[TKay9}ay-Sj[-8 801µs ± 3% 765µs ± 1% 733µs ± 1%
Decode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 881µs ±10% 831µs ± 0% 798µs ± 1%
Decode/KJG8_@Dgx]_4V?xuyE%NRj-8 756µs ±11% 702µs ± 1% 669µs ± 1%
DecodeDraw/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 851µs ± 1% 813µs ± 3%
DecodeDraw/LNAdApj[00aymkj[TKay9}ay-Sj[-8 787µs ± 0% 742µs ± 3%
DecodeDraw/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 852µs ± 0% 797µs ± 1%
DecodeDraw/KJG8_@Dgx]_4V?xuyE%NRj-8 726µs ± 1% 668µs ± 0%
name \ alloc/op orig.txt pr.txt pr-switch2.txt
Decode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 4.45kB ± 0% 8.54kB ± 0% 4.45kB ± 0%
Decode/LNAdApj[00aymkj[TKay9}ay-Sj[-8 4.45kB ± 0% 8.54kB ± 0% 4.45kB ± 0%
Decode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 4.45kB ± 0% 8.54kB ± 0% 4.45kB ± 0%
Decode/KJG8_@Dgx]_4V?xuyE%NRj-8 4.38kB ± 0% 8.48kB ± 0% 4.38kB ± 0%
DecodeDraw/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 8.48kB ± 0% 0.29kB ± 0%
DecodeDraw/LNAdApj[00aymkj[TKay9}ay-Sj[-8 8.48kB ± 0% 0.29kB ± 0%
DecodeDraw/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 8.48kB ± 0% 0.29kB ± 0%
DecodeDraw/KJG8_@Dgx]_4V?xuyE%NRj-8 8.42kB ± 0% 0.23kB ± 0%
name \ allocs/op orig.txt pr.txt pr-switch2.txt
Decode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 3.00 ± 0% 1027.00 ± 0% 3.00 ± 0%
Decode/LNAdApj[00aymkj[TKay9}ay-Sj[-8 3.00 ± 0% 1027.00 ± 0% 3.00 ± 0%
Decode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 3.00 ± 0% 1027.00 ± 0% 3.00 ± 0%
Decode/KJG8_@Dgx]_4V?xuyE%NRj-8 3.00 ± 0% 1027.00 ± 0% 3.00 ± 0%
DecodeDraw/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8 2.05k ± 0% 0.00k ± 0%
DecodeDraw/LNAdApj[00aymkj[TKay9}ay-Sj[-8 2.05k ± 0% 0.00k ± 0%
DecodeDraw/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8 2.05k ± 0% 0.00k ± 0%
DecodeDraw/KJG8_@Dgx]_4V?xuyE%NRj-8 2.05k ± 0% 0.00k ± 0%
func BenchmarkDecodeDraw(b *testing.B) {
for _, test := range testFixtures {
// skip tests without hashes
if test.hash == "" {
continue
}
b.Run(test.hash, func(b *testing.B) {
dst := image.NewRGBA(image.Rect(0, 0, 32, 32))
for i := 0; i < b.N; i++ {
_ = blurhash.DecodeDraw(dst, test.hash, 1)
}
})
}
}
I think the color should be created outside the type switch to make it shorter. Other than that, I think it's fine.
Sorry this took a while to merge! I've been tied up with work stuff recently.
Closes #2