bbrks / go-blurhash

🖼#️⃣ A pure Go implementation of Blurhash
MIT License
146 stars 7 forks source link

Add DecodeDraw to allow custom image types #3

Closed diamondburned closed 4 years ago

diamondburned commented 4 years ago

Closes #2

codecov[bot] commented 4 years ago

Codecov Report

Merging #3 into master will decrease coverage by 0.57%. The diff coverage is 62.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #3      +/-   ##
==========================================
- Coverage   81.31%   80.74%   -0.58%     
==========================================
  Files           6        6              
  Lines         182      187       +5     
==========================================
+ Hits          148      151       +3     
- Misses         21       22       +1     
- Partials       13       14       +1     
Impacted Files Coverage Δ
decode.go 79.41% <62.50%> (-1.55%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b1b1c81...9fa9a88. Read the comment docs.

bbrks commented 4 years ago

Hey, thanks for the contribution!

The detailed writeup in the issue makes sense to me, as well as the cursory glance over the PR. I'll have a play around with this a little bit in my day tomorrow and merge if all looks good 👍

bbrks commented 4 years ago

Sorry, this is taking a little longer than expected. I ran into an issue when testing this where the new interface-based image.Set is causing a huge increase in allocations for the default Decode method (for NRGBA images).

I can work around it by doing a type check to switch to the faster paths for known image types, but want to be sure I've not screwed anything up in the process of switching between alpha and non-alpha premultiplied colors when doing so.

16:50 $ benchcmp bench-master.txt bench-new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                          old ns/op     new ns/op     delta
BenchmarkDecode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8     820101        815481        -0.56%
BenchmarkDecode/LNAdApj[00aymkj[TKay9}ay-Sj[-8     750371        761578        +1.49%
BenchmarkDecode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8     816556        823240        +0.82%
BenchmarkDecode/KJG8_@Dgx]_4V?xuyE%NRj-8           687708        699058        +1.65%

benchmark                                          old allocs     new allocs     delta
BenchmarkDecode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8     3              1027           +34133.33%
BenchmarkDecode/LNAdApj[00aymkj[TKay9}ay-Sj[-8     3              1027           +34133.33%
BenchmarkDecode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8     3              1027           +34133.33%
BenchmarkDecode/KJG8_@Dgx]_4V?xuyE%NRj-8           3              1027           +34133.33%

benchmark                                          old bytes     new bytes     delta
BenchmarkDecode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8     4448          8544          +92.09%
BenchmarkDecode/LNAdApj[00aymkj[TKay9}ay-Sj[-8     4448          8544          +92.09%
BenchmarkDecode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8     4448          8544          +92.09%
BenchmarkDecode/KJG8_@Dgx]_4V?xuyE%NRj-8           4384          8480          +93.43%
diamondburned commented 4 years ago

That's an interesting observation. I'm guessing it has to do with the type assertion here, which possibly creates a copy of color.NRGBA.

I think a fast path approach is reasonable. Something like this, I assume:

setter, isNRGBA := dst.(*image.NRGBA)

for .. {
    for .. {
        if isNRGBA {
            setter.SetNRGBA(x, y, color.NRGBA{})
        } else {
            dst.Set(x, y, color.NRGBA{})
        }
    }
}

Edit: It seems like inlining color.NRGBA doesn't have any benefits compiler-wise. Below are 2 Godbolt links that possibly demonstrate this:

As we can see, the Go code and Assembly outputs for both pastes are different. However, what's more important is that both have the same amount of movq instructions, thus in either cases, they will treat the struct values the same.

Given this, I can rewrite the above example as such:

for .. {
    for .. {
        var c = color.NRGBA{}

        if isNRGBA {
            setter.SetNRGBA(x, y, c)
        } else {
            dst.Set(x, y, c)
        }
    }
}
bbrks commented 4 years ago

I think this is going to end up being just as alloc heavy when using an RGBA image instead of an NRGBA image. I've played around a little and got a fast path low-alloc approach for both cases. Only downside that I can see is the type switch is inside the loop, but the benchmark doesn't show any impact.

What do you think?

type drawImageNRGBA interface {
    SetNRGBA(x, y int, c color.NRGBA)
}

type drawImageRGBA interface {
    SetRGBA(x, y int, c color.RGBA)
}
for .. {
    for .. {

            switch d := dst.(type) {
            case *image.NRGBA:
                d.SetNRGBA(x, y, color.NRGBA{
                    R: uint8(linearTosRGB(r)),
                    G: uint8(linearTosRGB(g)),
                    B: uint8(linearTosRGB(b)),
                    A: 255,
                })
            case *image.RGBA:
                d.SetRGBA(x, y, color.RGBA{
                    R: uint8(linearTosRGB(r)),
                    G: uint8(linearTosRGB(g)),
                    B: uint8(linearTosRGB(b)),
                    A: 255,
                })
            default:
                d.Set(x, y, color.NRGBA{
                    R: uint8(linearTosRGB(r)),
                    G: uint8(linearTosRGB(g)),
                    B: uint8(linearTosRGB(b)),
                    A: 255,
                })
            }
22:01 $ benchstat orig.txt pr.txt pr-switch2.txt
name \ time/op                             orig.txt     pr.txt        pr-switch2.txt
Decode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8       874µs ± 7%    825µs ± 1%      795µs ± 1%
Decode/LNAdApj[00aymkj[TKay9}ay-Sj[-8       801µs ± 3%    765µs ± 1%      733µs ± 1%
Decode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8       881µs ±10%    831µs ± 0%      798µs ± 1%
Decode/KJG8_@Dgx]_4V?xuyE%NRj-8             756µs ±11%    702µs ± 1%      669µs ± 1%
DecodeDraw/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8                 851µs ± 1%      813µs ± 3%
DecodeDraw/LNAdApj[00aymkj[TKay9}ay-Sj[-8                 787µs ± 0%      742µs ± 3%
DecodeDraw/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8                 852µs ± 0%      797µs ± 1%
DecodeDraw/KJG8_@Dgx]_4V?xuyE%NRj-8                       726µs ± 1%      668µs ± 0%

name \ alloc/op                            orig.txt     pr.txt        pr-switch2.txt
Decode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8      4.45kB ± 0%   8.54kB ± 0%     4.45kB ± 0%
Decode/LNAdApj[00aymkj[TKay9}ay-Sj[-8      4.45kB ± 0%   8.54kB ± 0%     4.45kB ± 0%
Decode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8      4.45kB ± 0%   8.54kB ± 0%     4.45kB ± 0%
Decode/KJG8_@Dgx]_4V?xuyE%NRj-8            4.38kB ± 0%   8.48kB ± 0%     4.38kB ± 0%
DecodeDraw/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8                8.48kB ± 0%     0.29kB ± 0%
DecodeDraw/LNAdApj[00aymkj[TKay9}ay-Sj[-8                8.48kB ± 0%     0.29kB ± 0%
DecodeDraw/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8                8.48kB ± 0%     0.29kB ± 0%
DecodeDraw/KJG8_@Dgx]_4V?xuyE%NRj-8                      8.42kB ± 0%     0.23kB ± 0%

name \ allocs/op                           orig.txt     pr.txt        pr-switch2.txt
Decode/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8        3.00 ± 0%  1027.00 ± 0%       3.00 ± 0%
Decode/LNAdApj[00aymkj[TKay9}ay-Sj[-8        3.00 ± 0%  1027.00 ± 0%       3.00 ± 0%
Decode/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8        3.00 ± 0%  1027.00 ± 0%       3.00 ± 0%
Decode/KJG8_@Dgx]_4V?xuyE%NRj-8              3.00 ± 0%  1027.00 ± 0%       3.00 ± 0%
DecodeDraw/LFE.@D9F01_2%L%MIVD*9Goe-;WB-8                 2.05k ± 0%      0.00k ± 0%
DecodeDraw/LNAdApj[00aymkj[TKay9}ay-Sj[-8                 2.05k ± 0%      0.00k ± 0%
DecodeDraw/LNMF%n00%#MwS|WCWEM{R*bbWBbH-8                 2.05k ± 0%      0.00k ± 0%
DecodeDraw/KJG8_@Dgx]_4V?xuyE%NRj-8                       2.05k ± 0%      0.00k ± 0%
func BenchmarkDecodeDraw(b *testing.B) {
    for _, test := range testFixtures {
        // skip tests without hashes
        if test.hash == "" {
            continue
        }

        b.Run(test.hash, func(b *testing.B) {
            dst := image.NewRGBA(image.Rect(0, 0, 32, 32))
            for i := 0; i < b.N; i++ {
                _ = blurhash.DecodeDraw(dst, test.hash, 1)
            }
        })
    }
}
diamondburned commented 4 years ago

I think the color should be created outside the type switch to make it shorter. Other than that, I think it's fine.

bbrks commented 4 years ago

Sorry this took a while to merge! I've been tied up with work stuff recently.