celestiaorg / rsmt2d

Go implementation of two dimensional Reed-Solomon merkle tree data availability scheme.
Apache License 2.0
161 stars 79 forks source link

dataSquare.extendSquare can take advantage of loop reuse when building fillerExtendedRow and fillerRow #305

Open odeke-em opened 6 months ago

odeke-em commented 6 months ago

While studying this code I noticed something that fillerRow is as long as ds.width+extendedWidth while fillerExtendedRow is as long as extendedWidth we could take advantage of loop reuse to build both fillerExtendedRow and fillerRow we can simply build them in the same loop and add a guard against the shorter length of extendedWidth

diff --git a/datasquare.go b/datasquare.go
index 0c16f59..5af113b 100644
--- a/datasquare.go
+++ b/datasquare.go
@@ -80,12 +80,12 @@ func (ds *dataSquare) extendSquare(extendedWidth uint, fillerChunk []byte) error
    newSquareRow := make([][][]byte, newWidth)

    fillerExtendedRow := make([][]byte, extendedWidth)
-   for i := uint(0); i < extendedWidth; i++ {
-       fillerExtendedRow[i] = fillerChunk
-   }
-
    fillerRow := make([][]byte, newWidth)
    for i := uint(0); i < newWidth; i++ {
+       if i < extendedWidth {
+           fillerExtendedRow[i] = fillerChunk
+       }
        fillerRow[i] = fillerChunk
    }

which produces interesting results

name                                          old time/op    new time/op    delta
Encoding/Leopard_128_shares_512-8               39.8µs ± 5%    39.0µs ± 3%     ~     (p=0.165 n=10+10)
Decoding/Leopard_128_shares_512-8                422ns ± 3%     424ns ± 4%     ~     (p=0.529 n=10+10)
EDSRoots/32x32x512_ODS-8                        5.92ms ± 4%    5.92ms ± 4%     ~     (p=0.853 n=10+10)
EDSRoots/64x64x512_ODS-8                        22.3ms ± 3%    22.2ms ± 2%     ~     (p=0.661 n=10+9)
EDSRoots/128x128x512_ODS-8                       106ms ±29%      87ms ± 4%  -17.51%  (p=0.022 n=9+10)
EDSRoots/256x256x512_ODS-8                       380ms ± 1%     350ms ± 3%   -7.90%  (p=0.000 n=8+9)
EDSRoots/512x512x512_ODS-8                       1.55s ±11%     1.35s ± 2%  -12.69%  (p=0.000 n=9+9)
Repair/Leopard_4x4x512_ODS-8                     370µs ± 6%     344µs ± 2%   -7.09%  (p=0.000 n=10+9)
Repair/Leopard_8x8x512_ODS-8                    1.37ms ± 3%    1.39ms ± 7%     ~     (p=0.315 n=9+10)
Repair/Leopard_16x16x512_ODS-8                  5.56ms ± 2%    5.43ms ± 1%   -2.23%  (p=0.000 n=10+10)
Repair/Leopard_32x32x512_ODS-8                  23.4ms ± 4%    22.3ms ± 4%   -4.90%  (p=0.001 n=9+9)
Repair/Leopard_64x64x512_ODS-8                  93.8ms ± 2%    89.5ms ± 2%   -4.60%  (p=0.000 n=9+10)
Repair/Leopard_128x128x512_ODS-8                 387ms ±16%     372ms ± 7%     ~     (p=0.912 n=10+10)
Repair/Leopard_256x256x512_ODS-8                 2.52s ± 5%     2.47s ± 2%     ~     (p=0.481 n=10+10)
Repair/Leopard_512x512x512_ODS-8                 8.93s ± 5%     8.66s ± 4%   -3.03%  (p=0.029 n=10+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8         24.0µs ±10%    24.4µs ± 7%     ~     (p=0.190 n=10+10)
ExtensionEncoding/Leopard_8x8x512_ODS-8         61.9µs ±14%    59.2µs ± 4%     ~     (p=0.075 n=10+10)
ExtensionEncoding/Leopard_16x16x512_ODS-8        154µs ± 8%     148µs ± 3%   -4.23%  (p=0.010 n=10+9)
ExtensionEncoding/Leopard_32x32x512_ODS-8        474µs ± 5%     475µs ± 3%     ~     (p=0.400 n=9+10)
ExtensionEncoding/Leopard_64x64x512_ODS-8       1.74ms ± 9%    1.64ms ± 6%   -5.63%  (p=0.008 n=9+9)
ExtensionEncoding/Leopard_128x128x512_ODS-8     7.96ms ± 5%    7.04ms ± 1%  -11.56%  (p=0.000 n=7+10)
ExtensionEncoding/Leopard_256x256x512_ODS-8     40.6ms ±13%    36.9ms ±11%   -9.21%  (p=0.043 n=10+9)
ExtensionEncoding/Leopard_512x512x512_ODS-8      164ms ± 4%     164ms ± 5%     ~     (p=1.000 n=8+8)
ExtensionWithRoots/Leopard_4x4x512_ODS-8         171µs ± 6%     170µs ± 2%     ~     (p=0.604 n=10+9)
ExtensionWithRoots/Leopard_8x8x512_ODS-8         531µs ±10%     532µs ± 8%     ~     (p=0.739 n=10+10)
ExtensionWithRoots/Leopard_16x16x512_ODS-8      1.72ms ± 3%    1.79ms ±11%     ~     (p=0.133 n=10+9)
ExtensionWithRoots/Leopard_32x32x512_ODS-8      6.10ms ± 4%    7.32ms ±23%  +20.00%  (p=0.000 n=10+9)
ExtensionWithRoots/Leopard_64x64x512_ODS-8      23.6ms ± 2%    24.0ms ± 3%     ~     (p=0.136 n=9+9)
ExtensionWithRoots/Leopard_128x128x512_ODS-8    97.2ms ± 9%    96.4ms ± 8%     ~     (p=0.529 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8     459ms ±11%     399ms ± 5%  -13.06%  (p=0.000 n=10+9)
ExtensionWithRoots/Leopard_512x512x512_ODS-8     1.79s ± 4%     1.58s ± 4%  -12.04%  (p=0.000 n=9+8)

name                                          old alloc/op   new alloc/op   delta
Encoding/Leopard_128_shares_512-8                120kB ± 2%     118kB ± 2%   -1.58%  (p=0.043 n=10+10)
Decoding/Leopard_128_shares_512-8                0.00B          0.00B          ~     (all equal)
EDSRoots/32x32x512_ODS-8                        1.81MB ± 0%    1.81MB ± 0%     ~     (p=0.753 n=10+10)
EDSRoots/64x64x512_ODS-8                        6.24MB ± 0%    6.24MB ± 0%     ~     (p=1.000 n=9+10)
EDSRoots/128x128x512_ODS-8                      26.5MB ± 0%    26.5MB ± 0%     ~     (p=0.833 n=10+10)
EDSRoots/256x256x512_ODS-8                       109MB ± 0%     109MB ± 0%     ~     (p=0.435 n=9+9)
EDSRoots/512x512x512_ODS-8                       497MB ± 0%     497MB ± 0%     ~     (p=0.467 n=8+8)
Repair/Leopard_4x4x512_ODS-8                     163kB ± 1%     164kB ± 1%     ~     (p=0.190 n=10+10)
Repair/Leopard_8x8x512_ODS-8                     521kB ± 4%     526kB ± 4%     ~     (p=0.353 n=10+10)
Repair/Leopard_16x16x512_ODS-8                  1.89MB ± 9%    1.89MB ±10%     ~     (p=0.912 n=10+10)
Repair/Leopard_32x32x512_ODS-8                  7.68MB ± 5%    7.79MB ± 8%     ~     (p=0.280 n=10+10)
Repair/Leopard_64x64x512_ODS-8                  32.0MB ± 5%    33.5MB ± 7%   +4.51%  (p=0.009 n=10+10)
Repair/Leopard_128x128x512_ODS-8                 131MB ± 7%     133MB ±11%     ~     (p=0.280 n=10+10)
Repair/Leopard_256x256x512_ODS-8                 440MB ± 1%     440MB ± 0%     ~     (p=0.912 n=10+10)
Repair/Leopard_512x512x512_ODS-8                1.72GB ± 0%    1.72GB ± 0%     ~     (p=0.965 n=8+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8         38.3kB ± 0%    38.3kB ± 0%     ~     (p=0.529 n=10+10)
ExtensionEncoding/Leopard_8x8x512_ODS-8          147kB ± 0%     147kB ± 1%     ~     (p=0.113 n=10+9)
ExtensionEncoding/Leopard_16x16x512_ODS-8        614kB ± 1%     612kB ± 0%     ~     (p=0.436 n=10+10)
ExtensionEncoding/Leopard_32x32x512_ODS-8       2.51MB ± 1%    2.52MB ± 1%     ~     (p=0.247 n=10+10)
ExtensionEncoding/Leopard_64x64x512_ODS-8       10.3MB ± 2%    10.2MB ± 1%     ~     (p=0.447 n=10+9)
ExtensionEncoding/Leopard_128x128x512_ODS-8     42.6MB ± 3%    42.9MB ± 3%     ~     (p=0.258 n=9+9)
ExtensionEncoding/Leopard_256x256x512_ODS-8      127MB ± 0%     127MB ± 0%     ~     (p=0.529 n=10+10)
ExtensionEncoding/Leopard_512x512x512_ODS-8      512MB ± 0%     512MB ± 0%     ~     (p=0.971 n=10+10)
ExtensionWithRoots/Leopard_4x4x512_ODS-8         119kB ± 0%     119kB ± 0%     ~     (p=0.404 n=10+10)
ExtensionWithRoots/Leopard_8x8x512_ODS-8         356kB ± 1%     357kB ± 1%     ~     (p=0.143 n=10+10)
ExtensionWithRoots/Leopard_16x16x512_ODS-8      1.20MB ± 2%    1.20MB ± 1%     ~     (p=0.475 n=10+7)
ExtensionWithRoots/Leopard_32x32x512_ODS-8      4.36MB ± 2%    4.34MB ± 2%     ~     (p=0.673 n=8+9)
ExtensionWithRoots/Leopard_64x64x512_ODS-8      16.5MB ± 5%    16.6MB ± 3%     ~     (p=0.604 n=10+9)
ExtensionWithRoots/Leopard_128x128x512_ODS-8    73.4MB ±19%    77.8MB ±16%     ~     (p=0.089 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8     236MB ± 0%     237MB ± 0%     ~     (p=0.280 n=10+10)
ExtensionWithRoots/Leopard_512x512x512_ODS-8    1.01GB ± 0%    1.01GB ± 0%     ~     (p=0.159 n=9+8)

name                                          old allocs/op  new allocs/op  delta
Encoding/Leopard_128_shares_512-8                  132 ± 0%       132 ± 0%     ~     (all equal)
Decoding/Leopard_128_shares_512-8                 0.00           0.00          ~     (all equal)
EDSRoots/32x32x512_ODS-8                         34.1k ± 0%     34.1k ± 0%     ~     (p=1.000 n=10+10)
EDSRoots/64x64x512_ODS-8                          134k ± 0%      134k ± 0%     ~     (p=0.550 n=10+10)
EDSRoots/128x128x512_ODS-8                        530k ± 0%      530k ± 0%     ~     (p=0.248 n=10+8)
EDSRoots/256x256x512_ODS-8                       2.11M ± 0%     2.11M ± 0%     ~     (p=0.480 n=10+10)
EDSRoots/512x512x512_ODS-8                       8.42M ± 0%     8.42M ± 0%     ~     (p=0.467 n=8+8)
Repair/Leopard_4x4x512_ODS-8                       772 ± 0%       772 ± 0%     ~     (all equal)
Repair/Leopard_8x8x512_ODS-8                     2.86k ± 0%     2.86k ± 0%     ~     (p=0.802 n=9+9)
Repair/Leopard_16x16x512_ODS-8                   10.9k ± 0%     10.9k ± 0%     ~     (p=0.698 n=10+10)
Repair/Leopard_32x32x512_ODS-8                   42.2k ± 0%     42.2k ± 0%     ~     (p=0.433 n=10+9)
Repair/Leopard_64x64x512_ODS-8                    166k ± 0%      166k ± 0%     ~     (p=0.985 n=10+10)
Repair/Leopard_128x128x512_ODS-8                  660k ± 0%      660k ± 0%     ~     (p=0.725 n=10+10)
Repair/Leopard_256x256x512_ODS-8                 2.64M ± 0%     2.63M ± 0%     ~     (p=0.853 n=10+10)
Repair/Leopard_512x512x512_ODS-8                 10.5M ± 0%     10.5M ± 0%     ~     (p=0.965 n=8+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8            153 ± 0%       153 ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_8x8x512_ODS-8            389 ± 0%       389 ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_16x16x512_ODS-8        1.15k ± 0%     1.15k ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_32x32x512_ODS-8        3.82k ± 0%     3.82k ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_64x64x512_ODS-8        13.8k ± 0%     13.8k ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_128x128x512_ODS-8      52.1k ± 0%     52.1k ± 0%     ~     (p=0.353 n=10+7)
ExtensionEncoding/Leopard_256x256x512_ODS-8       201k ± 0%      201k ± 0%     ~     (p=0.340 n=10+10)
ExtensionEncoding/Leopard_512x512x512_ODS-8       795k ± 0%      795k ± 0%     ~     (p=0.154 n=9+9)
ExtensionWithRoots/Leopard_4x4x512_ODS-8           830 ± 0%       830 ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_8x8x512_ODS-8         2.79k ± 0%     2.79k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_16x16x512_ODS-8       10.1k ± 0%     10.1k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_32x32x512_ODS-8       38.0k ± 0%     38.0k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_64x64x512_ODS-8        148k ± 0%      148k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_128x128x512_ODS-8      583k ± 0%      583k ± 0%     ~     (p=0.894 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8     2.31M ± 0%     2.31M ± 0%     ~     (p=0.617 n=10+9)
ExtensionWithRoots/Leopard_512x512x512_ODS-8     9.22M ± 0%     9.22M ± 0%     ~     (p=0.443 n=9+9)

/cc @elias-orijtech @liamsi @rootulp @musalbas @staheri14

rootulp commented 6 months ago

Agreed and it seem like the optimization you propose still works as expected.

The performance penalty comes at the cost of (in my subjective opinion) slightly less code readability so I think we should only implement this optimization if we need to.

I'm also puzzled by the few instances where benchmarking shows this optimization results in a positive delta. I.e:

ExtensionWithRoots/Leopard_32x32x512_ODS-8      6.10ms ± 4%    7.32ms ±23%  +20.00%  (p=0.000 n=10+9)

Repair/Leopard_64x64x512_ODS-8                  32.0MB ± 5%    33.5MB ± 7%   +4.51%  (p=0.009 n=10+10)