gonum / matrix

Matrix packages for the Go language [DEPRECATED]

mat64: optimize r/w of Dense and Vector #356

Closed. sbinet closed this 8 years ago

sbinet commented 8 years ago

Fixes #346

name                    old time/op    new time/op    delta
MarshalDense10-4          2.66µs ± 1%    0.42µs ± 1%  -84.09%  (p=0.000 n=4+10)
MarshalDense100-4         22.3µs ± 1%     2.4µs ± 0%  -89.17%  (p=0.001 n=4+11)
MarshalDense1000-4         218µs ± 1%      23µs ± 1%  -89.54%  (p=0.001 n=4+11)
MarshalDense10000-4       2.18ms ± 1%    0.22ms ± 1%  -89.77%  (p=0.001 n=4+11)
UnmarshalDense10-4         918ns ± 0%     258ns ± 0%  -71.88%  (p=0.000 n=4+11)
UnmarshalDense100-4       3.83µs ± 0%    1.73µs ± 1%  -54.83%  (p=0.001 n=4+11)
UnmarshalDense1000-4      33.0µs ± 2%    16.0µs ± 1%  -51.33%  (p=0.001 n=4+11)
UnmarshalDense10000-4      309µs ± 0%     159µs ± 1%  -48.41%  (p=0.001 n=4+11)
MarshalVector10-4         2.48µs ± 1%    0.41µs ± 1%  -83.42%  (p=0.000 n=4+11)
MarshalVector100-4        21.9µs ± 1%     2.6µs ± 1%  -88.02%  (p=0.001 n=4+11)
MarshalVector1000-4        216µs ± 1%      24µs ± 1%  -88.66%  (p=0.001 n=4+11)
MarshalVector10000-4      2.15ms ± 1%    0.24ms ± 1%  -88.93%  (p=0.001 n=4+11)
UnmarshalVector10-4        811ns ± 0%     242ns ± 1%  -70.13%   (p=0.000 n=4+9)
UnmarshalVector100-4      3.75µs ± 0%    1.70µs ± 1%  -54.60%  (p=0.000 n=4+11)
UnmarshalVector1000-4     32.8µs ± 1%    16.0µs ± 1%  -51.17%  (p=0.001 n=4+11)
UnmarshalVector10000-4     309µs ± 1%     160µs ± 1%  -48.11%  (p=0.001 n=4+11)

name                    old alloc/op   new alloc/op   delta
MarshalDense10-4            400B ± 0%      208B ± 0%  -48.00%  (p=0.000 n=4+11)
MarshalDense100-4         2.64kB ± 0%    1.01kB ± 0%  -61.82%  (p=0.000 n=4+11)
MarshalDense1000-4        24.3kB ± 0%     8.3kB ± 0%  -65.88%  (p=0.000 n=4+11)
MarshalDense10000-4        242kB ± 0%      82kB ± 0%  -66.11%  (p=0.000 n=4+11)
UnmarshalDense10-4          272B ± 0%       80B ± 0%  -70.59%  (p=0.000 n=4+11)
UnmarshalDense100-4       1.90kB ± 0%    0.90kB ± 0%  -52.94%  (p=0.000 n=4+11)
UnmarshalDense1000-4      16.5kB ± 0%     8.2kB ± 0%  -50.34%  (p=0.000 n=4+11)
UnmarshalDense10000-4      164kB ± 0%      82kB ± 0%  -50.03%  (p=0.000 n=4+11)
MarshalVector10-4           384B ± 0%      208B ± 0%  -45.83%  (p=0.000 n=4+11)
MarshalVector100-4        2.62kB ± 0%    1.01kB ± 0%  -61.59%  (p=0.000 n=4+11)
MarshalVector1000-4       24.3kB ± 0%     8.3kB ± 0%  -65.86%  (p=0.000 n=4+11)
MarshalVector10000-4       242kB ± 0%      82kB ± 0%  -66.11%  (p=0.000 n=4+11)
UnmarshalVector10-4         256B ± 0%       80B ± 0%  -68.75%  (p=0.000 n=4+11)
UnmarshalVector100-4      1.89kB ± 0%    0.90kB ± 0%  -52.54%  (p=0.000 n=4+11)
UnmarshalVector1000-4     16.5kB ± 0%     8.2kB ± 0%  -50.29%  (p=0.000 n=4+11)
UnmarshalVector10000-4     164kB ± 0%      82kB ± 0%  -50.03%  (p=0.000 n=4+11)

name                    old allocs/op  new allocs/op  delta
MarshalDense10-4            26.0 ± 0%       2.0 ± 0%  -92.31%  (p=0.000 n=4+11)
MarshalDense100-4            206 ± 0%         2 ± 0%  -99.03%  (p=0.000 n=4+11)
MarshalDense1000-4         2.01k ± 0%     0.00k ± 0%  -99.90%  (p=0.000 n=4+11)
MarshalDense10000-4        20.0k ± 0%      0.0k ± 0%  -99.99%  (p=0.000 n=4+11)
UnmarshalDense10-4          8.00 ± 0%      1.00 ± 0%  -87.50%  (p=0.000 n=4+11)
UnmarshalDense100-4         8.00 ± 0%      1.00 ± 0%  -87.50%  (p=0.000 n=4+11)
UnmarshalDense1000-4        8.00 ± 0%      1.00 ± 0%  -87.50%  (p=0.000 n=4+11)
UnmarshalDense10000-4       8.00 ± 0%      1.00 ± 0%  -87.50%  (p=0.000 n=4+11)
MarshalVector10-4           24.0 ± 0%       2.0 ± 0%  -91.67%  (p=0.000 n=4+11)
MarshalVector100-4           204 ± 0%         2 ± 0%  -99.02%  (p=0.000 n=4+11)
MarshalVector1000-4        2.00k ± 0%     0.00k ± 0%  -99.90%  (p=0.000 n=4+11)
MarshalVector10000-4       20.0k ± 0%      0.0k ± 0%  -99.99%  (p=0.000 n=4+11)
UnmarshalVector10-4         6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.000 n=4+11)
UnmarshalVector100-4        6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.000 n=4+11)
UnmarshalVector1000-4       6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.000 n=4+11)
UnmarshalVector10000-4      6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.000 n=4+11)
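
The diff itself is not reproduced in this thread. As a rough illustration of how allocation counts like those above can drop from about two per element to a small constant, here is a minimal sketch that assumes the slow path calls `binary.Write` once per element and the fast path fills one preallocated byte slice. The function names (`marshalPerElement`, `marshalOneBuffer`, `unmarshalOneBuffer`) and the little-endian layout are hypothetical and are not taken from mat64.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// marshalPerElement encodes each value with a separate binary.Write call.
// Depending on the Go version, each call may allocate.
func marshalPerElement(data []float64) ([]byte, error) {
	var buf bytes.Buffer
	for _, v := range data {
		if err := binary.Write(&buf, binary.LittleEndian, v); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// marshalOneBuffer allocates the output once and writes the IEEE 754 bits
// of every value directly into it.
func marshalOneBuffer(data []float64) []byte {
	buf := make([]byte, 8*len(data))
	for i, v := range data {
		binary.LittleEndian.PutUint64(buf[8*i:], math.Float64bits(v))
	}
	return buf
}

// unmarshalOneBuffer is the inverse of marshalOneBuffer.
func unmarshalOneBuffer(buf []byte) ([]float64, error) {
	if len(buf)%8 != 0 {
		return nil, fmt.Errorf("buffer length %d is not a multiple of 8", len(buf))
	}
	out := make([]float64, len(buf)/8)
	for i := range out {
		out[i] = math.Float64frombits(binary.LittleEndian.Uint64(buf[8*i:]))
	}
	return out, nil
}

func main() {
	data := []float64{1, 2.5, -3}

	slow, err := marshalPerElement(data)
	if err != nil {
		panic(err)
	}
	fast := marshalOneBuffer(data)
	fmt.Println(bytes.Equal(slow, fast)) // prints "true": both produce the same bytes

	got, err := unmarshalOneBuffer(fast)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints "[1 2.5 -3]"
}
```

Both encoders produce identical bytes, so a change of this kind would not alter the serialized format. Only the number of intermediate allocations differs.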

sbinet commented 8 years ago

Also added an io.EOF -> io.ErrUnexpectedEOF conversion when reading nrows and ncols. No noticeable degradation with respect to the original improvement.
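
For readers unfamiliar with this idiom, here is a minimal sketch of such a conversion. It assumes a header of two little-endian int64 values (rows, then cols); that layout and the names `readDims` and `dimsFromBytes` are illustrative assumptions, not the actual mat64 code. The point is that once a caller has committed to decoding a matrix, running out of input before the dimensions are complete is a truncation error rather than a clean end of stream.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readDims reads a 16-byte header from r and decodes it as rows and cols.
// io.ReadFull returns io.EOF only when no bytes at all were read; here that
// case is reported as io.ErrUnexpectedEOF, since a header was expected.
func readDims(r io.Reader) (rows, cols int64, err error) {
	var hdr [16]byte
	if _, err = io.ReadFull(r, hdr[:]); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF
		}
		return 0, 0, err
	}
	rows = int64(binary.LittleEndian.Uint64(hdr[0:8]))
	cols = int64(binary.LittleEndian.Uint64(hdr[8:16]))
	return rows, cols, nil
}

// dimsFromBytes is a small convenience wrapper for in-memory input.
func dimsFromBytes(b []byte) (rows, cols int64, err error) {
	return readDims(bytes.NewReader(b))
}

func main() {
	// Empty input: a plain io.ReadFull would report io.EOF here.
	_, _, err := dimsFromBytes(nil)
	fmt.Println(err == io.ErrUnexpectedEOF) // prints "true"

	// A complete header for a 3x4 matrix.
	hdr := make([]byte, 16)
	binary.LittleEndian.PutUint64(hdr[0:8], 3)
	binary.LittleEndian.PutUint64(hdr[8:16], 4)
	rows, cols, err := dimsFromBytes(hdr)
	fmt.Println(rows, cols, err)
}
```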

sbinet commented 8 years ago

Done. With respect to the first version of the optimization, I now get:

name                    old time/op    new time/op    delta
MarshalDense10-4           668ns ± 2%     280ns ± 2%  -58.08%          (p=0.008 n=5+5)
MarshalDense100-4         3.99µs ± 1%    1.80µs ± 1%  -54.83%          (p=0.008 n=5+5)
MarshalDense1000-4        36.3µs ± 1%    16.4µs ± 1%  -54.77%          (p=0.008 n=5+5)
MarshalDense10000-4        349µs ± 1%     147µs ± 1%  -57.69%          (p=0.008 n=5+5)
UnmarshalDense10-4         425ns ± 1%     262ns ± 3%  -38.28%          (p=0.008 n=5+5)
UnmarshalDense100-4       3.11µs ± 1%    1.68µs ± 1%  -46.04%          (p=0.008 n=5+5)
UnmarshalDense1000-4      29.2µs ± 1%    15.0µs ± 2%  -48.69%          (p=0.008 n=5+5)
UnmarshalDense10000-4      281µs ± 2%     137µs ± 5%  -51.43%          (p=0.008 n=5+5)
MarshalVector10-4          638ns ± 1%     262ns ± 2%  -58.97%          (p=0.008 n=5+5)
MarshalVector100-4        3.89µs ± 1%    1.76µs ± 2%  -54.82%          (p=0.008 n=5+5)
MarshalVector1000-4       36.0µs ± 1%    16.1µs ± 3%  -55.40%          (p=0.008 n=5+5)
MarshalVector10000-4       348µs ± 1%     147µs ± 2%  -57.82%          (p=0.008 n=5+5)
UnmarshalVector10-4        411ns ± 1%     247ns ± 3%  -40.00%          (p=0.008 n=5+5)
UnmarshalVector100-4      3.13µs ± 4%    1.65µs ± 2%  -47.12%          (p=0.008 n=5+5)
UnmarshalVector1000-4     29.6µs ± 3%    15.1µs ± 2%  -48.95%          (p=0.008 n=5+5)
UnmarshalVector10000-4     290µs ± 2%     139µs ± 2%  -52.26%          (p=0.008 n=5+5)

name                    old alloc/op   new alloc/op   delta
MarshalDense10-4            208B ± 0%       96B ± 0%  -53.85%          (p=0.008 n=5+5)
MarshalDense100-4         1.01kB ± 0%    0.90kB ± 0%  -11.11%          (p=0.008 n=5+5)
MarshalDense1000-4        8.30kB ± 0%    8.19kB ± 0%   -1.35%          (p=0.008 n=5+5)
MarshalDense10000-4       82.0kB ± 0%    81.9kB ± 0%   -0.14%          (p=0.008 n=5+5)
UnmarshalDense10-4         80.0B ± 0%     80.0B ± 0%     ~     (all samples are equal)
UnmarshalDense100-4         896B ± 0%      896B ± 0%     ~     (all samples are equal)
UnmarshalDense1000-4      8.19kB ± 0%    8.19kB ± 0%     ~     (all samples are equal)
UnmarshalDense10000-4     81.9kB ± 0%    81.9kB ± 0%     ~     (all samples are equal)
MarshalVector10-4           208B ± 0%       96B ± 0%  -53.85%          (p=0.008 n=5+5)
MarshalVector100-4        1.01kB ± 0%    0.90kB ± 0%  -11.11%          (p=0.008 n=5+5)
MarshalVector1000-4       8.30kB ± 0%    8.19kB ± 0%   -1.35%          (p=0.008 n=5+5)
MarshalVector10000-4      82.0kB ± 0%    81.9kB ± 0%   -0.14%          (p=0.008 n=5+5)
UnmarshalVector10-4        80.0B ± 0%     80.0B ± 0%     ~     (all samples are equal)
UnmarshalVector100-4        896B ± 0%      896B ± 0%     ~     (all samples are equal)
UnmarshalVector1000-4     8.19kB ± 0%    8.19kB ± 0%     ~     (all samples are equal)
UnmarshalVector10000-4    81.9kB ± 0%    81.9kB ± 0%     ~     (all samples are equal)

name                    old allocs/op  new allocs/op  delta
MarshalDense10-4            2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
MarshalDense100-4           2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
MarshalDense1000-4          2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
MarshalDense10000-4         2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
UnmarshalDense10-4          1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
UnmarshalDense100-4         1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
UnmarshalDense1000-4        1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
UnmarshalDense10000-4       1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
MarshalVector10-4           2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
MarshalVector100-4          2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
MarshalVector1000-4         2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
MarshalVector10000-4        2.00 ± 0%      1.00 ± 0%  -50.00%          (p=0.008 n=5+5)
UnmarshalVector10-4         1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
UnmarshalVector100-4        1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
UnmarshalVector1000-4       1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
UnmarshalVector10000-4      1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)

And, against the pre-optimization version:

name                    old time/op    new time/op    delta
MarshalDense10-4          4.07µs ± 3%    0.28µs ± 2%   -93.11%  (p=0.008 n=5+5)
MarshalDense100-4         33.2µs ± 1%     1.8µs ± 1%   -94.57%  (p=0.008 n=5+5)
MarshalDense1000-4         325µs ± 3%      16µs ± 1%   -94.94%  (p=0.008 n=5+5)
MarshalDense10000-4       3.24ms ± 2%    0.15ms ± 1%   -95.45%  (p=0.008 n=5+5)
UnmarshalDense10-4        1.43µs ± 2%    0.26µs ± 3%   -81.69%  (p=0.008 n=5+5)
UnmarshalDense100-4       6.26µs ± 0%    1.68µs ± 1%   -73.17%  (p=0.016 n=4+5)
UnmarshalDense1000-4      53.5µs ± 4%    15.0µs ± 2%   -72.01%  (p=0.008 n=5+5)
UnmarshalDense10000-4      488µs ± 1%     137µs ± 5%   -72.01%  (p=0.008 n=5+5)
MarshalVector10-4         3.80µs ± 1%    0.26µs ± 2%   -93.12%  (p=0.008 n=5+5)
MarshalVector100-4        32.9µs ± 1%     1.8µs ± 2%   -94.65%  (p=0.008 n=5+5)
MarshalVector1000-4        323µs ± 1%      16µs ± 3%   -95.03%  (p=0.008 n=5+5)
MarshalVector10000-4      3.20ms ± 1%    0.15ms ± 2%   -95.41%  (p=0.008 n=5+5)
UnmarshalVector10-4       1.23µs ± 1%    0.25µs ± 3%   -80.01%  (p=0.008 n=5+5)
UnmarshalVector100-4      6.14µs ± 2%    1.65µs ± 2%   -73.06%  (p=0.008 n=5+5)
UnmarshalVector1000-4     52.1µs ± 2%    15.1µs ± 2%   -71.01%  (p=0.008 n=5+5)
UnmarshalVector10000-4     490µs ± 2%     139µs ± 2%   -71.69%  (p=0.008 n=5+5)

name                    old alloc/op   new alloc/op   delta
MarshalDense10-4            400B ± 0%       96B ± 0%   -76.00%  (p=0.008 n=5+5)
MarshalDense100-4         2.64kB ± 0%    0.90kB ± 0%   -66.06%  (p=0.008 n=5+5)
MarshalDense1000-4        24.3kB ± 0%     8.2kB ± 0%   -66.34%  (p=0.008 n=5+5)
MarshalDense10000-4        242kB ± 0%      82kB ± 0%      ~     (p=0.079 n=4+5)
UnmarshalDense10-4          272B ± 0%       80B ± 0%   -70.59%  (p=0.008 n=5+5)
UnmarshalDense100-4       1.90kB ± 0%    0.90kB ± 0%   -52.94%  (p=0.008 n=5+5)
UnmarshalDense1000-4      16.5kB ± 0%     8.2kB ± 0%   -50.34%  (p=0.008 n=5+5)
UnmarshalDense10000-4      164kB ± 0%      82kB ± 0%   -50.03%  (p=0.008 n=5+5)
MarshalVector10-4           384B ± 0%       96B ± 0%   -75.00%  (p=0.008 n=5+5)
MarshalVector100-4        2.62kB ± 0%    0.90kB ± 0%   -65.85%  (p=0.008 n=5+5)
MarshalVector1000-4       24.3kB ± 0%     8.2kB ± 0%   -66.32%  (p=0.008 n=5+5)
MarshalVector10000-4       242kB ± 0%      82kB ± 0%      ~     (p=0.079 n=4+5)
UnmarshalVector10-4         256B ± 0%       80B ± 0%   -68.75%  (p=0.008 n=5+5)
UnmarshalVector100-4      1.89kB ± 0%    0.90kB ± 0%   -52.54%  (p=0.008 n=5+5)
UnmarshalVector1000-4     16.5kB ± 0%     8.2kB ± 0%   -50.29%  (p=0.008 n=5+5)
UnmarshalVector10000-4     164kB ± 0%      82kB ± 0%   -50.03%  (p=0.008 n=5+5)

name                    old allocs/op  new allocs/op  delta
MarshalDense10-4            26.0 ± 0%       1.0 ± 0%   -96.15%  (p=0.008 n=5+5)
MarshalDense100-4            206 ± 0%         1 ± 0%   -99.51%  (p=0.008 n=5+5)
MarshalDense1000-4         2.01k ± 0%     0.00k ± 0%   -99.95%  (p=0.008 n=5+5)
MarshalDense10000-4        20.0k ± 0%      0.0k ± 0%  -100.00%  (p=0.008 n=5+5)
UnmarshalDense10-4          8.00 ± 0%      1.00 ± 0%   -87.50%  (p=0.008 n=5+5)
UnmarshalDense100-4         8.00 ± 0%      1.00 ± 0%   -87.50%  (p=0.008 n=5+5)
UnmarshalDense1000-4        8.00 ± 0%      1.00 ± 0%   -87.50%  (p=0.008 n=5+5)
UnmarshalDense10000-4       8.00 ± 0%      1.00 ± 0%   -87.50%  (p=0.008 n=5+5)
MarshalVector10-4           24.0 ± 0%       1.0 ± 0%   -95.83%  (p=0.008 n=5+5)
MarshalVector100-4           204 ± 0%         1 ± 0%   -99.51%  (p=0.008 n=5+5)
MarshalVector1000-4        2.00k ± 0%     0.00k ± 0%   -99.95%  (p=0.008 n=5+5)
MarshalVector10000-4       20.0k ± 0%      0.0k ± 0%  -100.00%  (p=0.008 n=5+5)
UnmarshalVector10-4         6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.008 n=5+5)
UnmarshalVector100-4        6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.008 n=5+5)
UnmarshalVector1000-4       6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.008 n=5+5)
UnmarshalVector10000-4      6.00 ± 0%      1.00 ± 0%   -83.33%  (p=0.008 n=5+5)

kortschak commented 8 years ago

LGTM

We need to consider the impact of the streaming API on this gain in that PR.