kimikage commented 4 years ago

This is a sequel to PR #213. This specializes most of the multiplication for Fixed and avoids floating point operations.

A major change is that the rounding mode is changed from RoundNearestTiesUp to RoundNearest. The existing RoundNearestTiesUp and RoundDown (the latter was noted in a comment) modes are now supported by the new unexported function mul_with_rounding.

Unlike multiplication for Normed, the wrapping arithmetic is the default for Fixed.

This PR also improves rem.

Fixes #173, Fixes #219

codecov[bot] commented 4 years ago

Codecov Report

Merging #220 into master will increase coverage by 0.28%. The diff coverage is 93.93%.

@@            Coverage Diff             @@
##           master     #220      +/-   ##
==========================================
+ Coverage   91.01%   91.29%   +0.28%     
==========================================
  Files           6        6              
  Lines         534      563      +29     
==========================================
+ Hits          486      514      +28     
- Misses         48       49       +1

Impacted Files	Coverage Δ
src/fixed.jl	`95.27% <93.93%> (+0.37%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 134646f...9e7c522. Read the comment docs.

kimikage commented 4 years ago

Benchmark

See Normed version

x_q0f7  = collect(rand(Q0f7,  1000, 1000)); y_q0f7  = collect(rand(Q0f7,  1000, 1000));
x_q0f15 = collect(rand(Q0f15, 1000, 1000)); y_q0f15 = collect(rand(Q0f15, 1000, 1000));
x_q3f4  = collect(rand(Q3f4,  1000, 1000)); y_q3f4  = collect(rand(Q3f4,  1000, 1000));
x_q3f12 = collect(rand(Q3f12, 1000, 1000)); y_q3f12 = collect(rand(Q3f12, 1000, 1000));
z_q3f4  = Q3f4.( rand(1000, 1000));
z_q3f12 = Q3f12.(rand(1000, 1000));

julia> @btime $x_q0f7 .* $y_q0f7;
  96.199 μs (2 allocations: 976.70 KiB)

julia> @btime $x_q0f15 .* $y_q0f15;
  796.499 μs (2 allocations: 1.91 MiB)

julia> @btime saturating_mul.($x_q3f4, $y_q3f4);
  155.601 μs (2 allocations: 976.70 KiB)

julia> @btime saturating_mul.($x_q3f12, $y_q3f12);
  909.100 μs (2 allocations: 1.91 MiB)

julia> @btime checked_mul.($x_q3f4, $z_q3f4);
  1.575 ms (2 allocations: 976.70 KiB)

julia> @btime checked_mul.($x_q3f12, $z_q3f12);
  1.881 ms (2 allocations: 1.91 MiB)

rem (cf. https://github.com/JuliaMath/FixedPointNumbers.jl/issues/219#issue-682426339)

julia> x_f32 = rand(Float32, 1000, 1000);

julia> @btime $x_f32 .% Q0f7;
  762.600 μs (2 allocations: 976.70 KiB)

cf. v0.8.4 (RoundNearestTiesUp version)

julia> @btime $x_q0f7 .* $y_q0f7;
  77.900 μs (2 allocations: 976.70 KiB)

julia> @btime $x_q0f15 .* $y_q0f15;
  758.600 μs (2 allocations: 1.91 MiB)

I feel we can improve it a bit more. The Fixed{Int16} multiplication clearly has a speed problem, considering the performance of Fixed{Int8} and Normed{UInt16}, but the cause is not clear. However, I don't plan to work on optimizing it as a priority.

kimikage commented 4 years ago

As an additional note for later readers, the default *, i.e. wrapping_mul, is slightly slower due to the change in rounding mode. Instead, at least for Fixed{Int8} and Fixed{Int16}, the results will agree with the float.

As for the rem, the speed is greatly improved by allowing incorrect results from overflow and rounding as well as Normed.

In order to address the operations related to division, I will first merge this.

JuliaMath / FixedPointNumbers.jl

Specialize multiplication for `Fixed` #220

Codecov Report

Benchmark