JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

one(BigInt):BigInt iterates much slower than one(BigInt):one(BigInt):BigInt #39008

Open Seelengrab opened 3 years ago

Seelengrab commented 3 years ago

This came up on Slack.

julia> function ∑2(n)
           val = zero(n)
           for i in 1:n
               Base.GMP.MPZ.add!(val, i)
           end
           val
       end;

julia> function ∑3(n)
           val = zero(n)
           for i in one(n):one(n):n
               Base.GMP.MPZ.add!(val, i)
           end
           val
       end
∑3 (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime ∑2(big(5_000_000))
  2.863 s (40000005 allocations: 839.23 MiB)
12500002500000

julia> @btime ∑3(big(5_000_000))
  744.614 ms (10000019 allocations: 228.88 MiB)
12500002500000

As far as I can tell, both one(n):n and one(n):one(n):n end up calling the same methods; they just compile differently. The UnitRange{BigInt} compiles to much less efficient code than the StepRange{BigInt,BigInt}, though.

Moreover, there may be even more speed to be gained here - does iterating over a range of BigInt necessarily have to allocate each value that's produced, or can we reuse the BigInt from the first iteration and Base.GMP.MPZ.add! the step onto it? I have not investigated whether the remaining allocations truly stem from this, but I can't imagine anything else playing a role here.
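
For reference (a quick check of mine, not part of the original report), the two ranges do have different concrete types, which is where the different code paths come from; @which iterate(r) can then be used to see exactly which method each of them dispatches to:

julia> n = big(5_000_000);

julia> typeof(one(n):n)
UnitRange{BigInt}

julia> typeof(one(n):one(n):n)
StepRange{BigInt, BigInt}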

oscardssmith commented 3 years ago

In general, you can't mutate the range. One example would be putting everything in the range into a vector.
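
A minimal sketch of that scenario (mine, not from the thread), assuming a hypothetical iterate that mutated and re-yielded a single BigInt: any code that keeps the yielded values around would break, because every stored element would alias the same object.

julia> v = BigInt[];

julia> for i in big(1):big(3)
           push!(v, i)    # stores the yielded BigInt objects themselves, no copy
       end

julia> v == [1, 2, 3]      # true today; with an in-place iterate every entry would be one mutated object
true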

Seelengrab commented 3 years ago

I'm not talking about mutating the range; I'm talking about mutating the iteration state.

Seelengrab commented 3 years ago

Basically doing this:

julia> function ∑4(n)
           val = zero(n)
           c = one(n)
           tmp = one(n)
           while c <= n
               Base.GMP.MPZ.add!(val, c)
               Base.GMP.MPZ.add!(c, tmp)
           end
           val
       end
∑4 (generic function with 1 method)

julia> @btime ∑4(big(5_000_000))
  94.541 ms (10 allocations: 176 bytes)
12500002500000

Except that tmp would be the state returned by iterate on ranges of BigInt.

Seelengrab commented 3 years ago

If mutating the state is fine and doesn't violate any invariants, this seems like an easy 33x speedup for iterating over BigInt ranges and I'd be happy to make a PR.

thofma commented 3 years ago

There are some packages providing mutable arithmetic for BigInt (https://github.com/jump-dev/MutableArithmetics.jl). The changes proposed here have some potential for breakage in this regard (@blegat).

I think this makes BigInt more dangerous to use. Next week someone decides to let zero or one of BigInt return some shared global value (which is valid under the current immutable behavior of BigInt). You would need to change your code quite a lot; in particular, you would have to check for aliasing etc.

Seelengrab commented 3 years ago

This issue is not about making every arithmetic operation on BigInt mutating; it's about mutating the iteration state returned by iterate on ranges of BigInt. At the moment, these ranges fall back to the generic iteration machinery, which uses + to advance the state, thereby allocating a new BigInt each time iterate is called. If it's okay to mutate the existing state instead of returning a new BigInt as the state, there would be a significant speedup for iterating over those ranges.

It's basically the question of whether or not to specialise iterate for ranges of BigInt. I'm aware that BigInt pretends to be immutable while in reality it isn't.

These are the methods currently used for the ranges in question:

iterate(r::OrdinalRange) = isempty(r) ? nothing : (first(r), first(r))

function iterate(r::OrdinalRange{T}, i) where {T}
    @_inline_meta
    i == last(r) && return nothing
    next = convert(T, i + step(r)) # can we avoid the allocation here and `add!` instead?
    (next, next)
end

I'm not sure how this would best be done though, since simply dropping in Base.GMP.MPZ.add! would still require a copy of the returned value 🤔 But maybe using an explicit add! and returning (deepcopy(i), i) is already quite a bit faster.
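
To make that concrete, here is a minimal sketch of what such a specialization might look like, written as if it lived in Base (iterate, first, last, step and MPZ.add! are real Base names; the BigInt-specific method bodies are my guess at the idea being discussed, not code from the thread):

# Sketch: iterate specializations for BigInt step ranges that advance the
# state in place but still hand a fresh copy of the value to the caller.
function Base.iterate(r::StepRange{BigInt,BigInt})
    isempty(r) && return nothing
    state = deepcopy(first(r))           # private copy so the range's own fields are never aliased
    return (deepcopy(state), state)
end

function Base.iterate(r::StepRange{BigInt,BigInt}, state::BigInt)
    state == last(r) && return nothing
    Base.GMP.MPZ.add!(state, step(r))    # in-place state += step, no intermediate allocation
    return (deepcopy(state), state)      # the copy is only for the value handed to the caller
end

A method for UnitRange{BigInt} would look analogous, with a step of one. Note that this still allocates one BigInt per yielded element (the deepcopy in the return); the fully allocation-free variant, yielding the state itself, is only safe when the caller never keeps the yielded values around, as in the ∑4 loop above.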

blegat commented 3 years ago

It makes sense to mutate the state/iterates when they are not used outside the loop as it provides significant speedup for mutable types. I have added a mutating version of StepRange here: https://github.com/jump-dev/MutableArithmetics.jl/pull/69

thofma commented 3 years ago

> But maybe using an explicit add! and returning (deepcopy(i), i) is already quite a bit faster.

I doubt it will be faster, since both versions allocate one BigInt per iterate call. That is also the minimum number of allocations if you don't want to change the result of collect(big(1):big(2)).

But you could give it a try. You then also have to be careful with aliasing: the naive version would be broken for a = big(1); a:a:big(10), since start and step are the same object there, so the state would alias the step and mutating it in place would also grow the step. You would need to do some alias checking or deepcopying at the beginning.
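
A quick way to see that aliasing hazard (my check, not from the thread): the range constructor stores its arguments as-is, so with a = big(1) the start and the step of a:a:big(10) are literally the same BigInt object.

julia> a = big(1);

julia> r = a:a:big(10);

julia> first(r) === step(r)    # the constructor stored the same BigInt in both fields
true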

Seelengrab commented 3 years ago

I'd be fine with a deepcopy at creation of the range; I think that's a fair tradeoff to make for really fast iteration 🤷‍♂️

The next step would be to specialize collect for BigInt ranges as well, making sure it uses deepcopy while regular iteration does not - that's the main point I'm not sure about. It would probably need a PR and a PkgEval run to check whether it breaks things in the real world.

collect on ranges of BigInt is a dubious thing anyway - for the sizes of BigInt that are practical and useful for number theory, you really don't want to use more memory than absolutely necessary. So in my opinion collect on them is not a good idea in the first place; it's more of a "you could do it" that falls out of our methods being so generic.
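
For the collect side of that idea, a minimal standalone sketch (the helper name is hypothetical, and this is not a proposed Base change): materialize a BigInt range while deepcopying each yielded element, so the result would stay correct even if iteration reused a single BigInt internally.

# Hypothetical helper, not part of Base: collect a BigInt range defensively,
# giving each stored element its own independent BigInt.
function collect_bigint_range(r::AbstractRange{BigInt})
    out = Vector{BigInt}(undef, length(r))
    for (idx, x) in enumerate(r)
        out[idx] = deepcopy(x)
    end
    return out
end

collect_bigint_range(big(1):big(2):big(9))  # BigInt[1, 3, 5, 7, 9]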