Open Ktrompfl opened 4 months ago

Using Galois fields from Nemo instead of AbstractAlgebra heavily impacts the performance of basic arithmetic operations due to allocations for every single operation. I assume this is a byproduct of calling into native libraries, but it is still very unfortunate.
Yes and no. Try using Native.GF(2) (once Nemo is loaded).
Thanks for your response. Apart from being slightly faster on addition and not causing any allocations at all, Native.GF still appears to be an order of magnitude slower on multiplication when compared to AbstractAlgebra.GF:
julia> @btime begin
           K = Native.GF(2)
           a = one(K)
           for _ = 1:1_000_000
               a *= a
           end
       end
  5.737 ms (0 allocations: 0 bytes)
julia> @btime begin
           K = Native.GF(2)
           a = one(K)
           for _ = 1:1_000_000
               a += a
           end
       end
  44.278 ns (0 allocations: 0 bytes)
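For reference, the AbstractAlgebra side of the comparison can be measured the same way; a minimal sketch (not from the original report, so no timings are shown):

julia> using AbstractAlgebra, BenchmarkTools

julia> @btime begin
           K = AbstractAlgebra.GF(2)
           a = one(K)
           for _ = 1:1_000_000
               a *= a
           end
       end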
Hm, yes, it is quite bad, but a bit puzzling. @fieker: The AbstractAlgebra code is not that sophisticated in comparison to Nemo/flint, which goes through n_mulmod_preinv. Is this just the overhead of the function call, or is there something else going on?
My guess is two things: Native.GF(2) will outperform AA on tasks taking place in C (in flint), whereas "naive" native Julia code will favour AA by a wide margin. Furthermore, each "+" in flint is a function call, while AA's is handled directly by the compiler. I guess we can/should implement the Native.GF(p) arithmetic directly ourselves; see the sketch below.
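A minimal sketch of what that could look like: a hypothetical GFElem type (for illustration only, not Nemo's or AA's actual implementation) whose arithmetic the Julia compiler can fully inline, which is essentially AA's advantage here:

struct GFElem{T<:Unsigned}
    data::T  # residue, 0 <= data < n
    n::T     # modulus; assumed n <= typemax(T) ÷ 2 so addition cannot overflow
end

Base.one(x::GFElem{T}) where {T} = GFElem{T}(one(T), x.n)

# conditional subtract instead of a division; fully inlinable
function Base.:+(x::GFElem{T}, y::GFElem{T}) where {T}
    s = x.data + y.data
    return GFElem{T}(s >= x.n ? s - x.n : s, x.n)
end

# widen, multiply, reduce; mod on the widened type is exact
Base.:*(x::GFElem{T}, y::GFElem{T}) where {T} =
    GFElem{T}(T(mod(widemul(x.data, y.data), x.n)), x.n)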
I withdraw my comments. fpFieldElem is using preinv and is immutable Julia. It is calling into C, however, so the call might be much more expensive (at least it should be!) than checking whether a*b is even or odd. Has anyone done the timings with 30-bit or 60-bit primes as well? The native code is not at all optimized for small p.
I tried 30-bit primes and got the same timings. We could in fact just copy the code from AA to Nemo.
@albinahlback, since you are interested in flint being fast, do you have an idea what could be going on? How long does
julia> @btime begin
           K = Native.GF(1099511627791)
           a = one(K)
           for _ = 1:1_000_000
               a *= a
           end
       end
  5.737 ms (0 allocations: 0 bytes)
take for you, and how long does an equivalent C program using nmod take? It is clear that the ccall on the Julia side is not free, but I wonder how expensive it is.
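One way to isolate the ccall cost on the Julia side is to call the flint routine directly, bypassing the element type entirely. A sketch, assuming Nemo.libflint is reachable and that flint exports n_preinvert_limb and n_mulmod2_preinv (the routine Nemo's fpFieldElem multiplication appears to wrap):

julia> using Nemo, BenchmarkTools

julia> function raw_mul_loop(a::UInt, n::UInt, ninv::UInt)
           # one million back-to-back ccalls, no wrapper types involved
           for _ = 1:1_000_000
               a = ccall((:n_mulmod2_preinv, Nemo.libflint), UInt,
                         (UInt, UInt, UInt, UInt), a, a, n, ninv)
           end
           return a
       end

julia> n = UInt(1099511627791);

julia> ninv = ccall((:n_preinvert_limb, Nemo.libflint), UInt, (UInt,), n);

julia> @btime raw_mul_loop(UInt(1), $n, $ninv);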
Do you know if the AbstractAlgebra stuff is being inlined or not?
If someone could give me the generated assembly for each case, it would be pretty easy to see why the difference is so big.
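For anyone who wants to produce that assembly, a sketch using the standard InteractiveUtils macros (variable names are just for illustration):

julia> using InteractiveUtils  # provides @code_native

julia> a = one(AbstractAlgebra.GF(1099511627791));

julia> @code_native debuginfo=:none a * a   # AA multiplication

julia> b = one(Nemo.Native.GF(1099511627791));

julia> @code_native debuginfo=:none b * b   # Nemo Native multiplication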
Hm, I tried to produce the assembly code, but I noticed something strange. It seems that something dodgy is going on due to problems in the benchmark setup.
@Ktrompfl can you reproduce the following?
using AbstractAlgebra, Nemo, BenchmarkTools

function f_with_return(K)
    a = one(K)
    for _ = 1:1_000_000
        a *= a
    end
    return a
end

function f_without_return(K)
    a = one(K)
    for _ = 1:1_000_000
        a *= a
    end
end

K_AA = AbstractAlgebra.GF(1099511627791)
K_Nemo = Nemo.Native.GF(1099511627791)
I then get
julia> t_AA = @belapsed(f_with_return(K_AA)), @belapsed(f_without_return(K_AA))
(0.005825292, 0.000312042)
julia> t_Nemo = @belapsed(f_with_return(K_Nemo)), @belapsed(f_without_return(K_Nemo))
(0.012017084, 0.012015667)
So clearly the compiler is doing something "clever" in the AA case, since that function always returns nothing.
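A way to confirm this, as a sketch: dump the LLVM IR for both variants; if the loop has been proved side-effect free, the body of f_without_return should compile down to an immediate return:

julia> using InteractiveUtils  # provides @code_llvm

julia> @code_llvm debuginfo=:none f_without_return(K_AA)  # loop presumably eliminated

julia> @code_llvm debuginfo=:none f_with_return(K_AA)     # loop still present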
Looks like it is 2x slower, which is not "too bad".
With the setup above I get
julia> t_AA = @belapsed(f_with_return(K_AA)), @belapsed(f_without_return(K_AA))
(0.003119424, 0.000221444)
julia> t_Nemo = @belapsed(f_with_return(K_Nemo)), @belapsed(f_without_return(K_Nemo))
(0.005218629, 0.005237672)
which should be fine, whereas with K_Nemo2 = Nemo.GF(1099511627791) I get
julia> t_Nemo2 = @belapsed(f_with_return(K_Nemo2)), @belapsed(f_without_return(K_Nemo2))
(0.05128647, 0.051381936)
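That roughly 10x gap is consistent with the allocation problem from the original report. A quick sketch to check it directly (BenchmarkTools reports the allocation count alongside the timing):

julia> K_Nemo2 = Nemo.GF(1099511627791);

julia> @btime f_with_return($K_Nemo2);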