Closed b-mehta closed 6 years ago
Cool! I'll take a look by the end of the week.
Implement Pollard's rho algorithm for discrete logs with prime moduli (#129 would be helpful for this)
I pushed solveLinear
into master
.
Great, thanks!
The Bach reduction gave a large speedup for prime powers: the case 2^n = 7 mod 3^20
went from around 41ms to 3.5µs and 3^n = 17 mod 5^16
went from 549ms to 3.8µs. Range benchmarks did not change significantly, which is expected since prime powers are rare compared to primes in general. I'll modify benchmarks for prime powers now, using larger powers to make up for this speedup.
Cool!
I've added Pollard's rho but on benchmarking, it doesn't have the speedup I expected. Specifically, for moduli below around 10^8 it is slower than the baby-step giant-step method, but works faster for large moduli. For cases near 10^6, the BSGS method took 27.27ms while Pollard's rho took 43.15ms while for cases near 10^10, BSGS took 8.61s and Pollard's rho took 4.06s.
While it's not surprising that Pollard's rho is eventually faster, I'm surprised the cutoff is so high. Is there some obvious deficiency in my rho algorithm that makes it perform badly on small inputs? @Bodigrim
(Note the rho implementation isn't perfect right now - it currently only works probabilistically but the probability of error is so small that it succeeds on all the test cases and benchmark cases. The functions starter
and begin
are part of the fix for this, but right now they appear redundant, and when failure occurs, the error will be from calling head
on an empty list in line 79. I'll fix in later commits.)
There are two sources of inefficiency: redundant divisions in step
and allocating lazy triples on heap.
Here is my attempt:
data Pollard = Pollard !Integer !Integer !Integer
discreteLogarithmPrime :: Integer -> Integer -> Integer -> Natural
discreteLogarithmPrime p a b = fromInteger $ head $ filter check $ begin (starter 0 0)
where
n = p-1 -- order of the cyclic group
halfN = n `quot` 2
mul2 m = if m < halfN then m `shiftL` 1 else m `shiftL` 1 - n
step (Pollard xi ai bi) = case xi `rem` 3 of
0 -> Pollard (xi*xi `rem` p) (mul2 ai) (mul2 bi)
1 -> Pollard (a*xi `rem` p) (ai+1) bi
_ -> Pollard (b*xi `rem` p) ai (bi+1)
starter x y = Pollard (powMod a x n * powMod b y n `rem` n) x y
begin t = go (step t) (step (step t))
go :: Pollard -> Pollard -> [Integer]
go tort@(Pollard xi ai bi) hare@(Pollard x2i a2i b2i)
| xi == x2i = solveLinear' n (bi - b2i) (ai - a2i)
| otherwise = go (step tort) (step (step hare))
check t = powMod a t p == b
One can check by dumping GHC Core {-# OPTIONS_GHC -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-}
that no Pollard
thunk is actually allocated and all computations have reduced to naked Integer
. Your version of discreteLogarithmPrime
tended to allocate and deallocate (_, _, _)
constructors here and there.
Before:
Discrete logarithm/range/100 cases near 10000 mean 15.01 ms ( +- 2.861 ms )
Discrete logarithm/range/100 cases near 1000000 mean 124.5 ms ( +- 26.73 ms )
Discrete logarithm/range/100 cases near 100000000 mean 976.0 ms ( +- 87.84 ms )
Now:
Discrete logarithm/range/100 cases near 10000 mean 9.049 ms ( +- 1.375 ms )
Discrete logarithm/range/100 cases near 1000000 mean 78.45 ms ( +- 3.424 ms )
Discrete logarithm/range/100 cases near 100000000 mean 746.1 ms ( +- 33.00 ms )
I'm not convinced that allocation of tuples is an issue here, since the Core output shows that unboxed tuples are being used, which has no heap overhead, but I'll investigate further. In fact, your use of the Pollard
type also becomes unboxed tuples in Core, and the code for step
appears identical in both versions.
Your code doesn't succeed when I test it - there are many missing rem
calls which cause it to fail. I suspect the improvement in the range benchmarks is due to these calls being omitted when they are truly necessary.
Sorry, it is my fault; maybe wrong compilation parameters. I can confirm: your code does not allocate triples as well.
Your code doesn't succeed when I test it - there are many missing rem calls which cause it to fail.
What do you mean? It seems passing tests in my environment.
What do you mean? It seems passing tests in my environment.
Hm this is curious, the first four benchmarks all fail with head of an empty list with your code, but succeed with mine.
Weird. This is what I see:
Discrete logarithm/individual case/5ⁿ == 8 mod 10^9 + 7 mean 42.95 ms ( +- 1.763 ms )
Discrete logarithm/individual case/2ⁿ == 7 mod 3^500 mean 231.4 μs ( +- 25.78 μs )
Discrete logarithm/individual case/2ⁿ == 3 mod 10^11 + 3 mean 176.6 ms ( +- 7.414 ms )
Discrete logarithm/individual case/3ⁿ == 17 mod 5^350 mean 288.4 μs ( +- 35.09 μs )
Very strange, on running stack bench --ba '--match prefix Disc -q'
I get this error:
arithmoi-0.7.0.0: benchmarks
Running 1 benchmarks...
Benchmark criterion: RUNNING...
criterion: Prelude.head: empty list
benchmarking Discrete logarithm/individual case/5ⁿ == 8 mod 10^9 + 7 ... Benchmark criterion: ERROR
-- While building custom Setup.hs for package arithmoi-0.7.0.0 using:
/Users/bmehta/.stack/setup-exe-cache/x86_64-osx/Cabal-simple_mPHDZzAJ_2.0.1.0_ghc-8.2.2 --builddir=.stack-work/dist/x86_64-osx/Cabal-2.0.1.0 bench criterion "--benchmark-options=--match prefix Disc -q"
Process exited with code: ExitFailure 1
I pushed my changes onto CI: it fails with GHC 8.2 and earlier, but succeedes with GHC 8.4 and newer. Very odd. Unfortunately, I am leaving for vacation and won't be able to investigate until the end of next week.
No worries - thanks for your help regardless.
Got it. Prior to GHC 8.4 both powModInteger
and recipModInteger
were broken for a negative first argument (https://ghc.haskell.org/trac/ghc/ticket/14085).
That is why after my modification solveLinear'
started to return weird results in GHC 8.2 and led to an empty filter check $ begin (starter 0 0)
. I have pushed fixes for solveLinear'
into master
and now my implementation works fine for all supported range of GHCs (https://travis-ci.org/Bodigrim/arithmoi/builds/423235835). 20-30% improvement in benchmarks is observed, same to reported above. I suspect it is purely due to changes in step
function.
Could you please rebase your branch upon the latest master
?
It passes tests but I'm not fully convinced of correctness - you've removed the rem
after ai+1
and the like, but the remainder still should be taken after these additions, since they are likely to get larger than the modulus. In addition, the expression for mul2
assumes that the input m
is reduced modulo n
, but this is not guaranteed since the rem
is no longer taken after ai+1
.
On second thoughts, these shouldn't be an issue since the residue mod n
remains correct. It is curious however that allowing the values to grow larger instead of restricting them to remain below n
gives a faster function.
I've changed some things, removing some of the rem
calls. When I benchmark, using m * 2
instead of shiftL
is noticeably faster, so I've kept it. Since Pollard
reduces to identical code as the tuples, I've left that as is also.
The calls to mod
in line 92 may not be strictly necessary - I originally kept them because solveLinear'
was not exported and so I wanted to make sure the input was as expected. Also, it is likely that the mod
call will reduce arguments into a smaller range, so the function may run faster, but I'm happy to remove them if necessary.
solveLinear'
should work fine for unreduced arguments. Even without reduction by modulo bi-b2i
and ai-b2i
are O(n) with probability close to 1, so there should not be any performance hit. I suggest to remove mod
from line 92 for the sake of clarity.
Awesome. I greatly appreciate your work.
Does it make sense to split this PR into two? The first one about discrete log (which is ready to merge, up to my understanding) and the second (future) one about Dirichlet characters. What is your opinion?
I think this is a good idea - in fact I'd like to split it into three PRs: The currently implemented work on discrete logs (which is ready to merge), possible speed ups on discrete logs (including Pollig-Hellman, which I'm investigating at the minute) and Dirichlet characters.
Something worth pointing out is that the algorithm is probabilistic, as I mentioned in the most recent commit, and so has a (negligible) probability of failing. Currently three starting inputs are used, which cubes the failure probability making it even more negligible, but still non-zero. Possible fixes are to allow all starting inputs on failure, so that the function is guaranteed not to fail, or just to use a small amount and hope for the best (cf line 87 in the last commit). I'm testing how likely failure actually is in my branch here - so far for primes less than 7000, all possible instances of pollard's rho succeed by a super naive brute force search What is your opinion on this?
(Unrelatedly, hope you enjoyed your vacation!)
It would be nice to find via brute force at least one example, when a) implementation with only one seed fails to find an answer, b) implementation with two seeds fails, c) current implementation with three seeds fails. And add them to tests.
Can we resort to BSGS, when Pollard rho has failed with three seeds?
That would be nice, but I fear that the probability of failure is genuinely tiny - testing all primes less than 7000 represents over 5 billion instances of the discrete log problem, and no failure has yet occured.
Possibly, but BSGS uses IntMaps internally and right now BSGS is used anyway when the inputs are less than 10^8 so in that scenario, the values may not fit within an Int. In addition, if Maps are used instead, it is possible that massive maps are made and create memory issues instead. I'd be more comfortable allowing more seeds instead - since the performance hit of this will almost certainly be smaller than the performance hit of BSGS.
I may be wrong, but my impression is that your implementation of Pollard is not probabilistic. AFAIU we say that Pollard algorithm failed for a given seed when the algorithm found a collision with bi == b2i
, for which solveLinear'
returns a full, unrestricted range. But in our case we just resort to brute force check
-ing.
If bi == b2i
, then solveLinear'
attempts to solve 0x + a = 0
, and returns the empty list, so we immediately fail.
Edit: My mistake, we additionally have that ai == a2i
, so brute force check
ing does occur.
Yep. For small arguments it happens quite often, which can be traced as follows:
go tort@(xi,ai,bi) hare@(x2i,a2i,b2i)
| xi == x2i
, (bi - b2i) `mod` n == 0
, traceShow (p, a, b) False
= undefined
| xi == x2i = solveLinear' n (bi - b2i) (ai - a2i)
| otherwise = go (step tort) (step (step hare))
[(a, b) | a <- [0..n], b <- [0..n]]
.solveLinear'
only if gcd (bi - b2i) n
is small (below sqrt n
?). Otherwise take the next seed.Right, I was doing exactly this. That approach sounds reasonable. Thanks for this catch!
This should be good to go. Note that Math.NumberTheory.Moduli.Equations
exports solveLinear'
- hopefully this is acceptable @Bodigrim.
Note that Math.NumberTheory.Moduli.Equations exports solveLinear' - hopefully this is acceptable
I've thought about it and decided to revert this new export in 473a5ef25eadc4a0d033a691e4e54d24dae5d5fe. An additional public function needs documentation, tests and is a commitment to support its interface. Being public, solveLinear'
should deal somehow with negative modulo and zero modulo, it may also be reasonable to make it working for any Integral a
instead of Integer
only. That said, from the maintainer's viewpoint it is preferable to use solveLinear
only.
Fair enough. Is an acceptable compromise to assert that the modulo must be positive, and that the precondition is not checked, similar to integerSquareRoot'
from M.NT.Powers.Squares
? That way users can still use solveLinear'
for Integer
s without having to wrap values up using someNatVal
.
Unchecked preconditions are the root of all evil :) They are the worst plague of arithmoi
API in its current state.
Currently I am tying up loose ends for an upcoming release; I'll be glad to return to solveLinear'
question afterwards. Please raise an issue to facilitate discussion.
Unchecked preconditions are the root of all evil :) They are the worst plague of
arithmoi
API in its current state.
I see! Fair enough - I have no real complaints about the state of solveLinear'
as it is.
Discrete logarithms and Dirichlet characters. Resolves #88. In roughly this order, this is my plan for full efficient implementation of both.
MultMod
type to represent elements of the multiplicative group mod m, with Monoid instance. AddedPrimitiveRoot
type to represent primitive roots of the (cyclic) multiplicative group. Note these two types together makediscreteLogarithm
a total function. Partial constructors have been added and deconstructors. (I would like feedback on the nameMultMod
, I feel a better name is possible.)