Bodigrim / arithmoi

Number theory: primes, arithmetic functions, modular computations, special sequences
http://hackage.haskell.org/package/arithmoi
MIT License
147 stars 40 forks source link

Discrete log #130

Closed b-mehta closed 6 years ago

b-mehta commented 6 years ago

Discrete logarithms and Dirichlet characters. Resolves #88. In roughly this order, this is my plan for full efficient implementation of both.

Bodigrim commented 6 years ago

Cool! I'll take a look by the end of the week.

Bodigrim commented 6 years ago

Implement Pollard's rho algorithm for discrete logs with prime moduli (#129 would be helpful for this)

I pushed solveLinear into master.

b-mehta commented 6 years ago

Great, thanks!

b-mehta commented 6 years ago

The Bach reduction gave a large speedup for prime powers: the case 2^n = 7 mod 3^20 went from around 41ms to 3.5µs and 3^n = 17 mod 5^16 went from 549ms to 3.8µs. Range benchmarks did not change significantly, which is expected since prime powers are rare compared to primes in general. I'll modify benchmarks for prime powers now, using larger powers to make up for this speedup.

Bodigrim commented 6 years ago

Cool!

b-mehta commented 6 years ago

I've added Pollard's rho but on benchmarking, it doesn't have the speedup I expected. Specifically, for moduli below around 10^8 it is slower than the baby-step giant-step method, but works faster for large moduli. For cases near 10^6, the BSGS method took 27.27ms while Pollard's rho took 43.15ms while for cases near 10^10, BSGS took 8.61s and Pollard's rho took 4.06s.

While it's not surprising that Pollard's rho is eventually faster, I'm surprised the cutoff is so high. Is there some obvious deficiency in my rho algorithm that makes it perform badly on small inputs? @Bodigrim

(Note the rho implementation isn't perfect right now - it currently only works probabilistically but the probability of error is so small that it succeeds on all the test cases and benchmark cases. The functions starter and begin are part of the fix for this, but right now they appear redundant, and when failure occurs, the error will be from calling head on an empty list in line 79. I'll fix in later commits.)

Bodigrim commented 6 years ago

There are two sources of inefficiency: redundant divisions in step and allocating lazy triples on heap.

Here is my attempt:

data Pollard = Pollard !Integer !Integer !Integer

discreteLogarithmPrime :: Integer -> Integer -> Integer -> Natural
discreteLogarithmPrime p a b = fromInteger $ head $ filter check $ begin (starter 0 0)
  where
    n = p-1 -- order of the cyclic group

    halfN = n `quot` 2
    mul2 m = if m < halfN then m `shiftL` 1 else m `shiftL` 1 - n

    step (Pollard xi ai bi) = case xi `rem` 3 of
                          0 -> Pollard (xi*xi `rem` p) (mul2 ai) (mul2 bi)
                          1 -> Pollard (a*xi `rem` p)  (ai+1) bi
                          _ -> Pollard (b*xi `rem` p)  ai (bi+1)
    starter x y = Pollard (powMod a x n * powMod b y n `rem` n) x y
    begin t = go (step t) (step (step t))

    go :: Pollard -> Pollard -> [Integer]
    go tort@(Pollard xi ai bi) hare@(Pollard x2i a2i b2i)
      | xi == x2i = solveLinear' n (bi - b2i) (ai - a2i)
      | otherwise = go (step tort) (step (step hare))

    check t = powMod a t p == b

One can check by dumping GHC Core {-# OPTIONS_GHC -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-} that no Pollard thunk is actually allocated and all computations have reduced to naked Integer. Your version of discreteLogarithmPrime tended to allocate and deallocate (_, _, _) constructors here and there.

Before:

Discrete logarithm/range/100 cases near 10000 mean 15.01 ms  ( +- 2.861 ms  )
Discrete logarithm/range/100 cases near 1000000 mean 124.5 ms  ( +- 26.73 ms  )
Discrete logarithm/range/100 cases near 100000000 mean 976.0 ms  ( +- 87.84 ms  )

Now:

Discrete logarithm/range/100 cases near 10000 mean 9.049 ms  ( +- 1.375 ms  )
Discrete logarithm/range/100 cases near 1000000 mean 78.45 ms  ( +- 3.424 ms  )
Discrete logarithm/range/100 cases near 100000000 mean 746.1 ms  ( +- 33.00 ms  )
b-mehta commented 6 years ago

I'm not convinced that allocation of tuples is an issue here, since the Core output shows that unboxed tuples are being used, which has no heap overhead, but I'll investigate further. In fact, your use of the Pollard type also becomes unboxed tuples in Core, and the code for step appears identical in both versions.

Your code doesn't succeed when I test it - there are many missing rem calls which cause it to fail. I suspect the improvement in the range benchmarks is due to these calls being omitted when they are truly necessary.

Bodigrim commented 6 years ago

Sorry, it is my fault; maybe wrong compilation parameters. I can confirm: your code does not allocate triples as well.

Your code doesn't succeed when I test it - there are many missing rem calls which cause it to fail.

What do you mean? It seems passing tests in my environment.

b-mehta commented 6 years ago

What do you mean? It seems passing tests in my environment.

Hm this is curious, the first four benchmarks all fail with head of an empty list with your code, but succeed with mine.

Bodigrim commented 6 years ago

Weird. This is what I see:

Discrete logarithm/individual case/5ⁿ == 8 mod 10^9 + 7 mean 42.95 ms  ( +- 1.763 ms  )
Discrete logarithm/individual case/2ⁿ == 7 mod 3^500 mean 231.4 μs  ( +- 25.78 μs  )
Discrete logarithm/individual case/2ⁿ == 3 mod 10^11 + 3 mean 176.6 ms  ( +- 7.414 ms  )
Discrete logarithm/individual case/3ⁿ == 17 mod 5^350 mean 288.4 μs  ( +- 35.09 μs  )
b-mehta commented 6 years ago

Very strange, on running stack bench --ba '--match prefix Disc -q' I get this error:

arithmoi-0.7.0.0: benchmarks
Running 1 benchmarks...
Benchmark criterion: RUNNING...
criterion: Prelude.head: empty list
benchmarking Discrete logarithm/individual case/5ⁿ == 8 mod 10^9 + 7 ... Benchmark criterion: ERROR

--  While building custom Setup.hs for package arithmoi-0.7.0.0 using:
      /Users/bmehta/.stack/setup-exe-cache/x86_64-osx/Cabal-simple_mPHDZzAJ_2.0.1.0_ghc-8.2.2 --builddir=.stack-work/dist/x86_64-osx/Cabal-2.0.1.0 bench criterion "--benchmark-options=--match prefix Disc -q"
    Process exited with code: ExitFailure 1
Bodigrim commented 6 years ago

I pushed my changes onto CI: it fails with GHC 8.2 and earlier, but succeedes with GHC 8.4 and newer. Very odd. Unfortunately, I am leaving for vacation and won't be able to investigate until the end of next week.

b-mehta commented 6 years ago

No worries - thanks for your help regardless.

Bodigrim commented 6 years ago

Got it. Prior to GHC 8.4 both powModInteger and recipModInteger were broken for a negative first argument (https://ghc.haskell.org/trac/ghc/ticket/14085).

That is why after my modification solveLinear' started to return weird results in GHC 8.2 and led to an empty filter check $ begin (starter 0 0). I have pushed fixes for solveLinear' into master and now my implementation works fine for all supported range of GHCs (https://travis-ci.org/Bodigrim/arithmoi/builds/423235835). 20-30% improvement in benchmarks is observed, same to reported above. I suspect it is purely due to changes in step function.

Could you please rebase your branch upon the latest master?

b-mehta commented 6 years ago

It passes tests but I'm not fully convinced of correctness - you've removed the rem after ai+1 and the like, but the remainder still should be taken after these additions, since they are likely to get larger than the modulus. In addition, the expression for mul2 assumes that the input m is reduced modulo n, but this is not guaranteed since the rem is no longer taken after ai+1.

On second thoughts, these shouldn't be an issue since the residue mod n remains correct. It is curious however that allowing the values to grow larger instead of restricting them to remain below n gives a faster function.

b-mehta commented 6 years ago

I've changed some things, removing some of the rem calls. When I benchmark, using m * 2 instead of shiftL is noticeably faster, so I've kept it. Since Pollard reduces to identical code as the tuples, I've left that as is also.

The calls to mod in line 92 may not be strictly necessary - I originally kept them because solveLinear' was not exported and so I wanted to make sure the input was as expected. Also, it is likely that the mod call will reduce arguments into a smaller range, so the function may run faster, but I'm happy to remove them if necessary.

Bodigrim commented 6 years ago

solveLinear' should work fine for unreduced arguments. Even without reduction by modulo bi-b2i and ai-b2i are O(n) with probability close to 1, so there should not be any performance hit. I suggest to remove mod from line 92 for the sake of clarity.

Bodigrim commented 6 years ago

Awesome. I greatly appreciate your work.

Does it make sense to split this PR into two? The first one about discrete log (which is ready to merge, up to my understanding) and the second (future) one about Dirichlet characters. What is your opinion?

b-mehta commented 6 years ago

I think this is a good idea - in fact I'd like to split it into three PRs: The currently implemented work on discrete logs (which is ready to merge), possible speed ups on discrete logs (including Pollig-Hellman, which I'm investigating at the minute) and Dirichlet characters.

Something worth pointing out is that the algorithm is probabilistic, as I mentioned in the most recent commit, and so has a (negligible) probability of failing. Currently three starting inputs are used, which cubes the failure probability making it even more negligible, but still non-zero. Possible fixes are to allow all starting inputs on failure, so that the function is guaranteed not to fail, or just to use a small amount and hope for the best (cf line 87 in the last commit). I'm testing how likely failure actually is in my branch here - so far for primes less than 7000, all possible instances of pollard's rho succeed by a super naive brute force search What is your opinion on this?

(Unrelatedly, hope you enjoyed your vacation!)

Bodigrim commented 6 years ago

It would be nice to find via brute force at least one example, when a) implementation with only one seed fails to find an answer, b) implementation with two seeds fails, c) current implementation with three seeds fails. And add them to tests.

Can we resort to BSGS, when Pollard rho has failed with three seeds?

b-mehta commented 6 years ago

That would be nice, but I fear that the probability of failure is genuinely tiny - testing all primes less than 7000 represents over 5 billion instances of the discrete log problem, and no failure has yet occured.

Possibly, but BSGS uses IntMaps internally and right now BSGS is used anyway when the inputs are less than 10^8 so in that scenario, the values may not fit within an Int. In addition, if Maps are used instead, it is possible that massive maps are made and create memory issues instead. I'd be more comfortable allowing more seeds instead - since the performance hit of this will almost certainly be smaller than the performance hit of BSGS.

Bodigrim commented 6 years ago

I may be wrong, but my impression is that your implementation of Pollard is not probabilistic. AFAIU we say that Pollard algorithm failed for a given seed when the algorithm found a collision with bi == b2i, for which solveLinear' returns a full, unrestricted range. But in our case we just resort to brute force check-ing.

b-mehta commented 6 years ago

If bi == b2i, then solveLinear' attempts to solve 0x + a = 0, and returns the empty list, so we immediately fail.

Edit: My mistake, we additionally have that ai == a2i, so brute force checking does occur.

Bodigrim commented 6 years ago

Yep. For small arguments it happens quite often, which can be traced as follows:

    go tort@(xi,ai,bi) hare@(x2i,a2i,b2i)
      | xi == x2i
      , (bi - b2i) `mod` n == 0
      , traceShow (p, a, b) False
      = undefined
      | xi == x2i = solveLinear' n (bi - b2i) (ai - a2i)
      | otherwise = go (step tort) (step (step hare))
b-mehta commented 6 years ago

Right, I was doing exactly this. That approach sounds reasonable. Thanks for this catch!

b-mehta commented 6 years ago

This should be good to go. Note that Math.NumberTheory.Moduli.Equations exports solveLinear' - hopefully this is acceptable @Bodigrim.

Bodigrim commented 6 years ago

Note that Math.NumberTheory.Moduli.Equations exports solveLinear' - hopefully this is acceptable

I've thought about it and decided to revert this new export in 473a5ef25eadc4a0d033a691e4e54d24dae5d5fe. An additional public function needs documentation, tests and is a commitment to support its interface. Being public, solveLinear' should deal somehow with negative modulo and zero modulo, it may also be reasonable to make it working for any Integral a instead of Integer only. That said, from the maintainer's viewpoint it is preferable to use solveLinear only.

b-mehta commented 6 years ago

Fair enough. Is an acceptable compromise to assert that the modulo must be positive, and that the precondition is not checked, similar to integerSquareRoot' from M.NT.Powers.Squares? That way users can still use solveLinear' for Integers without having to wrap values up using someNatVal.

Bodigrim commented 6 years ago

Unchecked preconditions are the root of all evil :) They are the worst plague of arithmoi API in its current state.

Currently I am tying up loose ends for an upcoming release; I'll be glad to return to solveLinear' question afterwards. Please raise an issue to facilitate discussion.

b-mehta commented 6 years ago

Unchecked preconditions are the root of all evil :) They are the worst plague of arithmoi API in its current state.

I see! Fair enough - I have no real complaints about the state of solveLinear' as it is.