math/big: extend ProbablyPrime with Baillie-PSW test

jfcg commented 8 years ago

BPSW is the current popular way for probabilistic primality testing.

More detailed info about BPSW is at https://en.m.wikipedia.org/wiki/Baillie–PSW_primality_test

A patch and initial discussion is at https://groups.google.com/forum/m/#!topic/golang-dev/AOAbwvCwgwo

minux commented 8 years ago

I don't like the idea of use negative n in ProbablyPrime(n) to test for primality using Baillie-PSW. That will limit us to Miller-Rabin and Baillie-PSW tests forever. What if there are even better tests in the future? (Say if a future version of AKS primality test become practical or a probability version of deterministic AKS primality test.)

If we want to add it to math/big, we should choose another name.

But then what should the user choose? There is no proof that Baillie-PSW doesn't have false positives, but there is clear and well-understood bound for Miller-Rabin test giving false positives.

Perhaps we can add it to a x/math subrepo and see how many people actually cares about the speed of primality testing and also willing to take the risk of using algorithm with unproven result.

jfcg commented 8 years ago

We can use BPSW for n=0 and reserve n<0 for any future improvement.

I agree that there should not be multiple apis for primality testing. In function doc we clearly inform users of what we use for each n and what the success guarantees are. For a composite, failure rate of a single MR is 1/4. For BPSW:

BPSW is available since 1980, people have tried every available prime & composite on the planet and it stood the test of time. It has a sound well-studied algorithm, notice it is an extension of MR. No serious library/language/software uses only MR for testing. Everybody suplements MR with a Lucas(jacobi=-1). They "complement" each other. Blindly rejecting this without reading the relevant literature is a deep ignorance & shame for the Go project.

This is probabilistic primality testing, as the name suggests. We dont claim to have a primality proof in the first place. This does not mean we can provide lousy algorithms to users. On the contrary BPSW is the state-of-the-art primality testing.

cznic commented 8 years ago

We can use BPSW for n=0 and reserve n<0 for any future improvement.

I agree with @minux that the existing function should be kept alone.

BPSW is available since 1980, people have tried every available prime & composite on the planet and it stood the test of time.

This is mathematically irrelevant. The lack of proof is what matters. I object to putting an unproven, possibly failing primality-checking algorithm into the stdlib.

jfcg commented 8 years ago

Have you read "The danger of relying only on Fermat tests" at https://en.m.wikipedia.org/wiki/Baillie–PSW_primality_test ??

ALTree commented 8 years ago

is a deep ignorance & shame for the Go project

Please be civil. There's no shame in wanting to understand advantages and disadvantages of a proposal before implementing it. You failed to provide some information.

without reading the relevant literature

you did not provide "the relevant literature" (wikipedia is not literature). I understand that you may expect a reviewer to go on wikipedia and then doing a literature search, but this is really something that you should help with as part of your proposal.

I have a few comments. First of all:

[Java's isPrime for bigInts] does one or more Miller-Rabin tests with random bases. If n, the number being tested, has 100 bits or more, this method also does a non-strong Lucas test

So what they do is MR for bits < 100 and MR + Lucas on bits >= 100. But this is not what you're doing in your patch. First of all there's no 100 threshold: you make the caller choose what the method does - i.e. n > 0 means MR and n <= 0 means calling sprp() && slprp() (as a side note, the function names you have choosen really do not help understanding your code). This is confusing. Java's behaviour is nice: the method configures itself. This would not be the case in go (at least not with your proposal).

When should the user call isProbablyPrime with n == 0 ? And when not? You have to explain this (since we'll have to explain it to our users if we are to implement this).

Maple's isPrime function requires no configuration. Mathematica's PrimeQ function requires no configuration.

Also Java uses a non-strong Lucas test. Your patch seems to test for strong-Lucas primality. Care to explain why?

Second:

GNU Multiple Precision Arithmetic Library's mpz_probab_prime_p function uses a Miller-Rabin test, but does not use a Lucas test.

GMPL is is a widely used bignum library and what they do is MR. I'm not saying we should use pure MR just because GMPL does that, but the fact that they do suggests that a pure MR primality test is not an horrible crime. Go big package is much much simpler and smaller than GMPL.

Third: you have been asked several times to explain what guarantees (if any) the test gives regards to the probability of false-positives or false-negatives. The current PM implementation is pretty clear about that (the usual 4^-k bound). What about BPSW? In your patch you only write

returns true only if x is a probable prime

maybe it's me, but this mean everything and nothing.

Fourth: performances. Is this faster or slower? You only did a single benchmark with n = 100 on MR, but almost nobody will call MR with parameter 100. n = 20 is a tipical value.

jfcg commented 8 years ago

Alberto, If you really cared, you would have read my patch and see the two relevant literature references as comments. Also, yes I would appreciate if a reviewer god damn reads the .ucking wiki article. I did not expect them to do a literature research on the subject! Just read the comments made to this issue and the original thread. Most commenters really did not care to read the wiki or the patch.

bits<100 is just a made up limit, I havent read or come up with any "bit limit" on when lucas should be used. In any case, real applications will care for primes of 200+ bits. For example EdDsa uses 250+ bits primes.

sprp: strong probable prime (standard term in number theory) slprp: strong lucas probable prime (standard term in number theory) Really easy definitions, you would quickly recognize them if you cared to read my patch, the wiki or the papers ;)

As you know, Go has a compatibility promise. I did not want to clutter the api, just extend the current api with how Sage/Pari/Gp does it. Notice that there is no loss of MR(n) functionality. That's why we need 'configuration'. Users just use whatever they want. n=0 for BPSW, n>0 if they want MR(n).

If you cared to read some basic info about primality testing, you would not ask why we use the strong versions, but i'll explain in any case: strong tests are better detectors of primes/composites. Also the existing MR code is itself a strong probable prime test ;)

Gmp is the only serious lib that does not employ lucas tests, dont know why. Why dont you ask them and let us know.

BPSW is:

either necessary/sufficient (notice the math term;) for primality
or it has an extremely extremely low error rate (MR's is 1/4). Nobody was able to provide a composite that fools this test since 1980! You can contruct composites that fool MR(n=64) for example.

Nobody knows which yet.

You should use BPSW, if not, you should use MR(n=40+) for a serious application. BPSW is faster than the latter case.

Probable prime means:

either prime
very rare composite that fooled your prp test.

That is the mathematically correct term for what all these functions produce.

I hope I covered all your questions, let me know if I missed anything.

ALTree commented 8 years ago

First of all: I did not study the patch carefully because we are not doing code review. Understanding if your proposed extension is fine should not require one to study (or even open) a code patch. And certainly not one with functions named sprp and slprp.

sprp: strong probable prime (standard term in number theory) slprp: strong lucas probable prime (standard term in number theory) Really easy definitions, you would quickly recognize them if you cared to read my patch, the wiki or the papers ;)

This is a non-argument. I quickly glanced to the diff in ProbablyPrime, and what I found is

+ return x.abs.sprp(natTwo, nm1, q, k) && x.abs.slprp(b == 2)

compare with Java SDK implementation

return passesMillerRabin(rounds, random) && passesLucasLehmer();

As you know, Go has a compatibility promise. I did not want to clutter the api, just extend the current api with how Sage/Pari/Gp does it. Notice that there is no loss of MR(n) functionality. That's why we need 'configuration'. Users just use whatever they want. n=0 for BPSW, n>0 if they want MR(n).

Yes, I understand that. And yet everyone who has read your proposal said that the proposed API is ugly. Maybe a new fuction would be better (as Minux said). There's no agreement, not even on the signature of the function, and yet you expect someone to carefully study the whole patch.

If you cared to read some basic info about primality testing, you would not ask why we use the strong versions, but I'll explain in any case: strong tests are better detectors of primes/composites. Also the existing MR code is itself a strong probable prime test ;)

I know the difference between strong and weak tests, thank you. My point was that you can't both write "we should do it this way because this is how everyone is doing it" and avoid being questioned about remarkable differences between your implementation and other languages/libraries/platforms implementations.

I look at Java's implementation and I see weak-Lucas, I look at GMPL and they don't even do Lucas. Mathematica and PARI/GP do exactly what you're doing in your patch. The sage reference returns 404. Perl and Python implement Lucas and BPSW in external libraries.

To sum the evidence up:

Mathematica and PARI/GP do BPSW out of the box for isPrime
Java does MR + weak-Lucas after a certain threshold
GMPL does MR only
Other languages provide Lucas and BPSW in external libraries

You should use BPSW, if not, you should use MR(n=40+) for a serious application. BPSW is faster than the latter case.

How much faster for n=40 ?

Probable prime means: either prime very rare composite that fooled your prp test. That is the mathematically correct term for what all these functions produce.

I was specifically asking if we do have a bound. You write "very rare", I deduce that a bound is not known. That's fine.

IMHO the first thing to do is decide how the API will look if this goes in (new function? n <= 0 trick? Out-of-the-box Lucas in isProbablyPrime?).

rsc commented 8 years ago

The docs are very clear:

// ProbablyPrime performs n Miller-Rabin tests to check whether x is prime.
// If x is prime, it returns true.
// If x is not prime, it returns false with probability at least 1 - ¼ⁿ.
//
// It is not suitable for judging primes that an adversary may have crafted
// to fool this test.

While we could start overloading non-positive values of n, that will be confusing to read and also confusing when you compile code expecting the new meaning of (say) 0 but instead it panics saying 0 is not a valid argument.

There are two possible ways forward for adding other primality test algorithms:

If a test makes sense to do all the time or in some limited context (such as small numbers), just do it, in addition to the currently-guaranteed Miller-Rabin tests.
Add a new method.

There have been a few things suggested here, including BPSW and various Lucas variants. I would like to see, for each test being proposed, a crisp statement containing:

the proposed new test
the class of numbers being tested for which the test makes sense over the current implementation, and
the cost compared to one Miller-Rabin iteration.

rsc commented 8 years ago

@jfcg, you wrote:

Blindly rejecting this without reading the relevant literature is a deep ignorance & shame for the Go project.

to which @ALTree reasonably replied:

Please be civil. There's no shame in wanting to understand advantages and disadvantages of a proposal before implementing it. You failed to provide some information. ... you did not provide "the relevant literature" (wikipedia is not literature). I understand that you may expect a reviewer to go on wikipedia and then doing a literature search, but this is really something that you should help with as part of your proposal.

To which you replied:

If you really cared, you would have read my patch and see the two relevant literature references as comments. Also, yes I would appreciate if a reviewer god damn reads the .ucking wiki article.

This is not really a productive way to have a conversation. You're not doing anything to help your case.

jfcg commented 8 years ago

Hi, I directly time each call x.ProbablyPrime(n) with my patch on core i5 4210m laptop with ubuntu 14.04, 3.19 kernel. For starters, I got these numbers. Separated by type:

// composite fermat/mersenne numbers, strong pseudoprimes to base 2 // 2^256+1, 2^512+1, 2^257-1, 2^509-1 n= 40 257 bits took 207.514µs 513 bits took 584.49µs 257 bits took 69.83µs 509 bits took 207.916µs total: 1.06975ms

n= 0 257 bits took 911.742µs 513 bits took 1.25268ms 257 bits took 295.564µs 509 bits took 837.612µs total: 3.297598ms

// 5 primes with 256 256 697 730 995 bits n= 40 256 bits took 1.745521ms 256 bits took 1.719863ms 697 bits took 18.613694ms 730 bits took 21.783088ms 995 bits took 46.10569ms total: 89.967856ms

n= 0 256 bits took 517.818µs 256 bits took 509.545µs 697 bits took 3.270944ms 730 bits took 4.458831ms 995 bits took 7.606551ms total: 16.363689ms

// pairwise products of 5 primes n= 40 512 bits took 229.017µs 953 bits took 959.979µs 986 bits took 1.142523ms 1251 bits took 2.032602ms 953 bits took 946.264µs 986 bits took 1.110485ms 1250 bits took 2.087854ms 1427 bits took 3.505977ms 1691 bits took 4.834519ms 1725 bits took 4.44336ms total: 21.29258ms

n= 0 512 bits took 496.858µs 953 bits took 1.881229ms 986 bits took 2.041718ms 1251 bits took 4.105593ms 953 bits took 1.907105ms 986 bits took 2.037402ms 1250 bits took 3.454044ms 1427 bits took 5.377545ms 1691 bits took 7.636516ms 1725 bits took 7.461523ms total: 36.399533ms

I was wrong for composites, MR(n=40) works faster for them. I think that sounds right because for a prime, MR needs to try all the 40 bases, for a composite there are chances to get eliminated for each base.

I think MR enjoys a well implemented expNN, lucasMod is bare minimum.

There are constant costs as well. I'll make polynomial fits to these data & come back :)

minux commented 8 years ago

Speed is not the only concern.

The major client for ProbablyPrime is crypto/rsa.

I did a survey of major open source crypto libraries to see how do they test for primes when generating RSA keys:

OpenSSL: https://www.openssl.org/docs/manmaster/crypto/BN_generate_prime.html Only uses Miller-Rabin probabilistic primality test, with bounded false positive rate of at most 2^-80 for random input.

NSS: function mpp_make_prime in http://hg.mozilla.org/projects/nss/file/tip/lib/freebl/mpi/mpprime.c#l392 uses Fermat test followed by Miller-Rabin tests.

CryptoPP: RSA key generation uses the VerifyPrime function at https://github.com/weidai11/cryptopp/blob/master/nbtheory.cpp#L249 and VerifyPrime uses strong lucas test followed by Miller-Rabin tests.

It's interesting to see that:

all of them uses MR test in the last step (ranging from 2 rounds to 40 rounds, our crypto/rand.Prime uses 20 rounds)
none of them uses Baillie-PSW,
only CryptoPP uses strong lucas test.

rsc commented 8 years ago

Let's take cryptography explicitly off the table. This is about math/big.ProbablyPrime, not crypto/rand.Prime. There are many other concerns for the generation of large random primes in a cryptographic context, and in fact in that case only a few rounds of Miller-Rabin are required. See Menezes et al., Handbook of Applied Cryptography, 1997, pp. 145-149. If you can't find that, see Appendix F of FIPS 186-4, which has some of the same information.

rsc commented 8 years ago

@jfcg, we're not going to put BPSW in just for performance reasons. In my list above, by "the class of numbers being tested for which the test makes sense over the current implementation" I meant a non-performance (that is, correctness) rationale.

jfcg commented 8 years ago

So far with my limited computational capabilities, I only found 206981=263*787 which passes MR(n=2) but fails MR(n=3) & BPSW. It is a strong pseudoprime to 34320 bases.

jfcg commented 8 years ago

just got: 556421=431 * 1291 passes MR(n=2), strong pseudoprime to 92448 bases 587861=443 * 1327 passes MR(n=3), strong pseudoprime to 97680 bases

danaj commented 8 years ago

@jfcg, below 2^64, there are almost 32 million base-2 strong pseudoprimes. Of those, 1.5 million are also base-3 strong pseudoprimes. Of those, 131,157 are also base-5 strong pseudoprimes. 16,826 will pass M-R with bases 2,3,5 and 7. No number below 2^64 will pass a properly written BPSW test. Hence it is deterministic below that size. [edit: see the Galway/Feitsma database, also see Pseudoprime Statistics, Tables, and Data]

Strong vs. standard Lucas test: the 1980 paper rightly points out that there is no reason not to use the strong test. The pseudoprimes are not only a subset but the performance is the same or better. Why Java chose to ignore that is beyond me, but a clue may be gained by their calling it a "Lucas-Lehmer" test, which is a completely different test. This is not written by someone familiar with the literature.

Re performance, on primes, a good implementation of BPSW should take 2.5 to 3.5 times the time of a single M-R test. A practical primality test will include some trial division to very quickly weed out composites. For under 2^64 this is probably just to 20-100, but for larger numbers it is better on average to look much further. I use primes to 53 for 32-bit and to 89 for 64-bit, but that's a particular C implementation. The first M-R test will cut out most of the remaining composites. Using base-2 strong pseudoprimes as the test set is of course the worst case for BPSW as you guarantee it must run the Lucas test (which is ~1.5 - 2.5 times the cost of an M-R test). Anyway, I cannot discuss this implementation, but BPSW ought to be much faster than M-R with 40 bases.

For more on performance, under 64-bit the fastest solutions (that I am aware of) involve tuning small numbers, tuning the trial division pre-tests, then using hashed Miller-Rabin for 32-bit, and either hashed-Miller-Rabin or BPSW above that. For native integers they also use M-R and Lucas tests using Montgomery math and snippets of assembly, but I assume that is far beyond the scope of this.

I have a BPSW implementation for Perl6 using MoarVM. It is 4-20x faster than the current one for large numbers (10^50 - 10^8000). It also removes some known counterexamples (they don't use random bases). This is not directly comparable (the code in question uses C+libtommath and lives in the VM internals), but gives some idea.

Most of the literature is linked on the Wikipedia page. The paper by Baillie and Wagstaff (October 1980) is IMO the definitive paper (the July 1980 paper cites it, but got published first). Nicely's site is nice but has a few misleading comments (e.g. the trial division step is not part of BPSW, and his comments on the extra-strong Lucas test are not right). I would also recommend Pinch 1993 which is one reason all the math programs switched to BPSW in the 90s. Looking at Nicely's GMP pseudoprimes page is also interesting.

The main problem I've had trying to discuss BPSW with people who don't know the field is the "what is the exact error bound!" question. The second is a lack of understanding of the difference between fixed and random bases, but that's not an issue here. For error bounds, the best we can do with BPSW as a guarantee is 1/4 * 4/15 (M-R 2 plus strong Lucas) or 1/4 * 1/8 (M-R 2 plus ES Lucas). It should be obvious, given that we know there are no counterexamples below 2^64 and haven't seen one above, that this error bound is ludicrously conservative. However there isn't a strong statement we can give. Strong pseudoprimes and strong Lucas pseudoprimes are less dense as the input size increases, and the tests are anti-correlated so we would expect them to be hard to find once past the small numbers.

If one is very uptight about correctness, I would suggest separating out these cases:

below 2^64, do whatever is best for performance and gives completely correct results. This is where a lot of users will spend time calling you in loops. BPSW is one way to get correct results, so is deterministic M-R, so is hashed deterministic M-R.
See Sorenson and Webster, Sep 2015 for deterministic M-R to 33170 44064 67988 73859 61981 (over 2^81). You can call it directly or first call BPSW then if that is true and the input is below the threshold, call the 12 extra M-R tests to get a deterministic correct answer).
BPSW plus a few random-base M-R tests. BPSW gives you the result that anything that passed your test would have also passed a number of other programs. The extra random-base M-R tests will give you extra warm fuzzies, not take too long if there are only a few (e.g. 2 to 5), and defeat the nefarious NSA agents who are holding onto the secret (your variant) BPSW counterexamples.

jfcg commented 8 years ago

Hi @danaj, Thanks a lot for your comprehensive comments. I'll read those papers right away.

The most interesting composites I have found so far are: 3673744903, 3281593591, 2385076987, 2738053141, 2009621503, 1502682721, 255866131, 117987841, 587861 which pass the current MR(n=3) test. These all come from a brute search.

random bases currently are generated by a pseudo RNG seeded by n[0], and mostly determined by BitLength(n-3). Chances are powerful attacks can be designed to generate strong pseudoprimes to multiple random bases that will be sequentially generated by the current generation process, hence pass the test. My crude guess :)

For sprp test I have an idea: Testing n for base a, n-1 = q * 2^k with odd q, we check a^q = 1 or -1 (mod n) or a^(q * 2^i) = -1 (mod n), for some 0< i <k

if I choose a with jacobi(a,n)=-1, since q is odd, a^q is not a square mod n therefore we cant have a^q = 1 (mod n), therefore we tighten the sprp test. Let's check: n, # of a, # of a with (a/n)=-1 3481 56 0 3721 58 0 4087 16 8 4331 48 24 4453 52 36 4489 64 0 4819 16 8 4891 16 8 5041 68 0 5293 16 0 5329 70 0 5429 4 4 5767 16 8 5917 52 36 5963 240 120 6161 148 0 6241 76 0 6283 16 8 6497 20 0 6499 16 8 6649 52 0 6889 80 0 6893 4 4 6901 16 0 7081 196 144 7171 48 24 7303 16 8 7373 4 4 7519 16 8 7663 16 8 7747 16 8 7921 86 0 7957 484 324 7991 48 24 8023 96 48 8137 16 0 8249 20 16 8357 4 4 8479 16 8 8509 16 0 8611 16 8 8633 20 16 8989 4 4 9017 96 0 9089 4 0 9211 448 224 9271 160 80 9301 48 0 9313 16 0 9409 94 0 9577 52 0 9701 4 4 9797 4 4 9943 16 8 9991 16 8

As you see, choosing the base with (a/n)=-1 is a good idea. I think it is better than choosing constant or random bases. I can even utilize jacobi calculation for both sprp & slprp tests. What do you guys think ?

danaj commented 8 years ago

Re using PRNG bases is basically what GMP does. The link I gave to Nicely's GMP pseudoprimes page discusses it and shows a number of cases where we can make it fail with what seems like a reasonable number of tests. It's deterministic so one can search for numbers that pass many bases.

The idea of choosing parameters where (D|n)=-1 is discussed in Baillie/Wagstaff 1980. It's used in most Frobenius style tests, as it gives far better results.

I will reiterate that below 2^64, this should be a different discussion centered around the best performance vs. effort / maintainability. There is no good reason to be doing a probable prime test, as there are well known methods for fast deterministic correct results.

Above 64-bit, there are a variety of methods with different tradeoffs. My take (which is certainly debatable) is that BPSW (M-R base 2 followed by a strong, AES, or ES Lucas test) is the best starting point. There are decent arguments for that being adequate by itself, but I can see why some might want additional tests, whether they be a Frobenius variant, additional fixed M-R (such as Mathematica adding a base 3 test), additional deterministic M-R, or additional non-deterministic (however far down the rathole one wants to chase that) M-R. There will always be a small group that desires a proof for any input, but that has serious downsides for everyone else.

It's not a horrible sin to use just M-R, but it's a little disappointing. GAP, Maple, Mathematica, Maxima, PARI/GP, and Sage all use some variant of BPSW, and as mathematics programs that have been around for a while they obviously care about correctness. At a computational cost of ~3 M-R tests it is also quite efficient.

jfcg commented 8 years ago

Yes, classical BPSW = MR(base=2) && Lucas( (D/n)=-1 )

For the MR step, choosing (base/n)=-1 is a tighter test, so I want to use D of the Lucas test as the MR base, that is MR(base=D) && Lucas( (D/n)=-1 ) Reusing the result of the jacobi calculation. What do you think ? Notice that discriminant of the sequence a^(n-1)-1 is always (a-1)^2, so MR always has (discriminant / n)=1, and stays "independent" with Lucas ((D/n)=-1)

jfcg commented 8 years ago

I am thinking of keeping the initial deterministic check up to 2^64. What do you think ?

What is the list of strong pseudoprimes to bases 2,3,5,7,11,13,17,19,23 that are less than 2^64 ? 1st one is 3825123056546413051 (~2^62), are there others ?

danaj commented 8 years ago

Re using a different base for BPSW, the biggest issue is that we have a complete list of all base-2 strong pseudoprimes to 2^64. That lets us quickly check actual results for different solutions. Without base 2, we're limited to ~2^48 given a few months of computation. Also, everyone else has been using base 2 -- I'm not sure you want to start deviating. I think you should also use the standard methods for selecting the Lucas parameters -- Selfridge method for strong test, Baillie method if extra strong (Pari/GP does something different, so I suppose that's ok also since it's been used a lot).

3825123056546413051 is the only counterexample for bases 2-23 below 2^64.

There are more efficient bases at the best known SPRP bases page. You have to use some care using the smaller sets since the base will typically be larger than the input (read the remarks). Naive solution -- just use the 7-base solution which is two fewer than {2,3,5,7,11,13,17,19,23} and doesn't need a check for 3825123056546413051. Only 3 bases needed for 32-bit inputs. If you have BPSW, then it ought to be more efficient than the 7-base or your first-9-prime-bases method, unless your users are always testing base-2 pseudoprimes.

ALTree commented 8 years ago

(just a reminder that we're in the freeze phase of the go development cycle, so if you decide to send a CL be aware that it won't be merged for go1.6)

jfcg commented 8 years ago

This is Baillie method, right?

P = 3; Q = 1;
while (jacobi(P*P-4,n) != -1)
    P++;

Method B on p.1401 of Baillie & Wagstaff (1980) requires P to be odd. Why do we allow even P ? Both methods A & B require D = 1 mod 4.

So are the bases 2, 325, 9375, 28178, 450775, 9780504, 1795265022 sufficient for all inputs < 2^64 ?

danaj commented 8 years ago

Yes, the Feitsma/Galway list of base-2 SPSPs can be used to verify there are no counterexamples under 2^64 for those 7 bases.

For the strong test, method A (Selfridge) is typically used. I'm not aware of anyone using method B.

Yes, that's what Baillie called Method C in 2013 on OEIS. Using it in combination with the extra strong test yields the known set of ES Lucas pseudoprimes (which have no overlap with base 2 strong pseudoprimes under 2^64, and are anti-correlated so we expect overlaps to be rare). I like the extra strong Lucas test, as it is computationally faster, has a smaller known error bound, and has a smaller set of pseudoprimes making overlaps less likely. But the strong Lucas test is fine and arguably gets more use.

jfcg commented 8 years ago

Yes we use the ES lucas. Method B "is" what Pari/GP does. Put Q=1 (extra strong lucas), then D=P^2-4 must in the set 5, 9, 13, 17.. so P must be odd & starts from 3. Both methods A & B require D = 1 mod 4 for some reason. Do you know why? Do you have a link to a page explaining why C allows even P ?

danaj commented 8 years ago

PariGP sets Q=1, P = 3,5,7,... That is neither method A or B from Baillie/Wagstaff. It's method C skipping evens.

Method B would choose the first of these to give (D|n)=-1 and gcd(n,2PQD)=1: D=5 P=3 Q=1 D=9 P=5 Q=4 D=13 P=5 Q=3 D=17 P=5 Q=2 D=21 P=5 Q=1 D=25 P=7 Q=6 ...

The parameter selection use by Pari/GP is fairly similar to Baillie's, just using P += 2 so only odd values of P are selected. Other than that, its basically the same (Q=1, D=P^2-4). Pari/GP also doesn't do the full extra strong test.

I don't believe Baillie wrote a paper with method C. The only thing I have to go on is his OEIS entry. I came up with the same idea at about the same time (it's rather obvious), but his name pulls far more weight than mine :).

I think good methods are:

strong Lucas plus method A (Selfridge)
almost extra strong with method C (or the odds-only-P variant -- so this test is the Pari/GP varant)
extra strong with method C (odds-only-P if you'd like)

jfcg commented 8 years ago

Yes we are saying the same thing :D Method B constrained to Q=1 (for appliying the ES lucas) is what Pari does. C also allows even P.

almost variant just skips U=0 check right ? so it is easier to calculate.

jfcg commented 8 years ago

Hi all, Since we have time till go1.7 freeze :P I want to explore the MR( (base/n)=-1 ) idea.

List of the first 419489 strong pseudoprimes to base 2: http://ntheory.org/data/spsps.txt

On these numbers I ran this test:

// try to find odd P with (P*P-4 / x)=-1
j,P := jacobiP(x)

// sprp test with bases P*P-4 if j=-1, 3, 5, 6, 7, 15
// aeslprp test with P

12248 of them also pass base P*P-4 19322 of them also pass base 3 29326 of them also pass base 5 27020 of them also pass base 6 17350 of them also pass base 7 14623 of them also pass base 15 2804 of them got j=0 2 of them got j=1, 1093^2 and 3511^2 none of them passed lucas.

What do you think ? Corrected minor counting error. Other bases are applicable even if j != -1

jfcg commented 8 years ago

Some more stats base: # of strong pseudoprimes < 2^25 P*P-4 if j=-1: 210 2: 296 3: 361 5: 315 6: 327 7: 283 15: 245

danaj commented 8 years ago

I'm not sure this is the right venue for the discussion, but I'm not sure what is. mersenneforum would have knowledgeable people interested in the subject, but the wrong wording can lead to non-constructive input.

(1) doing anything other than base 2 means you're deviating from the method that's been tested for 35 years. It's still a really interesting question though.

(2) For 64-bit input, using any other base is pointless. We already know 2+lucas has no counterexamples, and using another base has no speed advantage. Worse, using another base means we have no verification that there aren't counterexamples. There are a metric crapton of 64-bit composites, so testing any other base to that depth is unthinkable with current computing resources.

(3) More than just the number of pseudoprimes, the residue classes are important. This is what gives the test such strength -- numbers that pass MR(2) fall in classes that rarely pass the Lucas test, and vice-versa. I suspect your (base | n)=-1 idea will have the same result, but it should be checked.

I get the same numbers you do for <2^25. At 2^26 it is 272 for P_P-4, 409 for base 2. At 2^27 it is 360 for P_P-4, 552 for base 2. So about 2/3 the number of pseudoprimes. The intersection of base 2 and base P*P-4 below 2^64 is also 65% vs. base 2 and base 3.

Addendum: Vastly smaller than 2^64, but still.. I checked the Lucas and ES Lucas pseudoprimes below 10^14, and as expected, no overlap with the base P*P-4 strong pseudoprimes.

Depending on your performance/correctness tradeoff, you may want to consider instead leaving the base alone (so any counterexample must be a strong BPSW counterexample) and add an additional test:

Frobenius on top of the Lucas test (see Crandall and Pomerance), cost 1 M-R.
Frobenius-Underwood, cost ~2.5 M-R, basically an efficient way to do a Fermat+Lucas test with differing bases from BPSW.
Frobenius-Khashin, cost ~3 M-R (I think). Simple, not much used.
One more M-R base using your method, giving a deterministic test that's at least as strong as Mathematica (quite likely stronger), while only costing one extra M-R. The total computational cost should be about the same as 4 M-R tests, and if you do both M-R tests first, you're only running the Lucas test on primes and a quite small set of composites.
Additional M-R tests. However you feel makes you happy while keeping performance high. This is basically what FIPS 186-4 discusses for crypto: running a Lucas test is worth a lot of additional M-R tests. But this isn't crypto (which brings in additional requirements as well as relaxations).

jfcg commented 8 years ago

Yes, I think you are right because:

out of 2^27 numbers, 360 or 552 pseudoprimes dont matter much. our typical input will be a composite that is not a pseudoprime. this ratio (~360 / 2^27) becomes smaller as numbers get larger, making the difference less significant.
most inputs will be eliminated by the MR step. with p*p-4 base, we would add a jacobi calculation cost to "mean run time". for a constant base, jacobi is needed only by the 2nd step (lucas).
as you said, we already have a result of independence for MR(base=2) & Lucas on inputs < 2^64

i think my patch is ready (i fixed a couple more points & added inList()), passing the tests as well. the only remaining point is the signature. if using BPSW when n=0 is ok, i can submit the patch. if not, what do we do guys ;?

note: for now, i think bpsw guarantee for inputs < 2^64 is sufficient, i dont want to wrestle with the 7-base MR for small inputs :)

rsc commented 8 years ago

i think my patch is ready, passing the tests as well. the only remaining point is the signature. if using BPSW when n=0 is ok, i can submit the patch. if not, what do we do guys ;?

This issue is a proposal (see golang.org/s/proposal). It's perfectly fine to continue to have detailed discussion here, like you've been doing, but the way we move forward is that the discussion converges on a clear statement of what it is you are proposing. See my earlier comment for the specific details that should be in the proposal.

Note that you don't have to write a design doc for this, but we do need that clear statement.

jfcg commented 8 years ago

The offical proposal of the new test: We use this specific variant of the Baillie-PSW primality test: BPSW := MR(base=2) && AESLucas( (D/input) = -1 )

AES stands for almost (skips the U sequence, works only with V) extra strong (as told in Grantham theorem 2.3). This is exactly what Sage / PariGp / Mathematica does. Java uses a weaker variant of Lucas test.

We search discriminant D=P^2-4 of the V sequence (odd P, constrained to Q=1) with Method B of Baillie & Wagstaff. This can be relaxed with also allowing even P. We use odd P as this is what PariGp/Sage does, and Baillie & Wagstaff chose to have D=1 mod 4 for both methods A & B, which requires odd P. After calculating jacobi (D/input), we do handle 0 & 1 as well.

MR sequence a^(k-1)-1 has (discriminant/input)=1, and we complement this with a lucas test with (discriminant/input)=-1. This, however unproven, is strongly believed to give this test its terrific capability of detecting composite numbers and there is a big amount of academic work examinig BPSW. It is the current state-of-the-art in primality testing.

BPSW is:

either necassary & sufficient for primality
or it has an extremely very low error rate (a single MR has error rate less than 1/4). there is no known counter example since 1980.

Class of numbers: This specific combination of MR(base=2) and various lucas tests is known to give a definite result for inputs < 2^64. There wont be a big difference between BPSW & MR(n>0) for small inputs. For bigger inputs:

If the user strictly needs a "proven" bound on error rate and inputs are known to be random then existing MR(n>0) can be used.
The user should use BPSW otherwise because MR(n>0) is not suitable for testing numbers in for example cryptographic applications where the input may have been "crafted".

The cost: MR(n=40) arguably provides cryptographic level confidence and can be used if the inputs are known to be totally random. We compare BPSW with it. These are the total times (details):

// 5 primes with 256 256 697 730 995 bits MR(n= 40) total: 90ms BPSW total: 16ms

// 10 pairwise products of 5 primes MR(n= 40) total: 21ms BPSW total: 36ms

Below we time:

per-round cost of MR (random base generation+single test)
JL: jacobi calculation+lucas test

// 256 bits prime MR 55.716µs BPSW 413.391µs

// 256 bits prime MR 51.933µs BPSW 395.738µs

// 697 bits prime MR 468.014µs BPSW 2.874695ms

// 512 bits composite MR 227.235µs BPSW 1.23973ms

// 953 bits composite MR 944.782µs BPSW 4.432682ms

// 986 bits composite MR 1.295064ms BPSW 5.977547ms

For primes: JL ~ 6 to 7.5 times MR-round For composites: JL ~ 4.5 to 5.4 times MR-round

ALTree commented 8 years ago

What about performances on a sequence of e.g. 1000 random big (like 1024 bits) -uniformly chosen- odd numbers? Primality tests are often used like that (for example when you need a big random prime). Take MR(n=40).

jfcg commented 8 years ago

On windows 7 64 bit with core i5-2400, 1000 uniformly random odd big ints, same inputs for both cases:

// 501 to 512 bits, 5 of them are primes MR(n=40) took 112 ms BPSW took 157 ms

// 1013 to 1024 bits, 4 of them are primes MR(n=40) took 550 ms BPSW took 641 ms

All results identical. Times include random number generation.

danaj commented 8 years ago

Do you have a github branch? The Lucas test ought to be faster than that, but of course language details have an impact. Even with the 4-8x speed, I'm wondering why BPSW comes out slower for 512-bit or 1024-bit random inputs. For almost all composites this size they should be identical -- the pretest and first M-R will eliminate them. Which leaves the primes, and even if the Lucas test costs 8 M-R tests, this should come out 5 times faster.

For my program, in a different language and different library, 1024-bit random inputs are ~2x faster with BPSW vs. MR(40) (including time to generate random inputs).

jfcg commented 8 years ago

on my ubuntu laptop, i got:

// 501 to 512 bits, 5 of them are primes MR(n=40) took 95 ms BPSW took 145 ms

// 1013 to 1024 bits, 4 of them are primes MR(n=40) took 448 ms BPSW took 570 ms

Of the 1000 odd numbers ~272 of them pass basic tests. MR(n=40) takes ~ 5 * 40+267 = 467 mr-rounds BPSW takes 272 base2-rounds + 5 * JL ~ 272 base2-rounds + 35 mr-rounds it seems like bpsw should be faster for 512 bits. i am pasting the current state of the patch to golang-dev. let's see if something is wrong.

jfcg commented 8 years ago

I'm told to use code-review, here is the link to the change https://go-review.googlesource.com/16948

jfcg commented 8 years ago

I figured out what's going on: In expNN(x,y,m), montgomery/windowed variants are used when len(x)>1 && len(y)>1 && len(m)>0

MR(base=x) is actually faster than MR(base=2). That's why we are seeing these execution times.

jfcg commented 8 years ago

I deleted len(x)>1 from that line and on windows 7 got:

// 501 to 512 bits, 5 of them are primes MR(n=40) took 111 ms BPSW took 69 ms

// 1013 to 1024 bits, 4 of them are primes MR(n=40) took 552 ms BPSW took 359 ms

jfcg commented 8 years ago

I also noticed a tiny improvement for MR:

If odd N is a strong pseudoprime to base a, then it is also for base N-a. Therefore base a can be chosen from [2..(N-1)/2] instead of [2..N-2]

Let's consider base=(N-1)/2 ( (N-1)/2 )^q = (-1/2)^q = -1/2^q = 1 or -1 mod N which is the same with 1/2^q = 1 or -1 mod N which is equivalent to 2^q = 1 or -1 mod N

Also 1/2^(q * 2^i) = -1 mod N is equivalent to 2^(q * 2^i) = -1 mod N

Therefore base=2 & base=(N-1)/2 are the same checks. So we can choose the base from [2..(N-1)/2).

rsc commented 8 years ago

http://www.trnicely.net/misc/bpsw.html says that a BPSW test should be 3-7x the cost of a single MR round. The results reported so far seem to indicate that BPSW is more like 30x the cost. Why the discrepancy?

jfcg commented 8 years ago

Actually it is. we have: For primes: JL ~ 6 to 7.5 times MR-round For composites: JL ~ 4.5 to 5.4 times MR-round

Also there was a problem with expNN(x,y,m). MR(base=x) (x with multiple Word's) was faster than MR(base=2). Now on my ubuntu laptop, for generating 1000 random odd ints i am getting:

// 501 to 512 bits, 5 of them are primes MR(n=40) took 95 ms BPSW took 60 ms (was 145 ms)

// 1013 to 1024 bits, 4 of them are primes MR(n=40) took 450 ms BPSW took 297 ms (was 570 ms)

BPSW is now unconditionally faster than MR(n=40) in all practical cases (except pseudoprimes-to-base 2 only of course).

jfcg commented 8 years ago

Also we have now:

// 5 primes with 256 256 697 730 995 bits MR(n=40) total 88 ms BPSW total 14 ms

// pairwise products of 5 primes MR(n=40) total 22 ms BPSW total 20 ms

jfcg commented 8 years ago

So what about the signature?

rsc commented 8 years ago

I am asking about comparison to MR(n=1) not MR(n=40). As I read the doc I linked to, BPSW should be about 3-7x the cost of MR(n=1). Using MR(n=40) especially confuses the matter because most of the inputs are not primes. Please compare against MR(n=1). Thanks.

jfcg commented 8 years ago

Well JL & MR-round comparison was for that but just for convenience (on windows 7):

For generating 1000 random odd ints i am getting: // 501 to 512 bits, 5 of them are primes MR(n=1) took 66 ms BPSW took 69 ms

// 1013 to 1024 bits, 4 of them are primes MR(n=1) took 338 ms BPSW took 359 ms

// 5 primes with 256 256 697 730 995 bits MR(n=1) total 3 ms BPSW total 15 ms

// pairwise products of 5 primes MR(n=1) total 25 ms BPSW total 25 ms

Note that n=1 is not suitable for any of these tasks.

danaj commented 8 years ago

Measuring the speed of the methods is often done in a unit of the time taken for a single M-R test (occasionally this unit is called a 'Selfridge', but that's rare). Hence the desire for the cost comparison. I think more fine-grain timing is needed, without the cost to generate the random numbers. Perhaps measure the Lucas test separately and use more of them.

All things being equal, an AES Lucas test costs less than 2 M-Rs, so the whole test is something like 2.5 M-Rs. But all things aren't equal here, with the math library using Montgomery math for modular exponentiation, and the whole M-R path being performance tuned for crypto needs. To get similar performance the Lucas test would have to be done in Montgomery math as well (fortunately there is no need to convert anything back out of Monty form). That's presumably why this is taking closer to 7 M-Rs using the earlier timing data you did 3 days ago for 5 primes:

MR(n=40) took 88ms => 1 MR takes 2.2ms BPSW took 14ms => BPSW costs 6.4 M-Rs

That isn't way off base. Your comparison with single M-R a few hours ago was 5x, but I think more timing resolution is needed.

The big questions seem to be:

Does this belong in (1) math/big/ProbablyPrime, (2) math/big/SomethingElse, (3) an external library.
If (1) above, should it be using n=0? Can we have a no-argument form?
Is BPSW rigorous enough compared to M-R (n=whatever)?

None of these are particular to your implementation. That has a bunch of other questions and comments best left to review.

Is the argument for ProbablyPrime always required? Callers are expected to understand the tradeoffs of different arguments? Assuming there isn't a no-argument form, then (in my opinion) n=0 makes sense. It's the option used by people who don't want to micro-manage the implementation and just want an efficient solid answer. As for the last, BPSW is far better than 7 Miller-Rabin tests. It also gives us deterministic results for all 64-bit inputs (admittedly this can be achieved using set M-R bases). Is it better than 20 tests? IMO yes but we're getting closer, and now the performance tradeoff is getting important.

jfcg commented 8 years ago

yes, windoze timers have much less resolution. The n that gives me (on my ubuntu laptop) equal times is 5 (for generating 1000 random odd ints):

// 501 to 512 bits, 5 of them are primes MR(n=5) took 60.024664 ms BPSW took 60.833635 ms

// 1013 to 1024 bits, 4 of them are primes MR(n=5) took 299.454722 ms BPSW took 298.373894 ms

For 512 bit numbers, 261 of them exactly pass basic tests. So 256+5*5=281 MR-rounds took 60 ms. For BPSW 261 base2 tests + 5 JL took 60 ms. So a JL > 4 MR-rounds. (about ~5-7 MR-rounds. base2 test is slightly faster than base-x now, hence the bigger sign)

For 1024 bit numbers, 251 of them exactly pass basic tests. 247+4*5=267 MR-rounds took 299 ms. For BPSW 251 base2 tests + 4 JL took 299 ms. So again JL > 4 MR-rounds, about ~5-7 MR-rounds.

I think we have done enough timing measurements and have a pretty good idea about what component of each test takes how long. Since Lucas test is executed only when the input is a strong probable prime to base 2, we have a pretty good average run time for BPSW. It is strictly faster than MR-only test when rounds > 5. Any decent application, sure of random input, has to use rounds >= 40 in my opinion for MR-only.

BPSW could become faster if we spend time to mongomerify lucasMod function, but I think the effort is not worth it. We will increase its speed only for strong pseudoprimes-to-base-2 which are very rare.

I think it is time to decide on how to proceed further. What do you think of the signature issue ?

rsc commented 8 years ago

I am not interested in the signature issue until I understand the performance characteristics. Please bear with me.

Can you update the CL you sent out with Go benchmarks (in the sense of package testing's benchmark support) demonstrating the timings (for MR(n=1) and BPSW)? I'd like to run them on a few different machines.

If it's the case that, as suggested above in the preliminary MR(n=1) numbers, that the cost of the BPSW test is roughly the same as n=1, then one possible proposal for the API is to change ProbablyPrime to do what it has always done (n rounds of MR) followed by a BPSW test. Then old code would automatically improve in accuracy at minimal performance cost. But again I'd like to see the numbers more clearly and try them on a few different machines first.

Thanks.

golang / go

math/big: extend ProbablyPrime with Baillie-PSW test #13229