Co-dfns / mystika

High-end Cryptographic Library
GNU Affero General Public License v3.0
43 stars 26 forks source link

Big Number Multiplication #39

Open arcfide opened 7 years ago

Tikhon03 commented 7 years ago

We have had this since shortly after finishing SHA-2. I have made improvements on it. One improvement is that it is now possible to do multiplication with multiple columns of an array, thus allowing multiple big numbers to be multiplied without using each (on a nested vector) or the rank operator (on multiple rows).

One pending improvement is to adjust it to use only partial carrying. Carrying is implemented to be in uniform time, which is essential to avoid timing attacks, but that necessitates it being sequential or recursive. The fact is, so long as place value is maintained you don't need complete carrying unless you are either truncating places (carrying must be applied completely to the truncated places so no information is lost) or if you are asking for the final value.

Yes, I realize I just implied that the code no longer uses bit vectors. Currently it is in base B=2^8, but B is stored as a constant and can be changed on a whim. What can I say...hardware addition is actually very fast! I tried and failed to beat it with bit vectors. Even Xtimes (XT) itself already uses hardware floating point arithmetic. So, I decided to just stop fighting it. The base B is chosen to be the maximal value allowed without XT loosing precision on the floating point numbers. Since an upper bound on the Fourier Transform (FT) is given by the sum of the array elements, this means that the current implementation of FT produces correct answers with byte vectors up to length 2^24, and XT works for multiplying two byte vectors of half that size, so in base B=256 this enables multiplication up to B^(2^12)=2^(8*2^12)=2^(2^15) i.e. 32,768 bits. Decreasing the value of B does increase the maximum size, and peaks at B=4 (not B=2). Unfortunately decreasing the value of B also decreases the speed of addition, so there is a trade-off here. Since I am content with the maximum size available in base 256, that is why B is currently set to this value, but all of the code I have on github should work for B=2^(2^k) where k=0,1,2,3,4 or 5, without changes unless otherwise stated by a comment.

So, anyway, in many cases full carrying can be avoided until just the end of a calculation or just before truncation, and since partial carrying can be done in parallel we will have a substantial speed increase by doing full carrying only so far as we need it. I have made some progress on this, but the correctness of some of my algorithms still needs to be checked with the partial carrying, and for that it would be nice to have the new UT structure.