Open edsky opened 1 year ago
Hello,
I'm not a developper of the package, I'm not sure why you need to perform modulus using order but this could be a hint to adapt the code :
Check the 0x1000003D1ULL in the function, it is linked to prime with this operation :
hex(2**256-0x1000003D1) '0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f'
You may try to replace that value with hex(2**256-order), it will be slightly bigger so I don't know if the operations with that value will be longer valid (either on the result, or in the cpu time).
Hello,
I'm not a developper of the package, I'm not sure why you need to perform modulus using order but this could be a hint to adapt the code :
Check the 0x1000003D1ULL in the function, it is linked to prime with this operation :
hex(2**256-0x1000003D1) '0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f'
You may try to replace that value with hex(2**256-order), it will be slightly bigger so I don't know if the operations with that value will be longer valid (either on the result, or in the cpu time).
Hello, I obtained a number 'a' between 2^128 and 2^129 by taking the modulus of 2^256. Following this principle, I divided a large number of 512 into two numbers of 256. Then, after multiplying the high bit by 'a', I got a number of 2^385. Then, I recursively applied this multiplication (until the high bit is zero) to return a number within 256, and the final result is the number obtained by adding this to the low bit number.
__device__ void ModMultOrder(uint64_t *r, uint64_t *a)
{
uint64_t r512[8];
uint64_t t[NBBLOCK];
r512[4] = 0;
r512[5] = 0;
r512[6] = 0;
r512[7] = 0;
// 256*256 multiplier
UMult(r512, a, r[0]);
UMult(t, a, r[1]);
UADDO1(r512[1], t[0]);
UADDC1(r512[2], t[1]);
UADDC1(r512[3], t[2]);
UADDC1(r512[4], t[3]);
UADD1(r512[5], t[4]);
UMult(t, a, r[2]);
UADDO1(r512[2], t[0]);
UADDC1(r512[3], t[1]);
UADDC1(r512[4], t[2]);
UADDC1(r512[5], t[3]);
UADD1(r512[6], t[4]);
UMult(t, a, r[3]);
UADDO1(r512[3], t[0]);
UADDC1(r512[4], t[1]);
UADDC1(r512[5], t[2]);
UADDC1(r512[6], t[3]);
UADD1(r512[7], t[4]);
// Reduce from 512 to 257
Load256(t, (&r512[4]));
if (!_IsZero(t)) {
ModMultOrder(t, N);
ModAdd256Order(t, r512, t);
Load256(r, t);
} else {
Load256(t, r512);
SubOrder(t);
if (_IsPositive(t)) {
Load256(r, t);
} else {
Load256(r, r512);
}
}
}
As seen in the code, most numbers work normally, but the number of recursions is a bit problematic and needs further optimization.
I need to perform order multiplication calculations on elliptic curves in CUDA. Currently, there is a ModMult function in CUDA as shown below, but it performs multiplication mod P, not order N. How should I modify it to calculate multiplication mod order N?