Open benediamond opened 5 years ago
Update: I have tried using a basic c++ async design pattern to parallelize the above block. as in:
auto f = [&I_i_minus_1, l = P.l, le = P.le, &p_coeff, &B_Q_i]() {
base_vector<bigint> a_i_temp(4);
while (true) {
for (int i = 0; i < 4; i++)
a_i_temp[i].assign(randomize(bigint(le)));
a_i_temp.assign((I_i_minus_1 * bigint_matrix(a_i_temp))(0)); // get 0th column
bigint norm = a_i_temp[0] * a_i_temp[0] + a_i_temp[0] * a_i_temp[3] +
a_i_temp[1] * a_i_temp[1] + a_i_temp[1] * a_i_temp[2] +
p_coeff * (a_i_temp[2] * a_i_temp[2] + a_i_temp[3] * a_i_temp[3]);
if (norm % le == 0 && norm % (le * l) != 0)
break;
}
if ((a_i_temp[0] % le * B_Q_i[0] + a_i_temp[1] % le * B_Q_i[1] +
a_i_temp[2] % le * B_Q_i[2] + a_i_temp[3] % le * B_Q_i[3])
.is_zero())
return make_pair(true, a_i_temp);
else
return make_pair(false, a_i_temp);
};
while (true) {
bool found = false;
vector<future<pair<bool, base_vector<bigint>>>> futures; // use vector of (smart) pointers?
for (int i = 0; i < P.le; i++)
futures.push_back(async(f));
for (int i = 0; i < P.le; i++) {
pair<bool, base_vector<bigint>> result(futures[i].get());
if (get<0>(result)) {
a_i.assign(get<1>(result));
found = true;
break;
}
}
if (found)
break;
}
However, this causes very, very strange things to happen in LiDIA... I am not sure why, because no shared memory is being written to (only read). Anybody who can explain / fix this would be greatly appreciated...
This repo, as it stands, is full of "embarassingly parallel" problems which I have not yet ventured to parallelize.
Most obviously, the t random walks mandated by signing are independent of each other (as is their verification, though this is much faster). The construction during preprocessing, of the l_i^{e_i}-torsion bases are also independent of each other (though see also issue https://github.com/benediamond/GPSPostQuantum/issues/1).
A major bottleneck during signing is the loop of Algorithm 3, lines 7-10; see also the implementation https://github.com/benediamond/GPSPostQuantum/blob/9e6f7acdb8087a8407cd72e400d0bdb6592107ec/walker.cc#L93-L110 As each iteration represents a random (i.e., Las Vegas) trial, these can also be effortlessly parallelized. This would significantly speed up signing.