add parallelization - Githubissues

This repo, as it stands, is full of "embarrassingly parallel" problems which I have not yet ventured to parallelize.

Most obviously, the t random walks mandated by signing are independent of each other (as are their verifications, though verification is much faster). The constructions, during preprocessing, of the l_i^{e_i}-torsion bases are also independent of each other (though see also issue https://github.com/benediamond/GPSPostQuantum/issues/1).
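As a rough illustration of the first point, here is a minimal sketch of running t independent walks concurrently with std::async. The names run_walk and run_walks_parallel are hypothetical and do not exist in this repo; run_walk is a toy stand-in for one signing walk, and the only property it is meant to show is that each task owns its own random state and shares nothing mutable with the others.

```cpp
#include <cstdint>
#include <future>
#include <random>
#include <vector>

// Hypothetical stand-in for one signing walk (not code from this repo).
// Each walk owns its RNG, seeded from its index, so walks share no mutable state.
std::uint64_t run_walk(std::uint64_t seed, int steps) {
    std::mt19937_64 rng(seed);
    std::uint64_t state = 0;
    for (int s = 0; s < steps; ++s)
        state = state * 6364136223846793005ULL + rng(); // toy "step"; unsigned wraparound is well defined
    return state;
}

// Launch t independent walks and collect their results in index order.
std::vector<std::uint64_t> run_walks_parallel(int t, int steps) {
    std::vector<std::future<std::uint64_t>> futures;
    futures.reserve(t);
    for (int i = 0; i < t; ++i)
        futures.push_back(std::async(std::launch::async, run_walk,
                                     static_cast<std::uint64_t>(i), steps));

    std::vector<std::uint64_t> results;
    results.reserve(t);
    for (auto& f : futures)
        results.push_back(f.get()); // blocks until walk i has finished
    return results;
}
```

Because every walk is a pure function of its seed here, the parallel results match a sequential loop over the same seeds, which makes a convenient correctness check before swapping in the real walk.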

A major bottleneck during signing is the loop of Algorithm 3, lines 7-10; see also the implementation at https://github.com/benediamond/GPSPostQuantum/blob/9e6f7acdb8087a8407cd72e400d0bdb6592107ec/walker.cc#L93-L110. As each iteration is an independent random (i.e., Las Vegas) trial, the iterations can also be parallelized with little effort. This should significantly speed up signing.
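The general pattern for parallelizing Las Vegas trials can be sketched as follows: several workers repeat independent trials, and the first success raises a shared atomic flag so the others stop. This is only a sketch under assumptions. trial_succeeds, worker and first_success are hypothetical names, and the toy success test stands in for the norm and kernel conditions of the actual loop.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <future>
#include <random>
#include <utility>
#include <vector>

// Hypothetical success test standing in for the real acceptance conditions.
bool trial_succeeds(std::uint64_t candidate) { return candidate % 1000 == 7; }

// Repeat independent trials with a private RNG until this worker succeeds
// or some other worker has already succeeded.
std::pair<bool, std::uint64_t> worker(std::uint64_t seed, std::atomic<bool>& done) {
    std::mt19937_64 rng(seed);
    while (!done.load(std::memory_order_acquire)) {
        std::uint64_t candidate = rng();
        if (trial_succeeds(candidate)) {
            done.store(true, std::memory_order_release); // tell the others to stop
            return std::make_pair(true, candidate);
        }
    }
    return std::make_pair(false, std::uint64_t{0});
}

// Run the workers concurrently and return one successful candidate.
std::uint64_t first_success(unsigned num_workers) {
    if (num_workers == 0)
        num_workers = 1; // at least one worker is needed to make progress
    std::atomic<bool> done{false};
    std::vector<std::future<std::pair<bool, std::uint64_t>>> futures;
    for (unsigned w = 0; w < num_workers; ++w)
        futures.push_back(std::async(std::launch::async, worker,
                                     static_cast<std::uint64_t>(w + 1), std::ref(done)));

    bool have_winner = false;
    std::uint64_t winner = 0;
    for (auto& f : futures) {        // wait for every worker to exit
        auto result = f.get();
        if (result.first && !have_winner) {
            have_winner = true;
            winner = result.second;
        }
    }
    return winner;
}
```

The flag is only ever set by a worker that returns a success, so at least one future yields a valid candidate. More than one worker may succeed at nearly the same time; the sketch simply keeps the first one it sees.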

Update: I have tried using a basic C++ async design pattern to parallelize the above block, as in:

    auto f = [&I_i_minus_1, l = P.l, le = P.le, &p_coeff, &B_Q_i]() {
        base_vector<bigint> a_i_temp(4);
        while (true) {
            for (int i = 0; i < 4; i++)
                a_i_temp[i].assign(randomize(bigint(le)));
            a_i_temp.assign((I_i_minus_1 * bigint_matrix(a_i_temp))(0)); // get 0th column
            bigint norm = a_i_temp[0] * a_i_temp[0] + a_i_temp[0] * a_i_temp[3] +
                          a_i_temp[1] * a_i_temp[1] + a_i_temp[1] * a_i_temp[2] +
                          p_coeff * (a_i_temp[2] * a_i_temp[2] + a_i_temp[3] * a_i_temp[3]);
            if (norm % le == 0 && norm % (le * l) != 0)
                break;
        }

        if ((a_i_temp[0] % le * B_Q_i[0] + a_i_temp[1] % le * B_Q_i[1] +
             a_i_temp[2] % le * B_Q_i[2] + a_i_temp[3] % le * B_Q_i[3])
                .is_zero())
            return make_pair(true, a_i_temp);
        else
            return make_pair(false, a_i_temp);
    };
    while (true) {
        bool found = false;
        vector<future<pair<bool, base_vector<bigint>>>> futures; // use vector of (smart) pointers?
        for (int i = 0; i < P.le; i++)
            futures.push_back(async(f));
        for (int i = 0; i < P.le; i++) {
            pair<bool, base_vector<bigint>> result(futures[i].get());
            if (get<0>(result)) {
                a_i.assign(get<1>(result));
                found = true;
                break;
            }
        }
        if (found)
            break;
    }

However, this causes very, very strange things to happen in LiDIA. I am not sure why, because no shared memory is being written to (only read). If anybody can explain or fix this, it would be greatly appreciated.

benediamond / galbraith-petit-silva

add parallelization #4