emp-toolkit / emp-zk

Efficient and Interactive Zero-Knowledge Proofs

Latency grows with the number of GGM trees, aka O(T) #9

Closed: weikengchen closed this issue 3 years ago

weikengchen commented 3 years ago

This is a follow-up to #7, regarding the observation that the running time goes up as the network latency goes up.

I used the following parameters adapted from Ferret:

const static int N_REG_Fp = 10608640;
const static int T_REG_Fp = 1295;
const static int K_REG_Fp = 589824;
const static int BIN_SZ_REG_Fp = 13;
const static int N_PRE_REG_Fp = 649728;
const static int T_PRE_REG_Fp = 1269;
const static int K_PRE_REG_Fp = 36288;
const static int BIN_SZ_PRE_REG_Fp = 9;
const static int N_PRE0_REG_Fp = 51200;
const static int T_PRE0_REG_Fp = 800;
const static int K_PRE0_REG_Fp = 5060;
const static int BIN_SZ_PRE0_REG_Fp = 6;

The network bandwidth is 2 Gbps with a 20 ms RTT, using 32 threads.

I vary T_REG_Fp along with BIN_SZ_REG_Fp and obtain the following results:

T = 1295, BIN = 13: 17.3 s
T = 2590, BIN = 12: 21.9 s
T = 5180, BIN = 11: 28.6 s
T = 10360, BIN = 10: 41.7 s
T = 20720, BIN = 9: 68.0 s

Note that this experiment is not well controlled, since the communication and computation also grow with T. So I also ran it with a lower RTT of 4 ms:

T = 1295, BIN = 13: 10.8 s
T = 2590, BIN = 12: 11.8 s
T = 5180, BIN = 11: 13.1 s
T = 10360, BIN = 10: 15.9 s
T = 20720, BIN = 9: 21.5 s

An important observation is that if we disable the malicious checks, the running time goes down significantly:

T = 1295, BIN = 13: 9.3 s (RTT = 4 ms), 10.6 s (RTT = 40 ms)
T = 20720, BIN = 9: 10.0 s (RTT = 4 ms), 11.3 s (RTT = 40 ms)

A PR will be pushed soon to fix this issue.

What happens is that, in SPCOT, the sender blocks waiting for a consistency-check message from the receiver for every single GGM tree.

Therefore, on each thread, there are essentially O(T) rounds.

wangxiao1254 commented 3 years ago

Actually, why would there be T rounds in each thread? I thought there should be O(1) rounds in each thread, with all the threads running in parallel.

weikengchen commented 3 years ago

Let me correct that: it would be T / threads rounds per thread, but that is still O(T).

The reason is that after splitting the T trees across the threads, each thread handles its T / threads trees one by one.

// Each thread processes its assigned trees [start, end) one by one.
for (auto i = start; i < end; ++i) {
    if(party == ALICE) {
        // Sender: compute and send the GGM tree; in the malicious case,
        // consistency_check_msg_gen then blocks waiting for the receiver's challenge.
        ggm_tree[i] = sparse_vector+i*leave_n;
        senders[i]->compute(ggm_tree[i], secret_share_x, triple_yz[i]);
        senders[i]->template send<OTPre<IO>>(ot, ios[start/width], i);
        if(is_malicious) senders[i]->consistency_check_msg_gen(check_VW_buf[i], ios[start/width]);
        ios[start/width]->flush();
    } else {
        // Receiver: receive and expand the GGM tree, then run its side of the
        // consistency check in the malicious case.
        recvers[i]->template recv<OTPre<IO>>(ot, ios[start/width], i);
        ggm_tree[i] = sparse_vector+i*leave_n;
        recvers[i]->compute(ggm_tree[i], triple_yz[i]);
        if(is_malicious) recvers[i]->consistency_check_msg_gen(check_chialpha_buf[i], check_VW_buf[i], ios[start/width], triple_yz[i]);
        ios[start/width]->flush();
    }
}

Ideally, this should still be O(1) rounds, since the sender should always be sending and the receiver always be receiving. In that case, the "round" disappears, as the communication is merely a stream.

But the problem is that, in this step:

if(is_malicious) senders[i]->consistency_check_msg_gen(check_VW_buf[i], ios[start/width]);

The sender is trying to receive the challenge seed from the receiver, so the sender now sits waiting for the receiver's reply.

For that to happen, the sender needs to wait for the receiver to receive the GGM tree data in

senders[i]->template send<OTPre<IO>>(ot, ios[start/width], i);

So the sender waits for at least one network round trip per tree. This is bad because, recall, each thread handles its trees one by one, so each thread essentially waits for about network_latency * T / threads in total.
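As a rough back-of-the-envelope estimate (assuming the sender waits at least one RTT per tree, so this is only a lower bound), for the largest setting above:

(T / threads) * RTT = (20720 / 32) * 20 ms ≈ 648 * 20 ms ≈ 13 s

of pure waiting per thread at 20 ms RTT, versus only about 648 * 4 ms ≈ 2.6 s at 4 ms RTT.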

weikengchen commented 3 years ago

In other words, ideally, when the sender finishes sending the GGM tree data, it should immediately move on and start computing the next GGM tree. Right now, however, the sender sits there and waits for the challenge for roughly one RTT instead of working on the next tree.
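To make this concrete, here is a rough sketch of the kind of restructuring that avoids the per-tree wait. It is only an illustration of the idea, reusing the variables from the loop above and showing only the ALICE branch; it is not necessarily what the PR will do, and it assumes the receiver side is reorganized symmetrically so that all the challenge seeds come back as one stream.

// First pass: only compute and send; nothing here waits on the other party,
// so this thread's T / threads trees go out back to back as a stream.
for (auto i = start; i < end; ++i) {
    ggm_tree[i] = sparse_vector+i*leave_n;
    senders[i]->compute(ggm_tree[i], secret_share_x, triple_yz[i]);
    senders[i]->template send<OTPre<IO>>(ot, ios[start/width], i);
}
ios[start/width]->flush();

// Second pass: only now run the consistency checks, which receive the
// challenge seeds. The round-trip latency is paid once per thread
// instead of once per tree.
if(is_malicious) {
    for (auto i = start; i < end; ++i)
        senders[i]->consistency_check_msg_gen(check_VW_buf[i], ios[start/width]);
}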

weikengchen commented 3 years ago

Adding for reference, on the SPCOT sender side:

void consistency_check_msg_gen(__uint128_t& V, IO *io2) {
    block seed;

    /************/
    // Blocking receive: wait for the challenge seed from the receiver.
    io2->recv_data(&seed, sizeof(block));
    /************/

    // Everything below is local computation: expand the seed into the
    // coefficients chi and take the inner product over the GGM-tree leaves.
    __uint128_t *chi = new __uint128_t[leave_n];
    Hash hash;
    __uint128_t digest = mod(_mm_extract_epi64(hash.hash_for_block(&seed, sizeof(block)), 0));
    uni_hash_coeff_gen(chi, digest, leave_n);

    // V = \sum{chi_i*v_i}
    V = vector_inn_prdt_sum_red(chi, (__uint128_t*)ggm_tree, leave_n);

    delete[] chi;
}

It is receiving.
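Note that everything after the recv_data call is local computation; only that first line waits on the network. So, as a purely hypothetical sketch (these helper names are not the actual emp-zk API), the function could be split so that a caller first flushes all the GGM-tree sends and only afterwards gathers the seeds:

// Hypothetical split (sketch only): separate the blocking receive from the
// local computation so the receives can be batched after all the sends.
block recv_consistency_check_seed(IO *io2) {
    block seed;
    io2->recv_data(&seed, sizeof(block)); // the only step that waits on the network
    return seed;
}

void consistency_check_msg_compute(__uint128_t& V, block seed) {
    __uint128_t *chi = new __uint128_t[leave_n];
    Hash hash;
    __uint128_t digest = mod(_mm_extract_epi64(hash.hash_for_block(&seed, sizeof(block)), 0));
    uni_hash_coeff_gen(chi, digest, leave_n);

    // V = \sum{chi_i*v_i}
    V = vector_inn_prdt_sum_red(chi, (__uint128_t*)ggm_tree, leave_n);

    delete[] chi;
}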

wangxiao1254 commented 3 years ago

You are right! Got it!