alibaba-edu / High-Precision-Congestion-Control

The parameters for DCQCN #21

Open ChengjunJia opened 3 years ago

ChengjunJia commented 3 years ago

Hi, Yuliang:

I have a question about the DCQCN parameters. It is based on the configuration in run.py and this description in your paper:

Kmin = 100KB × Bw/25Gbps and Kmax = 400KB × Bw/ 25Gbps according to our experiences (no vendor suggestion available). For DCTCP, we set Kmin = Kmax = 30KB × Bw/10Gbps according to [8].

For 100Gbps links, this gives Kmin=400KB and Kmax=1600KB. But in third.cc the headroom is only 3*BDP. Isn't that smaller than Kmax?

uint32_t headroom = rate * delay / 8 / 1000000000 * 3;
std::cout << "switch head room size: " << headroom << std::endl;
sw->m_mmu->ConfigHdrm(j, headroom);
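To make the comparison concrete, here is a minimal sketch of the arithmetic behind the question. It is not code from the repository: `ScaleDcqcnThresholds` and `HeadroomBytes` are hypothetical helper names, 1 KB is assumed to be 1000 bytes, and `rate` and `delay` are assumed to be in bits per second and nanoseconds, which the `1000000000` divisor in the quoted line suggests.

```cpp
#include <cassert>
#include <cstdint>

// ECN marking thresholds, in bytes.
struct EcnThresholds {
    uint64_t kmin_bytes;
    uint64_t kmax_bytes;
};

// Scale the quoted 25 Gbps reference values (Kmin = 100 KB, Kmax = 400 KB)
// linearly with link bandwidth. Assumption: 1 KB = 1000 bytes.
EcnThresholds ScaleDcqcnThresholds(uint64_t link_gbps) {
    assert(link_gbps > 0);
    const uint64_t kRefGbps = 25;
    const uint64_t kKB = 1000;
    return {100 * kKB * link_gbps / kRefGbps,
            400 * kKB * link_gbps / kRefGbps};
}

// Same arithmetic as the headroom line quoted from third.cc, with explicit
// 64-bit types. Assumption: rate in bits/s, delay in nanoseconds.
uint64_t HeadroomBytes(uint64_t rate_bps, uint64_t delay_ns) {
    return rate_bps * delay_ns / 8 / 1000000000ULL * 3;
}
```

With a 100 Gbps link and an assumed 1 us link delay (a value not given in this thread), `HeadroomBytes(100000000000ULL, 1000)` evaluates to 37500 bytes, while `ScaleDcqcnThresholds(100)` gives Kmin = 400000 and Kmax = 1600000 bytes.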

Isn't Kmin too large for DCQCN in your experiment? In the DCQCN paper the parameters are Kmin=4KB and Kmax=200KB, and in the DCTCP paper K=60KB. I think a lower K would make the average queue shallower and reduce the FCT of flows smaller than the BDP.

Why did you set K so much larger? Is there anything I missed?

liyuliang001 commented 3 years ago

The 100KB, 400KB, and pmax=20 values are from Alibaba's production. The parameters in the DCQCN paper do not work in production.

Also, please note that K should be proportional to the link bandwidth. The DCQCN paper uses 40Gbps, while DCTCP uses only 10Gbps.

ChengjunJia commented 3 years ago

Thanks for the answer. In DCTCP the latency is ~100us, but nowadays it seems to be ~10us. I think K should be proportional to BDP, but do you mean that it should be proportional to bandwidth only?

liyuliang001 commented 3 years ago

Yes, K should be proportional to BDP. (I meant that it is proportional to bandwidth and also proportional to latency.)
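As a rough illustration of "proportional to BDP", the sketch below scales a reference threshold by both the bandwidth ratio and the latency ratio. `ScaleThresholdByBdp` is a hypothetical helper, not part of the HPCC code, and the inputs are the approximate figures mentioned in this thread (K=60KB at 10Gbps and ~100us for DCTCP), with 1 KB assumed to be 1000 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Scale a reference ECN threshold so that it stays proportional to BDP,
// i.e. proportional to both bandwidth and latency.
uint64_t ScaleThresholdByBdp(uint64_t k_ref_bytes,
                             uint64_t ref_gbps, uint64_t ref_rtt_us,
                             uint64_t gbps, uint64_t rtt_us) {
    assert(ref_gbps > 0 && ref_rtt_us > 0);
    // Multiply before dividing to avoid losing precision in integer math.
    return k_ref_bytes * gbps * rtt_us / (ref_gbps * ref_rtt_us);
}
```

Under these assumptions, going from 10Gbps at ~100us to 100Gbps at ~10us leaves the BDP unchanged, so `ScaleThresholdByBdp(60000, 10, 100, 100, 10)` returns 60000 bytes. Scaling by bandwidth alone (latency held at ~100us) would instead give 600000 bytes.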

ChengjunJia commented 3 years ago

I tried setting DCTCP's K to 60KB, as recommended in the DCTCP paper, and found that it improved the FCT of short flows without much harm to long flows. HPCC gives the best FCT for short flows, but does it harm the FCT of long flows? That is, does HPCC sacrifice a little bandwidth to keep the queue near zero? Have you measured this?

Thanks a lot.