alibaba-edu / High-Precision-Congestion-Control

Regenerate fig 11a from HPCC paper #36

Open syzx12comcn opened 2 years ago

syzx12comcn commented 2 years ago

I'm trying to regenerate Fig. 11a, but some details are unclear. The paper says that we "either add incast traffic to 30% load traffic or run 50% load traffic. We generate the incast traffic by randomly selecting 60 senders and one receiver, each sending 500KB." First, I generated the 30% load Facebook Hadoop traffic, which contains about 1 million flows. The paper doesn't say when the incast is added, so I arbitrarily inserted it at around the 400,000th line of the flow file. But the result does not match the paper's.

Two questions:
1. Does the timing of the incast addition affect the results?
2. I found that 'run.py' provides four sets of parameters for the DCQCN algorithm. Which one was used in the paper's simulations?

Thanks for taking the time to answer the question!

liyuliang001 commented 2 years ago

The incast load is 2% of the network capacity, which means incasts appear in the network continuously, not just once. What I did was calculate the average inter-incast arrival time needed to achieve 2% load, then use Poisson arrivals of incasts, with every incast randomly picking one receiver and 60 senders.
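The procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script used for the paper: the function name `generate_incasts`, the tuple layout, and the defaults (320 hosts, 60 senders, 500 KB taken as 500,000 bytes) are hypothetical choices, and converting the tuples into the simulator's flow-file format is left out.

```python
import random


def generate_incasts(rate_per_sec, duration_s, n_hosts=320, n_senders=60,
                     flow_size_bytes=500_000, seed=0):
    """Return a list of (start_time_s, src, dst, size_bytes) tuples.

    Incast events arrive as a Poisson process with the given rate, i.e.
    the gaps between events are exponentially distributed with mean
    1 / rate_per_sec. Every event picks one random receiver and
    n_senders distinct random senders (never the receiver itself).
    """
    rng = random.Random(seed)  # fixed seed so the trace is reproducible
    flows = []
    t = rng.expovariate(rate_per_sec)  # time of the first incast
    while t < duration_s:
        receiver = rng.randrange(n_hosts)
        candidates = [h for h in range(n_hosts) if h != receiver]
        for sender in rng.sample(candidates, n_senders):
            flows.append((t, sender, receiver, flow_size_bytes))
        t += rng.expovariate(rate_per_sec)  # exponential inter-arrival gap
    return flows


if __name__ == "__main__":
    # Hypothetical example: 2667 incasts per second over a 0.1 s trace.
    flows = generate_incasts(rate_per_sec=2667, duration_s=0.1)
    print(len(flows) // 60, "incast events,", len(flows), "flows")
```

The resulting flows would then be merged with the background traffic and sorted by start time before being written to the trace file.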

liyuliang001 commented 2 years ago

dcqcn and dcqcn_vwin were used in the HPCC paper.

syzx12comcn commented 2 years ago

My reproduced results are somewhat different from those in the paper. For example, for small flows, the FCT slowdown of DCQCN and TIMELY is much smaller than in the paper (the paper's result reaches 100+).

[Screenshot 2022-05-15 17.06.35]

I think this is because of my incast scale setting. Is my average inter-incast arrival time calculated correctly? (In my experiments, incast occurred about 14 times.)

[Screenshot 2022-05-15 17.00.08]

syzx12comcn commented 2 years ago

Following the description above (Poisson incast arrivals at 2% load, with each incast randomly picking one receiver and 60 senders): is my average inter-incast arrival time calculated correctly?

[Screenshot 2022-05-22 21.11.09]

The generated flow file contains about 240 incasts.

[Screenshot 2022-05-22 21.20.10]

liyuliang001 commented 2 years ago

How long did you simulate?

To achieve 2% load, the average number of incasts per second is: 320 (hosts) × 100 Gbps × 0.02 / (500 KB × 60 × 8 bits per byte) ≈ 2667.
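The arithmetic behind this estimate can be checked with a short Python sketch. The function name `incasts_per_second` is hypothetical, and the calculation assumes 1 KB = 1000 bytes and 8 bits per byte, which is what reproduces the figure of 2667.

```python
def incasts_per_second(n_hosts=320, link_gbps=100, incast_load=0.02,
                       n_senders=60, flow_size_bytes=500_000):
    """Average incast rate needed for incast traffic to reach the target load."""
    capacity_bps = n_hosts * link_gbps * 1e9          # total host link capacity, bits/s
    bits_per_incast = n_senders * flow_size_bytes * 8  # 60 flows of 500 KB each, in bits
    return capacity_bps * incast_load / bits_per_incast


rate = incasts_per_second()   # about 2666.7 incasts per second
mean_gap_us = 1e6 / rate      # mean inter-incast arrival time, about 375 microseconds
expected_in_100ms = rate * 0.1  # about 267 incasts in a 0.1 s simulation

print(round(rate), round(mean_gap_us), round(expected_in_100ms))
```

Under these assumptions a 0.1 s trace should contain roughly 267 incast events on average, so a trace with only a handful of incasts carries far less than 2% incast load.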

syzx12comcn commented 2 years ago

I simulated 0.1 s. Thank you! With your help I have successfully reproduced Fig. 11a and 11c from the paper. Another question: how is 'Delay' in Fig. 11b obtained? Is it a packet-level metric? Should I use 'mix.tr' to calculate it?

KlZOnmyway commented 2 years ago

Hi, I'm wondering which settings you modified in third.cc. I created a 0.1 s trace with 1 million flows named tmp_traffic.txt, and my command is: python run.py --cc hp --trace tmp_traffic --bw 100 --topo fat --hpai 50. But it runs for hours and the simulation never finishes. No error is reported in the terminal either. I'd appreciate any suggestions.

syzx12comcn commented 2 years ago

1 million flows take about 5 hours on my computer.

KlZOnmyway commented 2 years ago

Could you tell me your computer's configuration? Are you running on a VM or directly on a local host? I ran it on a VM with 10 GB of RAM and 16 cores, and the process was killed with SIGKILL after a few hours.

syzx12comcn commented 2 years ago

I didn't use a VM; I ran it directly on my PC. It does not require a high-end machine.

KlZOnmyway commented 2 years ago

Hi, sorry to bother you. Could you please share your traffic generation code for fbHdp with 30% load and incast? I think I tried something similar to what is described above, but DCQCN always ends up like this: [image] My code is at https://drive.google.com/file/d/1QS8-wxnltJZBlDd6Rr-JvvrtDUI5KD-A/view?usp=sharing. Also, my PFC results for all of these algorithms except TIMELY are empty. Did you run into a similar problem?

sana-mahmood commented 2 years ago

Hi, I am trying to regenerate Fig. 11c but have been unable to do so for DCQCN and TIMELY. Would it be possible for you to share your configuration files for DCQCN and TIMELY (the ones you get from run.py)?