Closed: wen-zheng closed this issue 4 years ago
My configuration:
OS: Ubuntu 16.04
CPU: 1 core
RAM: 1 GB
tcp_timestamps: disabled
TCP offload: disabled
net.core.rmem_max: 33554432
net.core.wmem_max: 33554432
net.core.rmem_default: 16777216
net.core.wmem_default: 16777216
net.ipv4.tcp_wmem: '8388608 16777216 33554432'
net.ipv4.tcp_rmem: '8388608 16777216 33554432'
I have also commented out the wmem settings in the setup.sh script.
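For reference, a minimal sketch of how these values can be applied with sysctl and ethtool (assuming they are set at runtime rather than via /etc/sysctl.conf; the interface name eth0 is illustrative):

sudo sysctl -w net.core.rmem_max=33554432
sudo sysctl -w net.core.wmem_max=33554432
sudo sysctl -w net.core.rmem_default=16777216
sudo sysctl -w net.core.wmem_default=16777216
sudo sysctl -w net.ipv4.tcp_rmem="8388608 16777216 33554432"
sudo sysctl -w net.ipv4.tcp_wmem="8388608 16777216 33554432"
sudo sysctl -w net.ipv4.tcp_timestamps=0
# disable TCP offloads (TSO/GSO/GRO) on the NIC
sudo ethtool -K eth0 tso off gso off gro off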
Orca's overhead: ~5% CPU
@wen-zheng ,
Hi Wen-Zheng, this sounds like a problem with reading the correct learned model. Make sure the provided model has not been replaced with a new one; that can happen if you accidentally start a "learn from scratch" session. To check that everything is alright on the server side, first use Mahimahi to create a bottleneck link similar to your topology, then send traffic and see whether the problem persists. If it does, the problem is with the model you're using; if not, we will look into other possible reasons.
Plz let me know the outcome.
(Just as a side note: Plz set the following:
sudo sysctl -q net.ipv4.tcp_wmem="4096 32768 4194304"
sudo sysctl -w -q net.ipv4.tcp_low_latency=1
sudo sysctl -w -q net.ipv4.tcp_autocorking=0
sudo sysctl -w -q net.ipv4.tcp_no_metrics_save=1
)
@Soheil-ab
I have checked the md5 of the trained model in the "models" folder and in "rl-module/train_dir/learner0"; both match the ones in this repository.
I also restored the default settings (net.ipv4.tcp_wmem="4096 32768 4194304"), but this issue remains.
I also found that the results of repeated experiments in the same environment (bottleneck bandwidth and RTT are fixed) can vary greatly.
@wen-zheng ,
OK. Now let's check other things.
Are you using VMs (on top of VMware, VirtualBox, ...)? The current implementation is not fully in-kernel, meaning part of the system still runs at the application layer. In particular, passing information between the kernel, the C part, and the Python part is handled by timers at the application layer. Consequently, if you are using a VM with very restricted resources (as in your case), you will see performance drop: the host OS schedules the VM tightly, which hurts the performance of those timers and leads to results that vary a lot even with the same settings. Besides, the current implementation still uses TensorFlow, which requires enough resources to do its job smoothly.
So, if you can, avoid Type 2 hypervisors and use a bare-metal hypervisor instead. If that is not an option, try increasing the resources (specifically CPU and RAM) allocated to the VM and see how it affects the results. In general, an under-provisioned VM will greatly degrade performance, including that of TCP schemes, even fully in-kernel ones like Cubic or BBR.
Plz let me know if it resolves the issue.
@Soheil-ab
I do use VMs. I allocated more resources to Orca's server and client (4 CPUs, 2 GB RAM), but the issues above remain. I also tested Cubic and BBR between Orca's server and client, and their performance is very consistent.
My experiment results are attached below (bottleneck: 15 Mbps, 150 ms RTT, queue length 1 BDP):
cubic-trace-stats-2.txt
orca-trace-stats-2.txt
orca-trace-stats-1.txt
bbr-trace-stats-1.txt
bbr-trace-stats-2.txt
cubic-trace-stats-1.txt
Orca can perform well, but sometimes it performs poorly.
@wen-zheng,
Unfortunately, a summary of a pcap file won't help with debugging the system.
To see what is wrong, let's create a reproducible scenario. To that end, please use Mahimahi: set the link bandwidth to 12 Mbps (a trace file containing only a single "1" generates a 12 Mbps link in Mahimahi), the RTT to 150 ms, and the queue size to 1 BDP. Keep your current settings, run a few tests with Orca over this environment, and let me know the results.
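For example, a minimal sketch of such a Mahimahi shell (the trace file name and the queue size in packets are illustrative; 1 BDP at 12 Mbps x 150 ms is roughly 150 MTU-sized packets):

echo 1 > 12mbps.trace    # one 1500-byte delivery opportunity per ms ~= 12 Mbps
# 75 ms of one-way delay in each direction gives a 150 ms RTT
mm-delay 75 mm-link 12mbps.trace 12mbps.trace \
    --uplink-queue=droptail --uplink-queue-args="packets=150" \
    --downlink-queue=droptail --downlink-queue-args="packets=150"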
I am closing this issue due to lack of activity.
@Soheil-ab
Hi, Soheil. I set up the following network topology:
client ---------- router------------- server
and set the bottleneck bandwidth to 128 Mbps, the RTT to 150 ms, and the queue length to 1 BDP at the router using Linux tc and netem. I found that the maximum throughput of Orca is only ~60 Mbps, while plain Cubic can reach 100+ Mbps. Do I need to configure Orca specifically for this environment? (I have asked you how to run Orca without Mahimahi.)
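For reference, a sketch of the router configuration I described (the interface names, the netem+tbf combination, and the burst value are assumptions; 1 BDP at 128 Mbps x 150 ms is about 2.4 MB):

# eth0 faces the client, eth1 faces the server (illustrative names); the 150 ms RTT is split as 75 ms per direction
sudo tc qdisc add dev eth0 root handle 1: netem delay 75ms
sudo tc qdisc add dev eth0 parent 1: handle 2: tbf rate 128mbit burst 64kb limit 2400000
sudo tc qdisc add dev eth1 root handle 1: netem delay 75ms
sudo tc qdisc add dev eth1 parent 1: handle 2: tbf rate 128mbit burst 64kb limit 2400000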