jrabek closed this issue 9 years ago
Note that if I turn off shaping in the demo UI, the wget transfer is fast again.
I ran atcd from the shell and captured the logs (below) when I applied the DSL profile.
INFO:AtcdVService.AtcdLinuxShaper:Request startShaping TrafficControl(device=TrafficControlledDevice(controllingIP='100.64.33.101', controlledIP='100.64.33.101'), timeout=86400, settings=TrafficControlSetting(down=Shaping(loss=Loss(percentage=0.0, correlation=0.0), delay=Delay(delay=5, jitter=0, correlation=0.0), rate=2000, iptables_options=[], corruption=Corruption(percentage=0.0, correlation=0.0), reorder=Reorder(percentage=0.0, correlation=0.0, gap=0)), up=Shaping(loss=Loss(percentage=0.0, correlation=0.0), delay=Delay(delay=5, jitter=0, correlation=0.0), rate=256, iptables_options=[], corruption=Corruption(percentage=0.0, correlation=0.0), reorder=Reorder(percentage=0.0, correlation=0.0, gap=0))))
INFO:AtcdVService.AtcdLinuxShaper:Shaping ip 100.64.33.101 on interface eth0
INFO:AtcdVService.AtcdLinuxShaper:create new HTB class on IFID eth0, classid 1:2,parent 1:0, rate 256kbits
INFO:AtcdVService.AtcdLinuxShaper:create new Netem qdisc on IFID eth0, parent 1:2, loss 0.0%, delay 5000
INFO:AtcdVService.AtcdLinuxShaper:create new FW filter on IFID eth0, classid 1:2, handle 2, rate: 256kbits
INFO:AtcdVService.AtcdLinuxShaper:Running /sbin/iptables -t mangle -A FORWARD -d 100.64.33.101 -i eth0 -j MARK --set-mark 2
INFO:AtcdVService.AtcdLinuxShaper:Shaping ip 100.64.33.101 on interface eth1
INFO:AtcdVService.AtcdLinuxShaper:create new HTB class on IFID eth1, classid 1:2,parent 1:0, rate 2000kbits
INFO:AtcdVService.AtcdLinuxShaper:create new Netem qdisc on IFID eth1, parent 1:2, loss 0.0%, delay 5000
INFO:AtcdVService.AtcdLinuxShaper:create new FW filter on IFID eth1, classid 1:2, handle 2, rate: 2000kbits
INFO:AtcdVService.AtcdLinuxShaper:Running /sbin/iptables -t mangle -A FORWARD -s 100.64.33.101 -i eth1 -j MARK --set-mark 2
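For reference, the setup in the log above corresponds roughly to the following tc/iptables commands. This is a sketch only: class IDs, rates, and interface names are taken from the log lines, and the root HTB qdisc is assumed to have been created by atcd at startup, not verified against the atcd source.

```shell
# Sketch of the shaping described in the log above. Requires root.
# The root HTB qdiscs (handle 1:) are assumed to already exist.

# eth0: 256kbit HTB class with a 5ms netem delay, matched via fw mark 2
tc class add dev eth0 parent 1:0 classid 1:2 htb rate 256kbit
tc qdisc add dev eth0 parent 1:2 handle 2: netem delay 5ms
tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 2 fw flowid 1:2
iptables -t mangle -A FORWARD -d 100.64.33.101 -i eth0 -j MARK --set-mark 2

# eth1: 2000kbit HTB class with a 5ms netem delay, matched via fw mark 2
tc class add dev eth1 parent 1:0 classid 1:2 htb rate 2000kbit
tc qdisc add dev eth1 parent 1:2 handle 2: netem delay 5ms
tc filter add dev eth1 parent 1:0 protocol ip prio 1 handle 2 fw flowid 1:2
iptables -t mangle -A FORWARD -s 100.64.33.101 -i eth1 -j MARK --set-mark 2
```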
After capturing these logs I reran
wget -O - www.cnn.com > /dev/null
and still saw the extremely slow transfer.
I looked through some of the closed issues and noticed that someone else saw something similar but not the same here: https://github.com/facebook/augmented-traffic-control/issues/86#issuecomment-88655287
@jrabek what happens if you run an end-to-end test with hosts directly before and after ATC?
Do you want to try https://github.com/facebook/augmented-traffic-control/issues/86#issuecomment-90631200 ?
@chantra Thanks! I cherry-picked the commit and it seems to resolve the issue. I confirmed using the same test I described previously in this bug.
commit cee20a691b361c81ccb163db55caa129c040a9c5
Author: chantra <chantra@fb.com>
Date: Sun Apr 5 13:18:15 2015 -0700
Command line argument to buffer oackets instead of dropping
atcd --atcd-dont-drop-packets
Any reason why this has not been merged into master yet and included by default in the arguments to atcd? My setup is vanilla and out of the box, so I am surprised more people aren't hitting this.
I should clarify that after cherry-picking the commit, I completely recreated the trusty instance, ssh'd into trusty, stopped atcd, added the --atcd-dont-drop-packets option, and restarted atcd before retesting. I wanted to make sure that someone reading this bug doesn't think that cherry-picking the commit alone is enough.
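To be concrete, the sequence was roughly the following. This is a sketch: the machine name trusty comes from the stock Vagrant setup, the commit must already be in the tree Vagrant provisions, and how atcd is stopped and started may differ on other installs.

```shell
# Rebuild the gateway VM from scratch, then restart atcd with the
# buffering flag from commit cee20a6. Names/paths are assumptions.
vagrant destroy trusty -f
vagrant up trusty
vagrant ssh trusty

# Inside the VM: stop the running daemon, then restart it so packets
# are buffered instead of dropped.
sudo pkill atcd
sudo atcd --atcd-dont-drop-packets
```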
@jrabek well, I plan to review that part of the code when I get time for it, so I did not want to land something that may change in the future.
@chantra, sounds good. This can be resolved then. Thanks again for your help. I went ahead and forked the repo (https://github.com/airtimemedia/augmented-traffic-control) so we can have something that works out of the box for us.
Just as a note, I think what is happening might be related to this comment from https://github.com/facebook/augmented-traffic-control/pull/125#issuecomment-109472048:
1) You assume a policing behavior with no buffer. While an unlimited buffer is not realistic, policing is not that common either, and you end-up having TCP collapsing.
@chantra @zfjagann, so in its current state, is ATC usable and accurate for Facebook? I ask because TCP flows currently seem to break when shaping is enabled, and if the commit below is used as suggested in this bug, TCP works but the network delay grows continually (which makes sense, since nothing is ever dropped).
commit cee20a691b361c81ccb163db55caa129c040a9c5
Author: chantra <chantra@fb.com>
Date: Sun Apr 5 13:18:15 2015 -0700
Command line argument to buffer oackets instead of dropping
atcd --atcd-dont-drop-packets
Are there any other fixes or workarounds until issue #60 is fixed that would allow ATC to properly shape TCP flows without breaking them?
Whatever packet dropping is being done doesn't seem to be correct.
Thanks again for the tool and open sourcing it. Please let me know if there are any packet captures that would be useful.
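If it helps narrow this down, the drop behaviour should be visible in the qdisc statistics while a transfer is running; a steadily growing dropped counter on the shaping classes would point at the shaper rather than the network. Interface names here assume the setup from the log above.

```shell
# Per-qdisc counters (bytes sent, dropped, overlimits) on the shaping
# interfaces; run on the ATC box while the wget transfer is in progress.
tc -s qdisc show dev eth0
tc -s qdisc show dev eth1

# Per-class view of the HTB tree, including drops attributed to each class.
tc -s class show dev eth0
```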
@jrabek there is no quick fix/workaround currently.
Even if it may not be perfectly accurate, we have a number of profiles that are meant to be representative of the situations we are trying to emulate.
@chantra, so I just got an interesting result and may have reopened the bug too soon.
Ignoring the accuracy issues, I previously had problems with TCP transfers not working when using the vagrant setup.
I subsequently set up ATC on a Linux box in a configuration matching the one in the main ATC README. The wget transfers seem to work in the bare-metal configuration, so it may be some issue with the Vagrant setup.
I'll update the bug title.
Oh yeah, Vagrant (VirtualBox) is not recommended... the main issue is most likely scheduling in the VM, which differs from bare metal.
This has been brought up a few times already. As much as Vagrant is convenient for setting up a dev environment, virtualization is not going to provide accurate scheduling.
This is definitely not something we can support, though.
Makes total sense to not support it and just recommend a bare-metal setup in the main README. That said, I think a prominent disclaimer in the README would be helpful, since a Vagrant setup is provided; it made it seem like Vagrant would be a valid option for evaluating/using ATC.
Thanks again for the quick responses.
That's fair. I think I did somewhere... but I will double-check.
:metal:
Setup:
On a MacBook Pro running OS X 10.10.3
ATC at the following commit:
Running ATC using Vagrant:
Running the ATC client using Vagrant:
Run wget as a sanity check of connectivity and unshaped network speed:
In the ATC webui on 100.64.33.3:8000/atc_demo_ui/ on the atcclient01 instance, set the profile to anything including DSL or Cable and then repeat the same transfer:
No matter which profile is selected, the transfer rate slows to a crawl.
I did a quick sanity check on the ATC vagrant instance to make sure the wget network traffic is going through the ATC instance and it is.
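For reference, the sanity check was along these lines (a sketch: interface name and client address are taken from the setup above; adjust to your environment):

```shell
# On the client: confirm the first hop is the ATC gateway VM.
traceroute -n www.cnn.com | head -3

# On the ATC instance: watch forwarded packets for the client while the
# wget transfer runs; traffic should appear here if it is being routed
# through ATC.
sudo tcpdump -ni eth1 host 100.64.33.101
```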