frenetic-lang / pyretic

The Pyretic language and runtime system
http://frenetic-lang.org/pyretic/

pingall failing on medium-sized virtualized topologies #16

Closed: joshreich closed this 11 years ago

joshreich commented 11 years ago

./pyretic/mininet.sh --switch ovsk --topo=clique,5,5 pox.py --no-cli pyretic/examples/virtualize.py --program=pyretic/examples/learning_switch.py --virttopo=pyretic/virttopos/bfs.py

pingall fails. Likely related are the "link timeout" / "link detected" messages.

joshreich commented 11 years ago

It appears that ping breaks when the view of the physical topology provided by POX is incorrect (e.g., missing links) and/or changing rapidly. The following settings in pox/openflow/discovery.py raise the size of the topology we can handle from a 4-clique to a 6-clique. However, by a 7-clique POX/ovsk no longer maintains a stable topology view (we trade off the stability of the topology view against its freshness). Ideally, we'd have a way of actively determining when links fail, rather than assuming they have timed out when we stop receiving their LLDP packets. That said, it is odd that POX has trouble keeping track of O(7^2) links; that seems a bit small...

LLDP_SEND_CYCLE = 4.0
TIMEOUT_CHECK_PERIOD = 4.0
LINK_TIMEOUT = 12.0
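To make the stability/freshness trade-off concrete, here is a minimal, hypothetical Python sketch of timeout-based link liveness. Only the three constant names and values come from the settings above; the LinkTracker class and simulate() helper are illustrative assumptions, not POX's discovery code. The simulation also assumes that probes and expiry sweeps share one 4-second grid and that an arriving probe is processed before the sweep at the same instant.

```python
# Hypothetical sketch of timeout-based link liveness. Only the three constant
# names/values come from the POX settings quoted above; LinkTracker and
# simulate() are illustrative and are NOT POX's discovery implementation.

LLDP_SEND_CYCLE = 4.0       # seconds between LLDP probes on each link
TIMEOUT_CHECK_PERIOD = 4.0  # seconds between expiry sweeps
LINK_TIMEOUT = 12.0         # drop a link whose last LLDP is older than this


class LinkTracker:
    def __init__(self, link_timeout=LINK_TIMEOUT):
        self.link_timeout = link_timeout
        self.last_seen = {}  # link id -> time its most recent LLDP arrived

    def lldp_received(self, link, now):
        """Record an LLDP arrival; return True if the link is new ("link detected")."""
        is_new = link not in self.last_seen
        self.last_seen[link] = now
        return is_new

    def expire(self, now):
        """Remove and return links whose last LLDP is too old ("link timeout")."""
        # Build the list first so the dict is not modified while iterating over it.
        expired = [link for link, seen in self.last_seen.items()
                   if now - seen > self.link_timeout]
        for link in expired:
            del self.last_seen[link]
        return expired


def simulate(arrivals, until, link_timeout):
    """Replay LLDP arrival times for one link and return the first time it is
    declared dead, or None. Assumes probes and sweeps share one grid of
    LLDP_SEND_CYCLE == TIMEOUT_CHECK_PERIOD seconds, with an arrival processed
    before the sweep at the same instant."""
    tracker = LinkTracker(link_timeout)
    link = ("s1", 1, "s2", 1)  # hypothetical (switch, port, switch, port) id
    for step in range(int(until / TIMEOUT_CHECK_PERIOD) + 1):
        now = step * TIMEOUT_CHECK_PERIOD
        if now in arrivals:
            tracker.lldp_received(link, now)
        if tracker.expire(now):
            return now
    return None


lossy = {0.0, 4.0, 16.0, 20.0}  # link stays up, but probes at t=8 and t=12 are lost
dead = {0.0, 4.0, 8.0}          # link really fails after the probe at t=8

print(simulate(lossy, 24.0, LINK_TIMEOUT))  # None: rides out two lost probes
print(simulate(lossy, 24.0, 4.0))           # 12.0: spurious link timeout
print(simulate(dead, 24.0, LINK_TIMEOUT))   # 24.0: failure noticed 16 s after last probe
print(simulate(dead, 24.0, 4.0))            # 16.0: noticed 8 s after last probe
```

Under these assumptions, the 12-second timeout rides out two consecutive lost probes, but a link that really fails is only reported 16 seconds after its last probe. Shrinking the timeout to 4 seconds reports the real failure sooner, but it also declares a healthy link dead after two lost probes, so that link would flap (timeout, then re-detection at t=16).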

joshreich commented 11 years ago

More digging reveals that one part of the problem is that Mininet sends out some packets while starting up that break POX. By waiting until Mininet is fully spun up before starting POX, we can accurately learn topologies as large as a 20-clique, although pingall on that topology dies when the connections to the switches start dying. With one code fix, we have also verified that we can run without problems on 15-clique topologies. Our code would break during calculations involving interior location detection when topology changes were rapid, specifically when looping over self[sw].itervalues() while the dict was being modified. This was fixed by using .values() instead (which in Python 2 returns a copy of the values as a list), although the results provided by interior_locations may now be stale.
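The iteration bug and its fix can be reproduced in a few lines. This is a hypothetical sketch: the per-switch table, interior_neighbors, and drop_port_once are illustrative names, not Pyretic's code, and on_visit stands in for a topology-change handler that runs while the loop is still in progress. It is written for Python 3, where .values() returns a live view, so list(table.values()) plays the role that .values() (which built a list) played in Python 2.

```python
# Hypothetical sketch: mutating a dict while iterating over a live view of it.
# Names (make_table, interior_neighbors, drop_port_once) are illustrative only.


def make_table():
    """Hypothetical per-switch table: port -> (neighbor_switch, neighbor_port),
    or None for a port that faces a host rather than another switch."""
    return {1: (2, 1), 2: (3, 1), 3: None}


def interior_neighbors(table, on_visit, snapshot):
    """Collect the switch-facing neighbors listed in `table`.

    `on_visit` stands in for a topology-change handler that may run while the
    loop is still in progress (e.g., a link timeout deleting a port entry).
    """
    # snapshot=False iterates the live view (like Python 2's itervalues());
    # snapshot=True iterates a copy (like Python 2's values(), which built a list).
    values = list(table.values()) if snapshot else table.values()
    found = []
    for neighbor in values:
        on_visit(table)
        if neighbor is not None:
            found.append(neighbor)
    return found


def drop_port_once(port):
    """Return a handler that deletes `port` from the table the first time it runs."""
    fired = False

    def handler(table):
        nonlocal fired
        if not fired:
            table.pop(port, None)
            fired = True

    return handler


try:
    interior_neighbors(make_table(), drop_port_once(2), snapshot=False)
except RuntimeError as err:
    print("live view:", err)  # live view: dictionary changed size during iteration

# The snapshot survives the mutation but still reports port 2's neighbor.
print(interior_neighbors(make_table(), drop_port_once(2), snapshot=True))
# [(2, 1), (3, 1)]
```

The snapshot avoids the crash but, as noted above, can return stale results: (3, 1) is still reported even though port 2 was removed mid-loop.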

princedpw commented 11 years ago

POX seems kinda sucky, if you don't mind me saying so.

(Replying by email to joshreich's comment of Jan 24, 2013, at 3:19 AM.)

joshreich commented 11 years ago

Not at all. It's telling that the best thing I have to say about POX is that it isn't NOX ;-) Though I hadn't had firsthand experience with POX then, my suspicion that this would be the case was one of the reasons I had been pushing at last SIGCOMM, and occasionally since then, to kill the standalone Python distribution that used a separate backend and instead implement a Pythonic interface on top of mainline Frenetic.

(Replying by email to David Walker's comment of Jan 24, 2013, at 1:30 AM.)