blaksmit / issue-test


Upgrading the Contiv version causes a network outage #37

Open: blaksmit opened this issue 6 years ago

blaksmit commented 6 years ago

Description

Whenever a user upgrades the Contiv version, a network outage occurs. This is because OVS is bundled inside the netplugin container, so replacing the netplugin container also stops OVS.

This ticket is to investigate whether we can move OVS into a separate container so that upgrading netplugin becomes seamless.

Acceptance Criteria

[CNTV-129] created by kahou.lei

blaksmit commented 6 years ago

With the fail-mode set to "secure" (which is what we want), killing ovs-vswitchd appears to cause all packets that OVS manages to be dropped, not just packets from new flows. This means that whenever we upgrade or restart ovs-vswitchd, there will be some downtime. Without containers, when OVS is just an OS package, the impact is minor: the new binaries are installed and the service is restarted very quickly. With containers, it is more complicated. Depending on how the new container is deployed, the old service may be stopped for some time before the new one is running.

I did some quick testing by running two ovs-vswitchd processes on the same host. The second one I started logged "another ovs-vswitchd process is running, disabling this process (pid 7850) until it goes away". As soon as I killed the old one, the new one immediately took over. This means that if the new container is deployed and started before the old one is destroyed, there should be no noticeable downtime.

The way OVS is started in the container is a little troubling to me. Both the vswitchd and database servers are started in the background, with nothing monitoring their status. Because of this, if either of them dies, it will not be restarted.

I think we should add two new containers: one for ovsdb-server and one for ovs-vswitchd. With only one process per container, each can be restarted automatically if there is a problem. As long as we bind mount the right directories into each container, all of the processes should be able to communicate. ovsdb-server will need /etc/openvswitch and /var/run/openvswitch mounted from the host (technically we could change those paths, but I'm not sure there's a good reason to). ovs-vswitchd and netplugin will need /var/run/openvswitch.

As for logs, I'm a bit concerned that we log to /var/log/ without setting up log rotation. I think we should log to stdout/stderr and let Docker handle it. This becomes much easier if each process runs in a separate container.

by nibartos

blaksmit commented 6 years ago

Anne McCormick: It's not finished, but I got a reasonable amount done on this. Here is a link to my branch: https://github.com/nbartos/contiv-netplugin/commits/ovs

by nibartos