atlanticwave-sdx / atlanticwave-proto

Repo for work on prototype for AtlanticWave/SDX

JENKINS - Set up Jenkins #111

Open sdonovan1985 opened 5 years ago

sdonovan1985 commented 5 years ago

Basic setup of Jenkins and whatever infrastructure (Docker images, configuration scripts, etc.) is necessary. First, we're going to do this on a local VM, then move it to the RNOC infrastructure once I get the hang of how to use it.

sdonovan1985 commented 5 years ago

I fixed the environment issue: sudo wasn't passing through PYTHONPATH correctly. Fixed that in the Jenkins Build configuration.
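sudo typically resets the environment (the env_reset default in most sudoers configurations), so a PYTHONPATH exported in the calling shell never reaches the root process. The actual change to the Jenkins Build configuration isn't shown here. As a hedged illustration, one way around it is to set the variable on the root side with env. The helper name sudo_with_pythonpath is hypothetical, and it only builds the argv. The example value is the PYTHONPATH from the manual runs later in this thread.

```python
import os
import shlex


def sudo_with_pythonpath(cmd, pythonpath=None):
    """Build an argv that runs `cmd` under sudo with PYTHONPATH set explicitly.

    Hypothetical helper: because sudo usually drops the caller's environment,
    the variable is re-set on the root side via `env VAR=value cmd ...`.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    return ["sudo", "env", f"PYTHONPATH={pythonpath}", *cmd]


argv = sudo_with_pythonpath(
    ["nosetests", "test_LocalController.py"],
    pythonpath=".:/home/sdx/mininet:/home/sdx/dev:/home/sdx/ryu",
)
print(shlex.join(argv))  # shell-ready form, e.g. for a Jenkins "Execute shell" step
```

Whether your sudoers policy permits running env this way is an assumption to check locally.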

Current problem (which appeared once PYTHONPATH was fixed): the OVS switch instance isn't being created. The OVS database is being updated, so ovs-vsctl show works, but ovs-ofctl show br_ovs fails. Details below:

root@955bfc3d3cc9:/home/jenkins# ovs-ofctl show br_ovs
ovs-ofctl: br_ovs is not a bridge or a socket
root@955bfc3d3cc9:/home/jenkins# ovs-vsctl show
c6d2b3e7-94e6-44aa-b8b7-0cd045059890
    Manager "ptcp:6640"
    Bridge br_ovs
        Port br_ovs
            Interface br_ovs
                type: internal
    ovs_version: "2.6.2"
root@955bfc3d3cc9:/home/jenkins# ovs-ofctl show br_ovs
ovs-ofctl: br_ovs is not a bridge or a socket
root@955bfc3d3cc9:/home/jenkins# 

Below are the errors from creating the OVS bridge:

root@955bfc3d3cc9:/home/jenkins# tail /var/log/openvswitch/ovs-vswitchd.log
2019-08-21T15:36:54.028Z|00008|memory|INFO|4416 kB peak resident set size after 176.7 seconds
2019-08-21T15:36:54.029Z|00009|dpif|WARN|failed to create datapath ovs-system: Operation not permitted
2019-08-21T15:36:54.029Z|00010|ofproto_dpif|ERR|failed to open datapath of type system: Operation not permitted
2019-08-21T15:36:54.029Z|00011|ofproto|ERR|failed to open datapath br_ovs: Operation not permitted
2019-08-21T15:36:54.029Z|00012|bridge|ERR|failed to create bridge br_ovs: Operation not permitted

Chasing this one down today.
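To pull just the warnings and errors out of a long ovs-vswitchd.log, here is a minimal Python sketch. It relies only on the pipe-separated layout visible in the excerpt above (timestamp, sequence number, module, level, message). The function name problem_entries is hypothetical, and the sample lines are copied from the log above.

```python
SAMPLE = [
    "2019-08-21T15:36:54.028Z|00008|memory|INFO|4416 kB peak resident set size after 176.7 seconds",
    "2019-08-21T15:36:54.029Z|00009|dpif|WARN|failed to create datapath ovs-system: Operation not permitted",
    "2019-08-21T15:36:54.029Z|00010|ofproto_dpif|ERR|failed to open datapath of type system: Operation not permitted",
    "2019-08-21T15:36:54.029Z|00011|ofproto|ERR|failed to open datapath br_ovs: Operation not permitted",
    "2019-08-21T15:36:54.029Z|00012|bridge|ERR|failed to create bridge br_ovs: Operation not permitted",
]


def problem_entries(log_lines, levels=("EMER", "ERR", "WARN")):
    """Return (timestamp, module, level, message) for entries at the given levels.

    Assumes each entry looks like TIMESTAMP|SEQUENCE|MODULE|LEVEL|MESSAGE.
    """
    problems = []
    for line in log_lines:
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # not a structured log entry; skip it
        timestamp, _seq, module, level, message = parts
        if level in levels:
            problems.append((timestamp, module, level, message))
    return problems


for ts, module, level, msg in problem_entries(SAMPLE):
    print(f"{ts} {level:4} {module}: {msg}")
```

Against the real file, passing an open handle works the same way, e.g. problem_entries(open("/var/log/openvswitch/ovs-vswitchd.log")), since iterating a file yields lines.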

sdonovan1985 commented 5 years ago

Found the solution thanks to https://mail.openvswitch.org/pipermail/ovs-discuss/2015-February/036452.html - I just needed to run the container as privileged. No code changes needed.

sudo docker run --name ssh-slave --network jenkins-test-network --privileged sdonovan:ssh-slave-mod "ssh-rsa <SSHKEY> jenkins"
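On why --privileged helps: my understanding, which is an assumption and was not verified in this thread, is that creating the kernel datapath goes through a netlink call that needs CAP_NET_ADMIN, and Docker's default capability set leaves that out. A quick way to see what a container actually has is the CapEff bitmask in /proc/self/status. The sketch below parses that mask and tests bit 12 (CAP_NET_ADMIN in linux/capability.h). The two masks in the example are illustrative values for a default-like and a fully privileged container, and the function names are hypothetical.

```python
CAP_NET_ADMIN = 12  # capability bit number from <linux/capability.h>


def effective_caps(status_text):
    """Return the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


# Illustrative masks only: one without bit 12, one with every low bit set.
default_like = effective_caps("CapEff:\t00000000a80425fb\n")
privileged_like = effective_caps("CapEff:\t0000003fffffffff\n")
print(has_cap(default_like, CAP_NET_ADMIN), has_cap(privileged_like, CAP_NET_ADMIN))
```

Inside the container, effective_caps(open("/proc/self/status").read()) gives the real mask. --privileged grants far more than CAP_NET_ADMIN, so whether a narrower --cap-add would be enough would need its own testing.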

sdonovan1985 commented 5 years ago

So yesterday's check-ins cleaned up a few of the failures I was seeing (3 of 20). All of these tests work just fine in isolation. Those three were mostly due to how the virtual switches were being cleaned up.

Now, the rest of the errors all seem to be connected: once I fix the underlying cause, they should all go away. Like I said, these tests run just fine in isolation.


sudo nosetests test_LocalController.py test_RyuTranslateInterface.py

This is the minimal command that reproduces the 17 (or is it 16 now?) failures.

sudo nosetests test_LocalController.py; sudo nosetests test_RyuTranslateInterface.py

This, strangely, works. Notice that it's just splitting the two sets of tests into two processes. Weird.

sudo nosetests test_RyuTranslateInterface.py test_LocalController.py

Note that these are the same tests, just in reverse order. This works just fine.

So, there's something fishy w/r/t timing(?) or ordering or some sort of state that I haven't figured out yet.
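Since running the two modules in separate nosetests processes passes, per-module isolation is a possible workaround, though it does not fix whatever state is leaking between tests. A minimal Python sketch that mirrors the ;-separated command above. The helper names isolated_commands and run_isolated are hypothetical.

```python
import subprocess


def isolated_commands(test_modules, runner=("sudo", "nosetests")):
    """One argv per test module, so each module gets a fresh interpreter."""
    return [[*runner, module] for module in test_modules]


def run_isolated(test_modules, runner=("sudo", "nosetests")):
    """Run each module in its own process; return {module: exit_code}."""
    results = {}
    for argv in isolated_commands(test_modules, runner):
        # check=False: collect every module's result instead of stopping early
        results[argv[-1]] = subprocess.run(argv, check=False).returncode
    return results
```

For example, run_isolated(["test_LocalController.py", "test_RyuTranslateInterface.py"]) is equivalent to the two-process invocation that passed, and any nonzero value in the result marks a module that still fails on its own.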

sdonovan1985 commented 5 years ago

sudo nosetests test_LocalController.py:LocalControllerTest.test_rule_installation_4 test_RyuTranslateInterface.py:RyuTranslateTests.test_trans_match_multi

Minimal reproduction.
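Getting from a long failing run down to a two-test reproduction like this means finding which earlier test poisons the later one. One generic way to do that is bisection over the tests that run before the failing one. This is not necessarily how the reproduction above was found. The sketch assumes a single polluting test and a caller-supplied fails(tests) that runs the given nose test addresses in order in one process (for instance by wrapping sudo nosetests and checking the exit code). find_polluter is a hypothetical name.

```python
def find_polluter(candidates, victim, fails):
    """Bisect `candidates` for the one test that makes `victim` fail.

    `fails(tests)` must run `tests` in order in a single process and return
    True if the last one (`victim`) fails. Assumes at most one polluter.
    """
    suspects = list(candidates)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if fails(half + [victim]):
            suspects = half  # the polluter is in the first half
        else:
            suspects = suspects[len(suspects) // 2 :]  # otherwise the second half
    # Confirm the survivor really triggers the failure.
    if suspects and fails(suspects + [victim]):
        return suspects[0]
    return None
```

Each fails() call costs a full test run, so this needs about log2(n) runs instead of trying every earlier test one at a time.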

sdonovan1985 commented 5 years ago

Huh. So I've been beating on this all day, trying different little things to figure out what's going awry.

What I've found is that after the LC test runs, during the RyuTranslateTest, either Ryu doesn't seem to be running OR the virtual switch isn't connecting to Ryu. That's what I need to find out next.
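One quick way to tell those two cases apart is to check whether anything is accepting TCP connections on Ryu's OpenFlow port. The ryu-manager command line in the ps output below uses --ofp-tcp-listen-port 6633. A minimal Python sketch follows; is_listening is a hypothetical helper. A successful connect only shows that Ryu's listener is up, not that the switch has connected to it.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, is_listening("127.0.0.1", 6633) returning False during the RyuTranslateTest would point at Ryu not running rather than at the switch.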

sdonovan1985 commented 5 years ago

Hmm... progress? Information?

root     29497 16.0  0.7 298504 56528 ?        Ss   15:42   0:00 /bin/python /bin/ryu-manager --app-list /home/sdx/dev/localctlr/RyuTranslateInterface.py --log-dir . --log-file ryu.log --verbose --ofp-tcp-listen-port 6633 --atlanticwave-lcname atl --atlanticwave-conffile /home/sdx/dev/localctlr/tests/rtitest.manifest
[sdx@localhost tests]$ sudo kill -9 29497
[sdx@localhost tests]$ ps aux | grep ryu-manager | grep -v grep
root     29497  2.9  0.0      0     0 ?        Zs   15:42   0:00 [ryu-manager] <defunct>
[sdx@localhost tests]$ ps aux | grep ryu-manager | grep -v grep
root     29497  2.7  0.0      0     0 ?        Zs   15:42   0:00 [ryu-manager] <defunct>
[sdx@localhost tests]$ ps aux | grep ryu-manager | grep -v grep
root     29497  2.0  0.0      0     0 ?        Zs   15:42   0:00 [ryu-manager] <defunct>
[sdx@localhost tests]$ ps aux | grep ryu-manager | grep -v grep

Line 1 (ps) is during the LocalController test.
Line 2 (kill -9) is after the LocalController test has finished cleanup (during a sleep() period).
Lines 3, 4, 5 are during the RyuTranslateInterface test.
Line 6 (including the empty line) is after the test has finished.

So, defunct is described in the ps man page:

       Processes marked <defunct> are dead processes (so-called "zombies")
       that remain because their parent has not destroyed them properly.
       These processes will be destroyed by init(8) if the parent process
       exits.

So, the final question is how do I handle this part?
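Per the man page text above, the zombie lingers until its parent collects its exit status. That matches it disappearing once the test process exited. If the test harness starts ryu-manager with subprocess.Popen (an assumption, since the launch code isn't shown in this thread), then calling wait() on the Popen object after stopping it reaps the child. A minimal sketch with a hypothetical stop_and_reap helper, demonstrated on a sleeping Python child rather than ryu-manager:

```python
import subprocess
import sys


def stop_and_reap(proc, timeout=10.0):
    """Stop a child process and wait for it so it doesn't linger as <defunct>.

    poll()/wait() collect the exit status, which is what removes the zombie.
    """
    if proc.poll() is None:           # still running (poll also reaps if it already exited)
        proc.terminate()              # SIGTERM first, giving the child a chance to clean up
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()               # SIGKILL as a last resort
            proc.wait()               # and still reap it
    return proc.returncode


child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
print("exit status:", stop_and_reap(child, timeout=5))  # negative signal number on POSIX
```

The key difference from a bare kill -9 from a shell is the wait() call in the parent. Killing the process from outside leaves the parent with nobody collecting its status.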

sdonovan1985 commented 5 years ago

OK, down to three errors with Jenkins. I ran my manual tests in the order below:

export PYTHONPATH=.:/home/sdx/mininet:/home/sdx/dev:/home/sdx/ryu; sudo pkill ryu-manager; sudo mn -c; sudo nosetests test_LocalController.py test_RyuControllerInterface.py test_RyuTranslateInterface.py

and then

export PYTHONPATH=.:/home/sdx/mininet:/home/sdx/dev:/home/sdx/ryu; sudo pkill ryu-manager; sudo mn -c; sudo nosetests 

I have to look through the (huge) Jenkins logs to find out where the issues are coming from. I'm wondering if I need to replicate the little loop I have for finding out if the switch is connected to the LC, since the switch takes a variable, and rather unpredictable, amount of time to reconnect to Ryu.
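The loop mentioned above isn't shown in the thread. A generic version of that kind of wait is a poll with a deadline, which copes with variable reconnect times better than a fixed sleep(). A minimal Python sketch; wait_until is hypothetical, and the predicate you pass in (for example, a check that the switch has reconnected to Ryu) is up to the caller.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True on success, False on timeout. Uses a monotonic clock so
    wall-clock adjustments can't shorten or stretch the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test, something like assert wait_until(lambda: switch_is_connected(), timeout=120) (with switch_is_connected as a hypothetical check) fails with a clear timeout instead of racing the reconnect.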

sdonovan1985 commented 5 years ago

Not sure where to put this in the codebase, but attaching a pared-down Jenkins configuration script. All tests (finally) pass in my local instance: all 262 tests passing, with 67% of conditionals covered.

jenkinsconfig.txt

Need to get this running on the RNOC cloud infrastructure next.