ctrautma / RHEL_NIC_QUALIFICATION

RHEL development NIC qualification scripts for partners
Apache License 2.0

Scapy dependency in vswitchperf #3

Open w1ldptr opened 6 years ago

w1ldptr commented 6 years ago

I have an issue running Perf-Verify.sh. It fails with the following log:

[ERROR]  2017-11-29 11:04:22,430 : (root) - Could not import file trex.py
[ERROR]  2017-11-29 11:04:22,430 : (root) - Failed to run test: pvp_tput
Traceback (most recent call last):
  File "./vsperf", line 688, in main
    test.run()
  File "/root/vswitchperf/testcases/testcase.py", line 330, in run
    self.run_initialize()
  File "/root/vswitchperf/testcases/testcase.py", line 198, in run_initialize
    loader.get_trafficgen_class())
  File "/root/vswitchperf/core/loader/loader.py", line 79, in get_trafficgen_class
    return self._trafficgen_loader.get_class()
  File "/root/vswitchperf/core/loader/loader_servant.py", line 67, in get_class
    class_name=self._class_name)
  File "/root/vswitchperf/core/loader/loader_servant.py", line 119, in load_module
    path=path, interface=interface)
  File "/root/vswitchperf/core/loader/loader_servant.py", line 142, in load_modules
    for _, mod in LoaderServant._load_all_modules(path):
  File "/root/vswitchperf/core/loader/loader_servant.py", line 181, in _load_all_modules
    modname, *imp.find_module(modname, [root]))
  File "/root/vsperfenv/lib64/python3.3/imp.py", line 175, in load_module
    return load_source(name, filename, file)
  File "/root/vsperfenv/lib64/python3.3/imp.py", line 114, in load_source
    _LoadSourceCompatibility(name, pathname, file).load_module(name)
  File "<frozen importlib._bootstrap>", line 586, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1024, in load_module
  File "<frozen importlib._bootstrap>", line 1005, in load_module
  File "<frozen importlib._bootstrap>", line 562, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 870, in _load_module
  File "<frozen importlib._bootstrap>", line 313, in _call_with_frames_removed
  File "/root/vswitchperf/tools/pkt_gen/trex/trex.py", line 32, in <module>
    from trex_stl_lib.api import *
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/api.py", line 4, in <module>
    from .trex_stl_client import STLClient, LoggerApi
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_client.py", line 14, in <module>
    from .trex_stl_vlan import VLAN
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_vlan.py", line 8, in <module>
    from scapy.layers.l2 import Dot1Q, Dot1AD
ImportError: cannot import name Dot1AD

By default, it installs Python packages according to this file:

vswitchperf# cat requirements.txt

pexpect==3.3
tox==1.8.1
jinja2==2.7.3
xmlrunner==1.7.7
requests==2.8.1
netaddr==0.7.18
scapy-python3==0.18
pyzmq==14.5.0
distro

The Scapy dependency is on a Python 3 fork of Scapy called "scapy3k". As we can see in its l2.py, this fork doesn't implement Dot1AD. I manually installed upstream Scapy into vsperfenv with the following commands:

scl enable python33 "
virtualenv "/root/vsperfenv"
source /root/vsperfenv/bin/activate
cd /root/scapy
python setup.py install
"

However, with the upstream version the script fails with the following error:

Traceback (most recent call last):
  File "/root/vsperfenv/lib/python3.3/site-packages/scapy/packet.py", line 230, in __getattr__
    fld, v = self.getfield_and_val(attr)
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/vsperfenv/lib/python3.3/site-packages/scapy/packet.py", line 230, in __getattr__
    fld, v = self.getfield_and_val(attr)
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/vsperfenv/lib/python3.3/site-packages/scapy/packet.py", line 230, in __getattr__
    fld, v = self.getfield_and_val(attr)

It seems that this script and TRex depend on a very specific version of Scapy, but not the one specified in the requirements file.
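A quick way to see whether a given interpreter's Scapy has the layer TRex needs is to import it and look for the name. A minimal sketch; `module_provides` is a hypothetical helper, not part of vswitchperf or TRex. Run it in a fresh interpreter (for example inside the activated vsperfenv), because importing Scapy caches it for the rest of that process:

```python
import importlib


def module_provides(module_name, attr_name):
    """Return True if `module_name` imports cleanly and defines `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both "module not installed" and failures inside its imports.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Asks about whichever Scapy this interpreter resolves first on sys.path.
    print("Dot1AD available:", module_provides("scapy.layers.l2", "Dot1AD"))
```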

ctrautma commented 6 years ago

It would appear something has changed in the master branch of T-Rex that now requires Scapy's Dot1AD, which is not in the forked repository.

When the build_base_machine.sh script runs, it builds the current master branch of T-Rex in vswitchperf/src/trex so that the code in vswitchperf/tools/pkt_gen/trex/trex.py can import all the libraries it needs to call the Python API. For now, the solution will probably be to check out a previous version before building the T-Rex libraries. This was working previously, so something new has caused this.

I'll have to see if I can figure out which commit in the T-Rex master branch caused this. This is just my assumption right now, since it was working a few weeks ago when I tested it.

ctrautma commented 6 years ago

So I looked into this more.

I cannot reproduce it on my systems, so I'm trying to understand why you are seeing it when I am not. I did a clean install and re-ran everything from scratch, and it runs just fine, so something that differs between your system and mine is causing this problem.

Could you provide the following info?

cd /root/vswitchperf/tools/pkt_gen/trex
scl enable python33 bash
source ~/vsperfenv/bin/activate

Open a Python shell ("python") and run the following commands:

import logging
import subprocess
import sys
import time
from collections import OrderedDict
import netaddr
import zmq
sys.path.append("/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl")
from trex_stl_lib.api import *

If this gives the same traceback you reported initially, can you provide your install log? It is located in a subfolder of /root/RHEL_NIC_QUAL_LOGS, so you might have to search for it.

It's named vsperf_install.log.

It is most likely located in the oldest dated folder in that location.

ctrautma commented 6 years ago

I looked at how vswitchperf builds T-Rex, and it does use a specific commit; I had thought it might be using the master branch. So a recent change to T-Rex did not cause this: we've always used a specific commit when building the APIs. Are you perhaps making any custom changes? Or did you just modify the Perf-Verify.conf file and run it? The only other thing that would help is the full logs; the collection.sh script should grab everything needed.

w1ldptr commented 6 years ago

Hi Christian,

I did a bit of debugging of the vswitchperf scripts. First, I checked what you suggested by manually importing trex_stl_lib:

trex# scl enable python33 bash
[root@qa-h-vrt-069 trex]# source /root/vsperfenv/bin/activate
(vsperfenv)[root@qa-h-vrt-069 trex]# python
Python 3.3.2 (default, Aug 14 2014, 14:25:52)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> import subprocess
>>> import sys
>>> import time
>>> from collections import OrderedDict
>>> import netaddr
>>> import zmq
>>> sys.path.append("/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl")
>>> from trex_stl_lib.api import *
>>> os.path.abspath(scapy.__file__)
'/root/vswitchperf/src/trex/trex/scripts/external_libs/scapy-2.3.1/python3/scapy/__init__.py'
>>>

It worked! And TRex used a custom Scapy version from its own "external_libs" directory. Since this was very confusing, I decided to check what would happen if Scapy was imported before the path change:

(vsperfenv)[root@qa-h-vrt-069 ~]# python
Python 3.3.2 (default, Aug 14 2014, 14:25:52)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> import subprocess
>>> import sys
>>> import time
>>> from collections import OrderedDict
>>> import netaddr
>>> import zmq
>>> import scapy
>>> sys.path.append("/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl")
>>> from trex_stl_lib.api import *
WARNING: No route found for IPv6 destination :: (no default route?). This affects only IPv6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/api.py", line 4, in <module>
    from .trex_stl_client import STLClient, LoggerApi
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_client.py", line 14, in <module>
    from .trex_stl_vlan import VLAN
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_vlan.py", line 8, in <module>
    from scapy.layers.l2 import Dot1Q, Dot1AD
ImportError: cannot import name Dot1AD
>>> import os
>>> os.path.abspath(scapy.__file__)
'/root/vsperfenv/lib/python3.3/site-packages/scapy/__init__.py'
>>>
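The two sessions above come down to Python's module cache: once a name is in `sys.modules`, later imports return that object and never consult `sys.path` again. A self-contained sketch, with two throwaway packages standing in for vsperfenv's Scapy and TRex's bundled copy. The package name `shadow_demo_pkg` and the helpers are hypothetical, and we assume, consistent with the first session, that TRex ends up with its external_libs directory ahead of site-packages on `sys.path`:

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, version):
    """Create a one-file package `name` under `root` whose VERSION is `version`."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("VERSION = %r\n" % version)


def demo_first_import_wins(name="shadow_demo_pkg"):
    env_dir = tempfile.mkdtemp()      # stands in for vsperfenv's site-packages
    bundled_dir = tempfile.mkdtemp()  # stands in for TRex's external_libs
    make_package(env_dir, name, "env")
    make_package(bundled_dir, name, "bundled")
    try:
        sys.path.append(env_dir)
        importlib.invalidate_caches()
        first = importlib.import_module(name)   # like xena.py importing Scapy early

        # Put the bundled copy ahead of everything else on sys.path,
        sys.path.insert(0, bundled_dir)
        importlib.invalidate_caches()
        second = importlib.import_module(name)  # but sys.modules answers first.

        # Only after evicting the cached module does the path change take effect.
        del sys.modules[name]
        third = importlib.import_module(name)
        return first.VERSION, second.VERSION, third.VERSION
    finally:
        for d in (env_dir, bundled_dir):
            if d in sys.path:
                sys.path.remove(d)
        sys.modules.pop(name, None)


if __name__ == "__main__":
    print(demo_first_import_wins())  # ('env', 'env', 'bundled')
```

Evicting the cached entry is only useful for the demo: modules that already hold references to the first copy's classes keep them, so it is not a fix for vsperf.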

So it seemed that the correct Scapy version will not be imported if any version is already loaded. That got me thinking that some non-determinism is involved, and that this is why we observe different behavior on exactly the same versions of code and dependencies. I decided to check who might be importing Scapy before TRex, so I added an exception to Scapy's __init__ in vsperfenv, with the following results:

[ERROR]  2017-12-02 18:46:23,140 : (root) - Failed to run test: pvp_tput
Traceback (most recent call last):
  File "./vsperf", line 688, in main
    test.run()
  File "/root/vswitchperf/testcases/testcase.py", line 330, in run
    self.run_initialize()
  File "/root/vswitchperf/testcases/testcase.py", line 198, in run_initialize
    loader.get_trafficgen_class())
  File "/root/vswitchperf/core/loader/loader.py", line 79, in get_trafficgen_class
    return self._trafficgen_loader.get_class()
  File "/root/vswitchperf/core/loader/loader_servant.py", line 67, in get_class
    class_name=self._class_name)
  File "/root/vswitchperf/core/loader/loader_servant.py", line 119, in load_module
    path=path, interface=interface)
  File "/root/vswitchperf/core/loader/loader_servant.py", line 142, in load_modules
    for _, mod in LoaderServant._load_all_modules(path):
  File "/root/vswitchperf/core/loader/loader_servant.py", line 181, in _load_all_modules
    modname, *imp.find_module(modname, [root]))
  File "/root/vsperfenv/lib64/python3.3/imp.py", line 175, in load_module
    return load_source(name, filename, file)
  File "/root/vsperfenv/lib64/python3.3/imp.py", line 114, in load_source
    _LoadSourceCompatibility(name, pathname, file).load_module(name)
  File "<frozen importlib._bootstrap>", line 586, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1024, in load_module
  File "<frozen importlib._bootstrap>", line 1005, in load_module
  File "<frozen importlib._bootstrap>", line 562, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 870, in _load_module
  File "<frozen importlib._bootstrap>", line 313, in _call_with_frames_removed
  File "/root/vswitchperf/tools/pkt_gen/xena/xena.py", line 35, in <module>
    import scapy.layers.inet as inet
  File "/root/vsperfenv/lib/python3.3/site-packages/scapy/__init__.py", line 12, in <module>
    raise Exception('Trying to import wrong Scapy')
Exception: Trying to import wrong Scapy
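Raising from Scapy's `__init__` works, but it means editing the installed package. A less invasive way to find the first importer, sketched here as an assumption rather than anything vswitchperf provides, is a meta path finder that records the call stack the first time a name is looked up and then steps aside. `FirstImportTracer` and `install_tracer` are hypothetical names; the class defines both `find_module` (the hook Python 3.3, used in this setup, calls) and `find_spec` (preferred by Python 3.4+):

```python
import importlib.abc
import sys
import traceback


class FirstImportTracer(importlib.abc.MetaPathFinder):
    """Record who triggers the first import of `target` or any of its submodules.

    Finders on sys.meta_path are only consulted for names not yet in sys.modules,
    so each recorded stack corresponds to a first-time lookup. Both hooks return
    None, which hands the lookup on to the normal finders unchanged.
    """

    def __init__(self, target):
        self.target = target
        self.stacks = []  # list of (module name, formatted stack lines)

    def _record(self, fullname):
        if fullname == self.target or fullname.startswith(self.target + "."):
            self.stacks.append((fullname, traceback.format_stack()))

    def find_spec(self, fullname, path, target=None):  # Python 3.4+
        self._record(fullname)
        return None

    def find_module(self, fullname, path=None):  # Python 3.3
        self._record(fullname)
        return None


def install_tracer(target):
    """Put a FirstImportTracer at the front of sys.meta_path and return it."""
    tracer = FirstImportTracer(target)
    sys.meta_path.insert(0, tracer)
    return tracer
```

For example, `tracer = install_tracer("scapy")` near the top of ./vsperf, then `print("".join(tracer.stacks[0][1]))` once the failure occurs; `sys.meta_path.remove(tracer)` uninstalls it.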

So Xena was importing Scapy from the environment before TRex. Next I had to find the source of the non-determinism. According to the original error trace, traffic generators are loaded by the loader module instead of being imported directly. Since I'm not familiar with Python tooling, I decided to use good old printf debugging and put some prints in the loader module. I was specifically interested in the following function:

    def _load_all_modules(path):
        """Load all modules from ``path`` directory.

        This is based on the design used by OFTest:
            https://github.com/floodlight/oftest/blob/master/oft

        :param path: Path to a folder of modules.

        :return: List of modules in a folder.
        """
        mods = []

        for root, _, filenames in os.walk(path):
            # Iterate over each python file
            for filename in fnmatch.filter(filenames, '[!.]*.py'):
                modname = os.path.splitext(os.path.basename(filename))[0]

                # skip module load if it is excluded by configuration
                if modname in settings.getValue('EXCLUDE_MODULES'):
                    continue

                try:
                    print("XXX modname ", modname)
                    if modname in sys.modules:
                        mod = sys.modules[modname]
                    else:
                        mod = imp.load_module(
                            modname, *imp.find_module(modname, [root]))
                except ImportError:
                    logging.error('Could not import file ' + filename)
                    raise

                mods.append((modname, mod))

        return mods

And got the following results:

[DEBUG]  2017-12-03 20:36:32,850 : (testcases.performance) - Controllers:
XXX modname  __init__
XXX modname  __init__
XXX modname  trafficgen
XXX modname  __init__
XXX modname  ixnet
XXX modname  __init__
XXX modname  testcenter-rfc2889-rest
XXX modname  testcenter
XXX modname  testcenter-rfc2544-rest
XXX modname  ixia
XXX modname  __init__
XXX modname  XenaDriver
XXX modname  __init__
XXX modname  xena
XXX modname  xena_json_mesh
XXX modname  xena_json_blocks
XXX modname  xena_json
XXX modname  xena_json_pairs
XXX modname  __init__
XXX modname  json_utilities
XXX modname  __init__
XXX modname  dummy
XXX modname  trex
WARNING: No route found for IPv6 destination :: (no default route?). This affects only IPv6
[WARNING]  2017-12-03 20:36:33,184 : (scapy.runtime) - No route found for IPv6 destination :: (no default route?). This affects only IPv6
[ERROR]  2017-12-03 20:36:33,261 : (root) - Could not import file trex.py
[ERROR]  2017-12-03 20:36:33,261 : (root) - Failed to run test: pvp_tput
Traceback (most recent call last):
  File "./vsperf", line 689, in main
    test.run()
  File "/root/vswitchperf/testcases/testcase.py", line 330, in run
    self.run_initialize()
  File "/root/vswitchperf/testcases/testcase.py", line 198, in run_initialize
    loader.get_trafficgen_class())
  File "/root/vswitchperf/core/loader/loader.py", line 80, in get_trafficgen_class
    return self._trafficgen_loader.get_class()
  File "/root/vswitchperf/core/loader/loader_servant.py", line 67, in get_class
    class_name=self._class_name)
  File "/root/vswitchperf/core/loader/loader_servant.py", line 119, in load_module
    path=path, interface=interface)
  File "/root/vswitchperf/core/loader/loader_servant.py", line 142, in load_modules
    for _, mod in LoaderServant._load_all_modules(path):
  File "/root/vswitchperf/core/loader/loader_servant.py", line 182, in _load_all_modules
    modname, *imp.find_module(modname, [root]))
  File "/root/vsperfenv/lib64/python3.3/imp.py", line 175, in load_module
    return load_source(name, filename, file)
  File "/root/vsperfenv/lib64/python3.3/imp.py", line 114, in load_source
    _LoadSourceCompatibility(name, pathname, file).load_module(name)
  File "<frozen importlib._bootstrap>", line 586, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1024, in load_module
  File "<frozen importlib._bootstrap>", line 1005, in load_module
  File "<frozen importlib._bootstrap>", line 562, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 870, in _load_module
  File "<frozen importlib._bootstrap>", line 313, in _call_with_frames_removed
  File "/root/vswitchperf/tools/pkt_gen/trex/trex.py", line 32, in <module>
    from trex_stl_lib.api import *
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/api.py", line 4, in <module>
    from .trex_stl_client import STLClient, LoggerApi
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_client.py", line 14, in <module>
    from .trex_stl_vlan import VLAN
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_vlan.py", line 8, in <module>
    from scapy.layers.l2 import Dot1Q, Dot1AD
ImportError: cannot import name Dot1AD

So the culprit is the unordered os.walk over the /root/vswitchperf/tools/pkt_gen directory, which contains all the traffic generators. On your setup, TRex with its custom Scapy version seems to be loaded first.
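`os.walk` returns entries in whatever order the filesystem's directory listing produces, so two machines with identical trees can load the generators in different orders. If reproducible behavior is wanted, one option (a sketch, not the loader's actual code; `python_modules_sorted` is a hypothetical name) is to sort the subdirectories in place, which `os.walk` honors in top-down mode, and to sort the file names:

```python
import fnmatch
import os


def python_modules_sorted(path):
    """Yield (root, modname) for every non-hidden .py file under `path`,
    in sorted order rather than the filesystem-dependent os.walk order."""
    for root, dirnames, filenames in os.walk(path):
        dirnames.sort()  # in-place sort controls the order os.walk descends
        for filename in sorted(fnmatch.filter(filenames, '[!.]*.py')):
            yield root, os.path.splitext(filename)[0]
```

Sorting only makes the order stable; whichever generator imports Scapy first still decides which Scapy the whole process uses.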

ctrautma commented 6 years ago

Hi Vlad,

Thank you very much for the detailed work in finding the culprit. As a temporary fix, do you think commenting out the Scapy imports in the Xena modules would work?

There are two imports: one in tools/pkt_gen/xena/xena.py and one in tools/pkt_gen/xena/json/xena_json.py.

I'll look for a more permanent fix.
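Another option, instead of commenting the imports out, would be to move them inside the functions that use them, so the loader's discovery pass over tools/pkt_gen no longer imports Scapy at all. This assumes Xena only needs Scapy when it actually builds packets; the helper `build_udp_header` below is hypothetical, since Xena's real code is not shown in this thread:

```python
def build_udp_header(src_ip, dst_ip, sport, dport):
    """Hypothetical helper: return the raw bytes of an IPv4/UDP header.

    The Scapy import lives inside the function, so merely loading this module
    (as the vsperf loader does for every file under tools/pkt_gen) does not
    import Scapy; it is imported on the first call instead.
    """
    import scapy.layers.inet as inet  # deferred import, runs on first call
    packet = inet.IP(src=src_ip, dst=dst_ip) / inet.UDP(sport=sport, dport=dport)
    return bytes(packet)
```

Whichever copy of Scapy gets imported first still serves the whole process; the deferral only keeps Xena from being the module that imports it during discovery.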

ctrautma commented 6 years ago

I was able to reproduce your issue by running the simple code block below in the Python shell on my system:

import zmq
import sys
import scapy
sys.path.append("/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl")
from trex_stl_lib.api import *

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/api.py", line 4, in <module>
    from .trex_stl_client import STLClient, LoggerApi
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_client.py", line 14, in <module>
    from .trex_stl_vlan import VLAN
  File "/root/vswitchperf/src/trex/trex/scripts/automation/trex_control_plane/stl/trex_stl_lib/trex_stl_vlan.py", line 8, in <module>
    from scapy.layers.l2 import Dot1Q, Dot1AD
ImportError: cannot import name Dot1AD

ctrautma commented 6 years ago

Hi Vlad,

I spoke with Martin Klozik at Intel about this issue. He is one of the biggest contributors to the VSPerf project, if not the biggest. He was kind enough to offer some insight into how to resolve this long term:

  1. Modify the python environment to add a Dot1AD class
  2. Push up a patch to the scapy 3 implementation to fix the issue

I tested option 1 by doing the following:

cp /root/vsperfenv/lib/python3.3/site-packages/scapy/layers/l2.py /root/vsperfenv/lib/python3.3/site-packages/scapy/layers/l2.orig

Then modify /root/vsperfenv/lib/python3.3/site-packages/scapy/layers/l2.py to add a Dot1AD class:

--- ./vsperfenv/lib/python3.5/site-packages/scapy/layers/l2.py.orig 2017-12-13 09:10:23.057014656 +0000
+++ ./vsperfenv/lib/python3.5/site-packages/scapy/layers/l2.py 2017-12-13 09:19:22.054477997 +0000
@@ -238,6 +238,9 @@
 conf.neighbor.register_l3(Ether, Dot1Q, lambda l2,l3: conf.neighbor.resolve(l2,l3.payload))
 
 
+class Dot1AD(Dot1Q):
+    pass
+
 class STP(Packet):
     name = "Spanning Tree Protocol"
     fields_desc = [ ShortField("proto", 0),

This appears to work for me and resolves the issue; I can no longer reproduce it. Could you try this on your system and let me know whether it fixes the problem? If so, I'll put up a temporary patch with this change. Long term, I'll look into submitting a pull request to the scapy3 repo to get this fixed permanently.
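Instead of editing site-packages, the same change could in principle be applied at runtime, before trex.py imports trex_stl_lib. A minimal sketch under that assumption; `ensure_dot1ad` is a hypothetical helper, not part of vswitchperf, TRex or Scapy:

```python
def ensure_dot1ad(l2_module):
    """Give `l2_module` a Dot1AD class if it does not already define one.

    Mirrors the l2.py patch: Dot1AD becomes a bare subclass of Dot1Q, which is
    enough for `from scapy.layers.l2 import Dot1Q, Dot1AD` to succeed. It adds
    no 802.1ad-specific fields, layer bindings or EtherType handling.
    """
    if not hasattr(l2_module, "Dot1AD"):
        class Dot1AD(l2_module.Dot1Q):
            pass
        l2_module.Dot1AD = Dot1AD
    return l2_module.Dot1AD
```

For example, `import scapy.layers.l2 as l2` followed by `ensure_dot1ad(l2)` at the top of tools/pkt_gen/trex/trex.py, before `from trex_stl_lib.api import *`. Note that this forces TRex onto the vsperfenv Scapy rather than its bundled copy, and we have not verified that the rest of the TRex API behaves correctly on scapy3k.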

Thanks.