atlanticwave-sdx / sdx-controller

Central Controller for AtlanticWave SDX.
https://www.atlanticwave-sdx.net
MIT License
1 stars 3 forks source link

sdx_pce.temanager wrongly set the first ingress_port and last egress_port on some cases when creating connections #229

Closed italovalcy closed 3 months ago

italovalcy commented 6 months ago

When creating connections, the sdx_pce TEManager.generate_connection_breakdown() can wrongly set the first ingress_port and last egress_port for the generated VlanTaggedBreakdowns, as per:

First ingress_port: https://github.com/atlanticwave-sdx/pce/blob/main/src/sdx_pce/topology/temanager.py#L417

Last egress_port: https://github.com/atlanticwave-sdx/pce/blob/main/src/sdx_pce/topology/temanager.py#L422

By wrong ports, I mean ports that are different from the user's request. Thus, if the user requested, let's say, ingress_port=Domain1.NodeA.PortA and egress_port=DomainN.NodeZ.PortZ, the resulting breakdown connections will start at ingress_port=Domain1.NodeA.PortX and end on egress_port=DomainN.NodeZ.PortY, where PortX is the backbone port on NodeA which belongs to the resulting path on Domain1, as well as PortY is the backbone port on NodeZ which belongs to the resulting path on DomainN.

If we follow the code workflow, generate_connection_breakdown() only receives the ConnectionSolution. And the ConnectionSolution does not contain the original user-requested ingress/egress ports. Instead, the ConnectionSolution only contains the nodes where the user-requested ingress/egress ports were located. This can be seen as per:

https://github.com/atlanticwave-sdx/pce/blob/main/src/sdx_pce/topology/temanager.py#L201-L202

This leads to instantiating ConnectionRequest with the nodes only (not the original ingress/egress ports): https://github.com/atlanticwave-sdx/pce/blob/main/src/sdx_pce/topology/temanager.py#L245-L250

Later, the Breakdowns will take the first ingress and last egress from the ConnectionSolution, which does not guarantee it is the correct port according to the user request.

How to reproduce:

  1. Using sdx-controller installed from github main branch
  2. Load OXP's topology according to datamodel topology examples available on: https://github.com/atlanticwave-sdx/datamodel/tree/main/src/sdx_datamodel/data/topologies, however using an augmented version of amlight.json topology which includes a new port under node urn:sdx:node:amlight.net:A1:
                {
                    "id": "urn:sdx:port:amlight.net:A1:3",
                    "label_range": [
                        "100-200",
                        "1000"
                    ],
                    "name": "Novi100:3",
                    "node": "urn:sdx:node:amlight.net:A1",
                    "short_name": "A1:3",
                    "status": "up",
                    "state": "enabled"
                }
  3. Submit a request according to datamodel request examples, specifically https://github.com/atlanticwave-sdx/datamodel/blob/main/src/sdx_datamodel/data/requests/test_request.json however using ingress_port.id == urn:ogf:network:sdx:port:zaoxi:B2:1 and egress_port.id == urn:sdx:port:amlight.net:A1:3
  4. Check the resulting Breakdown

Expected result:

Actual result:

Relevant log entries:

sdx-controller    | INFO:sdx_pce.topology.temanager:generate_traffic_matrix, ports: ingress_port.id: urn:ogf:network:sdx:port:zaoxi:B2:1, egress_port.id: urn:sdx:port:amlight.net:A1:3

sdx-controller    | INFO:sdx_controller.handlers.connection_handler:Generated graph: 'Graph with 10 nodes and 14 edges', traffic matrix: 'TrafficMatrix(connection_requests=[ConnectionRequest(source=8, destination=2, required_bandwidth=10, required_latency=300)], request_id='test-connection-request')'

sdx-controller    | INFO:sdx_pce.load_balancing.te_solver:solution_translator result: ConnectionSolution(connection_map={ConnectionRequest(source=8, destination=2, required_bandwidth=10, required_latency=300): [ConnectionPath(source=8, destination=7), ConnectionPath(source=7, destination=5), ConnectionPath(source=5, destination=3), ConnectionPath(source=3, destination=0), ConnectionPath(source=0, destination=2)]}, cost=5.0, request_id='test-connection-request')

sdx-controller    | INFO:sdx_pce.topology.temanager:generate_connection_breakdown(): tagged_breakdown: VlanTaggedBreakdowns(breakdowns={'urn:ogf:network:sdx:topology:zaoxi.net': VlanTaggedBreakdown(name='ZAOXI_vlan_10001_10001', dynamic_backup_path=True, uni_a=VlanTaggedPort(tag=VlanTag(value=10001, tag_type=1), port_id='urn:ogf:network:sdx:port:zaoxi:B2:2'), uni_z=VlanTaggedPort(tag=VlanTag(value=10001, tag_type=1), port_id='urn:ogf:network:sdx:port:zaoxi:B1:1')), 'urn:ogf:network:sdx:topology:sax.net': VlanTaggedBreakdown(name='SAX_vlan_1000_10001', dynamic_backup_path=True, uni_a=VlanTaggedPort(tag=VlanTag(value=1000, tag_type=1), port_id='urn:ogf:network:sdx:port:sax:B3:1'), uni_z=VlanTaggedPort(tag=VlanTag(value=10001, tag_type=1), port_id='urn:ogf:network:sdx:port:sax:B1:1')), 'urn:ogf:network:sdx:topology:amlight.net': VlanTaggedBreakdown(name='AMLIGHT_vlan_10001_1000', dynamic_backup_path=True, uni_a=VlanTaggedPort(tag=VlanTag(value=10001, tag_type=1), port_id='urn:sdx:port:amlight:B1:1'), uni_z=VlanTaggedPort(tag=VlanTag(value=1000, tag_type=1), port_id='urn:sdx:port:amlight.net:A1:1'))})

sdx-controller    | INFO:sdx_controller.handlers.connection_handler:-- BREAKDOWN: {"urn:ogf:network:sdx:topology:zaoxi.net": {"name": "ZAOXI_vlan_10001_10001", "dynamic_backup_path": true, "uni_a": {"tag": {"value": 10001, "tag_type": 1}, "port_id": "urn:ogf:network:sdx:port:zaoxi:B2:2"}, "uni_z": {"tag": {"value": 10001, "tag_type": 1}, "port_id": "urn:ogf:network:sdx:port:zaoxi:B1:1"}}, "urn:ogf:network:sdx:topology:sax.net": {"name": "SAX_vlan_1000_10001", "dynamic_backup_path": true, "uni_a": {"tag": {"value": 1000, "tag_type": 1}, "port_id": "urn:ogf:network:sdx:port:sax:B3:1"}, "uni_z": {"tag": {"value": 10001, "tag_type": 1}, "port_id": "urn:ogf:network:sdx:port:sax:B1:1"}}, "urn:ogf:network:sdx:topology:amlight.net": {"name": "AMLIGHT_vlan_10001_1000", "dynamic_backup_path": true, "uni_a": {"tag": {"value": 10001, "tag_type": 1}, "port_id": "urn:sdx:port:amlight:B1:1"}, "uni_z": {"tag": {"value": 1000, "tag_type": 1}, "port_id": "urn:sdx:port:amlight.net:A1:1"}}}
italovalcy commented 6 months ago

One way of fixing this would be the following approach:

--- a/sdx_controller/handlers/connection_handler.py
+++ b/sdx_controller/handlers/connection_handler.py
@@ -155,8 +155,8 @@ class ConnectionHandler:
         if solution is None or solution.connection_map is None:
             return "Could not solve the request", 400

-        breakdown = temanager.generate_connection_breakdown(solution)
+        breakdown = temanager.generate_connection_breakdown(solution, connection_request)
         #self._send_breakdown_to_lc(breakdown, connection_request)
         return self._send_breakdown_to_lc(breakdown, connection_request)

     def handle_link_failure(self, msg_json):

This approach will keep the ConnectionRequest and ConnectionSolution as they are, while providing more info for generate_connection_breakdown() to do the actual correct attribution of first ingress_port and last egress_port.

Of course, ideally topology model should provide a get_port_by_id instead of doing that on temanager.

italovalcy commented 3 months ago

Hi @congwang09 and @YufengXin just tested with the most recent version of SDX-Controller and still getting the wrong ingress/egress ports being created on the OXPs. Please find below the steps I'm following:

1) Create the environment for testing:

git clone https://github.com/atlanticwave-sdx/sdx-continuous-development
cd sdx-continuous-development
git checkout main
git pull
for repo in sdx-lc sdx-controller kytos-sdx-topology; do cd data-plane/; git clone https://github.com/atlanticwave-sdx/$repo container-$repo; cd ..; done
cd data-plane
cp template.env .env
./1_build_kytos.sh
./2_build_oxpos.sh
./3_build_local_controllers.sh
./4_build_mongo.sh
cd container-sdx-controller
docker build -t sdx-controller .
cd ..
docker compose up -d

# wait a couple of secunds to make the OXPs boot up. Usually I follow the steps below to make sure they are all UP:
EXP_SW=3; while true; do N_SW=$(curl -s http://0.0.0.0:8181/api/kytos/topology/v3/switches | jq -r '.switches[].id' | wc -l); echo "waiting switches $N_SW / $EXP_SW"; if [ $N_SW -eq $EXP_SW ]; then break; fi; sleep 1; done
EXP_SW=2; while true; do N_SW=$(curl -s http://0.0.0.0:8282/api/kytos/topology/v3/switches | jq -r '.switches[].id' | wc -l); echo "waiting switches $N_SW / $EXP_SW"; if [ $N_SW -eq $EXP_SW ]; then break; fi; sleep 1; done
EXP_SW=3; while true; do N_SW=$(curl -s http://0.0.0.0:8383/api/kytos/topology/v3/switches | jq -r '.switches[].id' | wc -l); echo "waiting switches $N_SW / $EXP_SW"; if [ $N_SW -eq $EXP_SW ]; then break; fi; sleep 1; done

# configure OXPs -- kytos requires some initial configs like approving switches and links just discovered
./container-kytos-sdx-topology/curl/2.enable_all.sh
./container-kytos-sdx-topology/curl/0a.version_control.sh
./container-kytos-sdx-topology/curl/0b.version_control.sh
./container-kytos-sdx-topology/curl/0c.version_control.sh

# check if the topology is correctly identified by SDX-Controller:
curl -s http://0.0.0.0:8080/SDX-Controller/1.0.0/topology | jq -r '.nodes[] | (.ports[] | .id)'
curl -s http://0.0.0.0:8080/SDX-Controller/1.0.0/topology | jq -r '.links[] | .id + " " + .ports[0].id + " " + .ports[1].id'

2) submit a connection request

curl -X POST http://0.0.0.0:8080/SDX-Controller/1.0.0/connection -H 'Content-Type: application/json' -d '{"id": "3", "name": "Test connection request 22", "start_time": "2000-01-23T04:56:07.000Z", "end_time": "2000-01-23T04:56:07.000Z", "bandwidth_required": 10, "latency_required": 300, "egress_port": {"id": "urn:sdx:port:tenet.ac.za:Tenet03:50", "name": "Tenet03:50", "node": "urn:sdx:port:tenet.ac.za:Tenet03", "status": "up"}, "ingress_port": {"id": "urn:sdx:port:ampath.net:Ampath3:50", "name": "Ampath3:50", "node": "urn:sdx:port:ampath.net:Ampath3", "status": "up"}}'

Expected result:

Actual result:

Logs from sdx-controller:

INFO:sdx_pce.topology.temanager:generate_connection_breakdown(): tagged_breakdown: VlanTaggedBreakdowns(breakdowns={'urn:sdx:topology:ampath.net': VlanTaggedBreakdown(name='AMPATH_vlan_100_100', dynamic_backup_path=True, uni_a=VlanTaggedPort(tag=VlanTag(value=100, tag_type=1), port_id='urn:sdx:port:ampath.net:Ampath3:2'), uni_z=VlanTaggedPort(tag=VlanTag(value=100, tag_type=1), port_id='urn:sdx:port:ampath.net:Ampath1:40')), 'urn:sdx:topology:sax.net': VlanTaggedBreakdown(name='SAX_vlan_100_100', dynamic_backup_path=True, uni_a=VlanTaggedPort(tag=VlanTag(value=100, tag_type=1), port_id='urn:sdx:port:sax.net:Sax01:40'), uni_z=VlanTaggedPort(tag=VlanTag(value=100, tag_type=1), port_id='urn:sdx:port:sax.net:Sax01:41')), 'urn:sdx:topology:tenet.ac.za': VlanTaggedBreakdown(name='TENET_vlan_100_100', dynamic_backup_path=True, uni_a=VlanTaggedPort(tag=VlanTag(value=100, tag_type=1), port_id='urn:sdx:port:tenet.ac.za:Tenet01:41'), uni_z=VlanTaggedPort(tag=VlanTag(value=100, tag_type=1), port_id='urn:sdx:port:tenet.ac.za:Tenet03:2'))})

If you observe the output above the first breakdown was created on Ampath.net OXP from port urn:sdx:port:ampath.net:Ampath3:2 (wrong!) to port urn:sdx:port:ampath.net:Ampath1:40 (correct, it is the inter-domain port for this OXP).

Same thing happens on the last breakdown: if you observe the output above the last breakdown was created on Tenet.ac.za OXP from port urn:sdx:port:tenet.ac.za:Tenet01:41 (correct, it is the inter-domain port for this OXP) to port urn:sdx:port:tenet.ac.za:Tenet03:2 (wrong!!!).

The suggestion I gave on comment above seems to fix this issue, however it is a non-optimal solution.

Please let me know if you need any additional information.

YufengXin commented 3 months ago

@italovalcy just want to give you an update that I'm on this issue with the PR in PCE. I needed to first augment our mocked topology file Amlight.json in datamodel with the user-facing ports, finished up all the unittest, then moved to pce for more testing before handing to you for the real test.

Thanks for this test case and traced the code.