Closed xi-yang closed 1 year ago
@zlion A feature-sense_gcp_stitching-liang branch has been created for this development. I think the SENSE-O GCPDriver is already usable for this. While you continue fixing a few minor things there, you can start developing its fabfed integration.
Try using a localhost deployment for the SENSE provider. The first step is to install fabfed with the necessary provider configs and exercise the SENSE AWS stitching workflow.
@xi-yang cc: @zlion
I understand SENSE (GCP) is a producer and FABRIC is a consumer. What information do we need on the FABRIC side? That is, is everything set up, and do we follow the same convention as we did for AWS?
Recall that for AWS, to create the facility port, cloud=AWS maps to name=Cloud_Facility_AWS, site=AWS, device_name=agg4.ashb.net.internet2.edu, and local_name=HundredGigE0/0/0/7.
We also need the asn and the account_id for the peer labels. The bgp_key can be hardcoded. Thanks
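As a rough illustration of the AWS convention recalled above, the mapping from a cloud name to its facility port parameters could be kept in a small lookup table. This is only a sketch: `FacilityPort`, `FACILITY_PORTS`, and `facility_port_for` are hypothetical names, not fabfed or fablib APIs, and only the AWS values come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacilityPort:
    """Parameters needed to create a cloud facility port (hypothetical type)."""
    name: str
    site: str
    device_name: str
    local_name: str


# Only the AWS entry is known from this thread; a GCP entry would need
# values that have not been stated here.
FACILITY_PORTS = {
    "AWS": FacilityPort(
        name="Cloud_Facility_AWS",
        site="AWS",
        device_name="agg4.ashb.net.internet2.edu",
        local_name="HundredGigE0/0/0/7",
    ),
}


def facility_port_for(cloud: str) -> FacilityPort:
    """Look up the facility port convention for a cloud name (case-insensitive)."""
    try:
        return FACILITY_PORTS[cloud.upper()]
    except KeyError:
        raise ValueError(f"no facility port convention recorded for {cloud!r}") from None
```

A lookup for an unlisted cloud fails loudly, which makes the open question for GCP explicit instead of silently reusing the AWS port.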
@xi-yang cc @zlion
Hi Xi, Can you share that sense service profile (GCP) that Liang is using? Thanks. aes
Here is the profile I am using in my local orchestrator.
{
  "data": {
    "parent": "urn:ogf:network:google.com:gcp-cloud",
    "gateways": [
      {
        "name": "Gateway 1",
        "connects": [
          {
            "authkey": "0xzsEwC7xk6c1fK_h.xHyAdx",
            "cloud_ip": "192.168.30.2/24",
            "customer_ip": "192.168.30.1/24",
            "customer_asn": "55038"
          }
        ],
        "type": "GCP Interconnect"
      }
    ],
    "cidr": "10.100.0.0/16",
    "subnets": [
      {
        "vpn_route_propagation": true,
        "name": "Subnet 1",
        "cidr": "10.100.0.0/24",
        "vms": [
          {
            "interfaces": [
              {
                "public": true,
                "type": "Ethernet"
              }
            ],
            "name": "VM-1"
          }
        ],
        "internet_routable": true
      }
    ]
  },
  "service": "vcn",
  "options": [
    "gcp-form"
  ]
}
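For anyone adapting this profile, a quick sanity check on the BGP peering fields may help. The sketch below uses a trimmed copy of the profile (authkey and subnets omitted) and verifies that cloud_ip and customer_ip sit in the same subnet. The function name `bgp_peerings` is hypothetical and not part of SENSE or fabfed.

```python
import ipaddress
import json

# Trimmed copy of the profile above; only the fields used by this check are kept.
PROFILE_JSON = """
{
  "data": {
    "gateways": [
      {
        "name": "Gateway 1",
        "type": "GCP Interconnect",
        "connects": [
          {
            "cloud_ip": "192.168.30.2/24",
            "customer_ip": "192.168.30.1/24",
            "customer_asn": "55038"
          }
        ]
      }
    ]
  }
}
"""


def bgp_peerings(profile: dict) -> list:
    """Return (gateway, customer IP, cloud IP, customer ASN) for each connect entry."""
    peerings = []
    for gateway in profile["data"]["gateways"]:
        for connect in gateway.get("connects", []):
            cloud = ipaddress.ip_interface(connect["cloud_ip"])
            customer = ipaddress.ip_interface(connect["customer_ip"])
            # Both ends of the peering are expected to share one subnet.
            if cloud.network != customer.network:
                raise ValueError(
                    f"{gateway['name']}: {cloud} and {customer} are in different subnets"
                )
            peerings.append(
                (gateway["name"], str(customer.ip), str(cloud.ip), int(connect["customer_asn"]))
            )
    return peerings


peerings = bgp_peerings(json.loads(PROFILE_JSON))
```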
Reached out to Paul about a signup issue with "https://beta-4.fabric-testbed.net".
@zlion I think @abessiari is asking for the service profile name on sense-o-dev. You should create one, test it, and assign it to him.
Also, SENSE/GCP as a producer will need to pass the pairingKey to FABRIC. @zlion That is why I asked you to create an example config for the workflow and put it into the repository. Also, if the FABRIC AL2S AM does not yet support pairing-key input, add that support and work with Komal to update the fablib interface.
Please stay on top of these issues as a developer for both the SENSE/GCP interconnect and the AL2S AM.
@abessiari Created the service profile named "GCP-INTERCONN" in the sense-o-dev web portal. The GCP driver has been added as well.
@xi-yang sense-o-dev is running an older version, so the service instance creation succeeded without running the GCPinterconnectStitching code.
Update on the signup issue with "https://beta-4.fabric-testbed.net":
The process failed because a recent CILogon update broke the existing workflow. RENCI made a patch so that I could get around the problem and complete the signup process. The signup is now pending their approval.
I will deploy the latest code to sense-o-dev on Friday. For now let's focus on fixing things on FABRIC end such as pairing key.
Working on the AMHandlers to set up the AL2S connection for GCP.
To summarize, the AMHandler code remains unchanged to support GCP, except that the pairing key is placed in the "account_id" field in the GCP scenario.
Updates on Fabfed:
@xi-yang will generate a gcp-template.json file to get a manifest from SENSE for the stitching. The pairing-key will be the parameter needed for the stitching.
The template file has been merged into dev-gcp-stitch.
Branch feature-sense_gcp_stitching-liang deleted.
Next work:
- Work with Komal to add pairing-key label into AL2S sliver.
- Update and deploy the AL2S AM with support for the pairing-key
- Add pairing-key based stitching to the FABRIC provider in fabfed.
- Regression tests for both SENSE/AWS and SENSE/GCP workflows.
1 & 2: The OESS API reuses the "account_id" field for the pairing-key in the GCP case. So the AL2S AM code does not change; we only pass the pairing-key as the "account_id" value when calling the handler.
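The field reuse described above can be sketched as follows. This is an illustration under assumptions, not the AL2S AM handler code: `build_peer_labels` is a hypothetical helper that returns a plain dict, and the key and ASN values in the usage lines are placeholders. Only the field names asn, bgp_key, and account_id come from this thread.

```python
from typing import Optional


def build_peer_labels(
    cloud: str,
    asn: int,
    bgp_key: str,
    account_id: Optional[str] = None,
    pairing_key: Optional[str] = None,
) -> dict:
    """Build peer labels, placing the GCP pairing key in the account_id field."""
    if cloud.upper() == "GCP":
        if not pairing_key:
            raise ValueError("GCP stitching requires a pairing key")
        account = pairing_key  # GCP case: account_id carries the pairing key
    else:
        if not account_id:
            raise ValueError(f"{cloud} stitching requires an account_id")
        account = account_id
    return {"asn": str(asn), "bgp_key": bgp_key, "account_id": account}


# Placeholder values for illustration only.
gcp_labels = build_peer_labels("GCP", 64512, "hardcoded-key", pairing_key="example-pairing-key")
aws_labels = build_peer_labels("AWS", 64512, "hardcoded-key", account_id="example-account")
```

Keeping the branch in one helper means the downstream handler call looks the same for AWS and GCP, which matches the point that the handler itself does not change.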
2023-06-09 23:22:25,283 [fabric_slice.py:183] [INFO] Submitting request for slice test-gcp
2023-06-09 23:22:25,660 [slice.py:1905] [ERROR] Submit request error: return_status Status.FAILURE, slice_reservations: (415)
Reason: UNSUPPORTED MEDIA TYPE
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.19.8', 'Date': 'Sat, 10 Jun 2023 04:22:25 GMT', 'Content-Type': 'application/problem+json', 'Content-Length': '151', 'Connection': 'keep-alive'})
HTTP response body: b'{\n "detail": "Invalid Content-type (text/plain), expected JSON data",\n "status": 415,\n "title": "Unsupported Media Type",\n "type": "about:blank"\n}\n'
2023-06-09 23:22:25,661 [controller.py:142] [ERROR] Submit request error: return_status Status.FAILURE, slice_reservations: (415)
Reason: UNSUPPORTED MEDIA TYPE
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.19.8', 'Date': 'Sat, 10 Jun 2023 04:22:25 GMT', 'Content-Type': 'application/problem+json', 'Content-Length': '151', 'Connection': 'keep-alive'})
HTTP response body: b'{\n "detail": "Invalid Content-type (text/plain), expected JSON data",\n "status": 415,\n "title": "Unsupported Media Type",\n "type": "about:blank"\n}\n'
Traceback (most recent call last):
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/controller/controller.py", line 139, in create
provider.create_resource(resource=resource.attributes)
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/api/provider.py", line 160, in create_resource
raise e
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/api/provider.py", line 157, in create_resource
self.do_create_resource(resource=resource)
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_provider.py", line 54, in do_create_resource
self.slice.create_resource(resource=resource)
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_slice.py", line 281, in create_resource
self._submit_and_wait()
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_slice.py", line 198, in _submit_and_wait
raise e
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_slice.py", line 184, in _submit_and_wait
slice_id = self.slice_object.submit(wait=False)
File "/Users/lzhang9/opt/anaconda3/envs/fabfed/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py", line 1908, in submit
raise Exception(
Exception: Submit request error: return_status Status.FAILURE, slice_reservations: (415)
Reason: UNSUPPORTED MEDIA TYPE
HTTP response headers: HTTPHeaderDict({'Server': 'nginx/1.19.8', 'Date': 'Sat, 10 Jun 2023 04:22:25 GMT', 'Content-Type': 'application/problem+json', 'Content-Length': '151', 'Connection': 'keep-alive'})
HTTP response body: b'{\n "detail": "Invalid Content-type (text/plain), expected JSON data",\n "status": 415,\n "title": "Unsupported Media Type",\n "type": "about:blank"\n}\n'
Komal suggested switching to fabrictestbed-extensions==1.5.0, and that fixed the slice submission:
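To confirm an environment actually has the suggested version before retrying, a small check like the one below may be handy. It is a sketch: `check_pin` is a hypothetical helper, and the `installed` argument exists only so the check can be exercised without the package present.

```python
from importlib import metadata
from typing import Optional, Tuple

PINNED_VERSION = "1.5.0"  # version suggested in the comment above


def check_pin(
    package: str = "fabrictestbed-extensions",
    pinned: str = PINNED_VERSION,
    installed: Optional[str] = None,
) -> Tuple[bool, str]:
    """Compare the installed package version against the pinned one."""
    if installed is None:
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            return False, f"{package} is not installed; expected {pinned}"
    if installed != pinned:
        return False, f"{package} is {installed}; expected {pinned}"
    return True, f"{package} is {pinned}"
```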
2023-06-14 09:35:48,188 [slice.py:1896] [INFO] Submit request success: return_status Status.OK, slice_reservations
After that, an exception is raised in the fabfed code.
2023-06-14 09:36:19,867 [controller.py:142] [ERROR] Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: list index out of range#
Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: list index out of range#
Traceback (most recent call last):
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/controller/controller.py", line 139, in create
provider.create_resource(resource=resource.attributes)
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/api/provider.py", line 160, in create_resource
raise e
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/api/provider.py", line 157, in create_resource
self.do_create_resource(resource=resource)
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_provider.py", line 54, in do_create_resource
self.slice.create_resource(resource=resource)
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_slice.py", line 281, in create_resource
self._submit_and_wait()
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_slice.py", line 198, in _submit_and_wait
raise e
File "/Users/lzhang9/Projects/fabric-testbed/fabfed/fabfed/provider/fabric/fabric_slice.py", line 186, in _submit_and_wait
self.slice_object.wait(progress=True)
File "/Users/lzhang9/opt/anaconda3/envs/fabfed/lib/python3.9/site-packages/fabrictestbed_extensions/fablib/slice.py", line 1444, in wait
raise Exception(str(exception_string))
Exception: Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: list index out of range#
Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: Slice Exception: Slice Name: test-gcp, Slice ID: dfa95458-2f85-4d54-9971-39539d83f2e2: list index out of range#
Komal reported as follows,
I attempted to create a slice by passing device_name='agg4.ashb.net.internet2.edu', region='us-east-4' and I see error from the AL2S AM handler.
failed lease update- all units failed priming: Exception during create for unit: 4e105d87-219d-420e-b880-327bb5faff93 (PlaybookException(Playbook has failed tasks: results: None, error_text: Invalid value for field region: . Must be a match of regex [a-z](?:[-a-z0-9]0,61[a-z0-9])? at /usr/share/perl5/vendor_perl/OESS/Cloud/GCP.pm line 391.n, error: 1), [(core1.loui.net.internet2.edu, HundredGigE0/0/0/24), (agg4.ashb.net.internet2.edu, Bundle-Ether110)])#all units failed priming: Exception during create for unit: 4e105d87-219d-420e-b880-327bb5faff93 (PlaybookException(Playbook has failed tasks: results: None, error_text: Invalid value for field region: . Must be a match of regex [a-z](?:[-a-z0-9]0,61[a-z0-9])? at /usr/share/perl5/vendor_perl/OESS/Cloud/GCP.pm line 391.n, error: 1), [(core1.loui.net.internet2.edu, HundredGigE0/0/0/24), (agg4.ashb.net.internet2.edu, Bundle-Ether110)])#
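The error text shows an empty value for region ("Invalid value for field region: ."), so the region does not seem to reach OESS at all. A client-side check could catch that earlier. The sketch below assumes the regex in the log lost its braces in transit and was originally `{0,61}`, and that OESS matches the whole string. `is_valid_region` is a hypothetical helper.

```python
import re

# Assumed reconstruction of the pattern quoted in the error message.
REGION_RE = re.compile(r"[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?")


def is_valid_region(region: str) -> bool:
    """Return True if the region string matches the pattern in full."""
    return REGION_RE.fullmatch(region) is not None
```

Note that under this pattern both "us-east4" and "us-east-4" pass, so the check only catches empty or malformed values, not a misspelled region name.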
Looking into Komal's notebook "create_al2s.ipynb", I find that the labels need to contain "vlan" to get a successful result.
labels=Labels(ipv4_subnet='192.168.30.1/24', device_name='agg4.ashb.net.internet2.edu', local_name='Bundle-Ether5', vlan='3'),
For aws, we did not need to pass a vlan or a region. They were optional since day 1. And even today I was able to provision successfully ...
According to the OESS documentation (https://globalnoc.github.io/OESS/api/vrf), the VLAN tag is a required parameter. Per Komal, the Fabric-CF automatically picks one if it is not present. But with the notebook "create_al2s.ipynb", I successfully provisioned the AL2S connection with the VLAN parameter set. I need to investigate the code.
Yes in the GCP case, FABRIC is the consumer end. It will need a specific VLAN tag from SENSE / GCP. We will need to extract that out of the SENSE model using the manifest template.
Komal verified that the VLAN is needed when calling Fabric-CF. She's checking the code for the reason.
Here is the latest update from Komal.
okay, i debugged this and found that it's not a CF bug, but specifically VLAN=2 seems to always fail. Automatic vlan allocation via CF code works for all other values of vlan. Should the vlan range in the AL2S.graphml be updated to not include vlan=2 in the range?
@xi-yang Could you update that AL2S.graphml accordingly?
On a second look at how the VLAN attachment works, the VLAN is not provided by GCP but assigned automatically by the partner network config. In this case, the request to the AL2S sliver can use any VLAN (left unspecified); whatever the FABRIC-CF picks will be accepted by GCP. Only the pairing key matters.
So no change for the manifest template.
I manually removed VLAN 2 from all ports. This is a temporary solution. We need to investigate why VLAN 2 did not work as it is in the range I2 gave us.
@zlion When doing the manual editing, I noticed some weird VLAN range strings like ["1-4095", "2-2"] . Since you did the API part of the OESS scanner, can you take a look at that?
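One way to normalize overlapping range strings such as ["1-4095", "2-2"] and drop a known-bad VLAN is to expand them into a set and compress the result back into ranges. This is a sketch with hypothetical helper names. It is not the OESS scanner code, and the "N-M" output format is assumed from the strings quoted above.

```python
def expand_vlan_ranges(ranges):
    """Expand strings like "1-4095" or "7" into a set of VLAN IDs."""
    vlans = set()
    for item in ranges:
        start, _, end = item.partition("-")
        low = int(start)
        high = int(end) if end else low
        if low > high:
            raise ValueError(f"invalid VLAN range {item!r}")
        vlans.update(range(low, high + 1))
    return vlans


def compress_vlans(vlans):
    """Compress a set of VLAN IDs into sorted, non-overlapping "N-M" strings."""
    ranges = []
    start = prev = None
    for vlan in sorted(vlans):
        if start is None:
            start = prev = vlan
        elif vlan == prev + 1:
            prev = vlan
        else:
            ranges.append(f"{start}-{prev}")
            start = prev = vlan
    if start is not None:
        ranges.append(f"{start}-{prev}")
    return ranges


# Merge the overlapping ranges and exclude VLAN 2, as in the manual workaround.
cleaned = compress_vlans(expand_vlan_ranges(["1-4095", "2-2"]) - {2})
```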
One caveat in the testing is the configuration of the FABRIC slice, which took me a long time to figure out.
@xi-yang
On the fabric node, we are not able to add a route to the GCP network. See the error message below.
[rocky@ad442f3f-a3c1-4c87-86e7-d5ae862e8f97-fabric-node0 ~]$ sudo ip route add 10.200.0.0/16 via 192.168.10.1
Error: Nexthop has invalid gateway.
We also see a red arrow for the status in the AL2S diagram.
The data interface on the VM must be configured with an address before you can add a route.
ip addr add 192.168.10.2/24 dev eth3
ip route add 10.200.0.0/16 via 192.168.10.1
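The ordering matters because a next hop is generally only accepted when it falls inside a network that is directly connected on some interface. The sketch below models that rule in a simplified way with the standard ipaddress module. `gateway_is_on_link` is a hypothetical helper and does not inspect the real routing table.

```python
import ipaddress


def gateway_is_on_link(interface_addrs, gateway):
    """Return True if the gateway lies in the subnet of any configured address."""
    gw = ipaddress.ip_address(gateway)
    return any(gw in ipaddress.ip_interface(addr).network for addr in interface_addrs)


# Before the address is configured, no connected subnet contains the gateway.
before = gateway_is_on_link([], "192.168.10.1")
# After adding 192.168.10.2/24, the gateway is on-link and the route can be added.
after = gateway_is_on_link(["192.168.10.2/24"], "192.168.10.1")
```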
I can ping 192.168.10.1 but I cannot ping 10.200.1.2 if test-gcp-gcp-net is the SENSE service instance for the stitched GCP resources.
@xi-yang @zlion FYI: We used dev eth1 instead before attempting to add the route.
ip addr add 192.168.10.2/24 dev eth3
@xi-yang @abessiari
It seems to me the route is not set up correctly along the path to GCP. I sent an inquiry to the Internet2 engineers to check whether they can provide more insight. Please also advise on ways to debug this case.
[rocky@ad442f3f-a3c1-4c87-86e7-d5ae862e8f97-fabric-node0 ~]$ traceroute 10.200.1.2
traceroute to 10.200.1.2 (10.200.1.2), 30 hops max, 60 byte packets
 1  192.168.10.1 (192.168.10.1)  0.816 ms !N
The workflow works. Some small issues will be tracked in #49
The SENSE provider in the develop branch fully works for the L2 DTN and AWS cases. With Liang finishing the GCPDriver refresh and support for the Interconnect service, we will add the GCP case to the provider. ETC: end of April