fabric-testbed / fabfed

FABRIC Tool-based Federation Kit for a Testbed of Testbeds
MIT License

Cloudlab example is recreating cloudlab resources on every -apply #129

Closed. disprosium8 closed this issue 4 months ago

disprosium8 commented 4 months ago

Branch: develop
Example: examples/cloudlab

I used the existing Cloudlab example config in the branch, and the initial workflow completed without errors. When I run subsequent -apply steps after adding the Janus service, FabFed deletes only the Cloudlab resources and then recreates them. Everything returns to a working state, but the Cloudlab resources are recreated unnecessarily. The session state and the -apply log follow.


- !ProviderState
  label: cloudlab_provider@cloudlab
  attributes:
    name: ezra-cl-test
  network_states:
  - !NetworkState
    label: cnet@network
    attributes:
      name: ezra-cl-test-cne
      site: utah
      profile: fabfed-stitch-v2
      interface:
      - id: ''
        provider: cloudlab
        vlan: '3100'
      layer3: !Config
        type: layer3
        name: my_layer-0
        attrs:
          subnet: 192.168.1.0/24
          gateway: 192.168.1.1
          ip_start: 192.168.1.2
          ip_end: 192.168.1.128
      cluster: urn:publicid:IDN+utah.cloudlab.us+authority+cm
      stich_node_ip: 192.168.1.1
      stich_site: utah
  node_states:
  - !NodeState
    label: cnode@node
    attributes:
      name: ezra-cl-test-cnode-0
      user: ezra
      host: 128.110.217.94
      keyfile: /home/ezra/.ssh/id_rsa
      jump_user: null
      jump_host: null
      jump_keyfile: null
      image: ''
      site: utah
      flavor: ''
      mgmt_ip: 128.110.217.94
      dataplane_ipv4: 192.168.1.1
      dataplane_ipv6: null
  service_states: []
  pending: []
  pending_internal: []
  failed: {}
  creation_details:
    cnet@network:
      resources:
      - ezra-cl-test-cne
      config:
        layer3: &id001 !Config
          type: layer3
          name: my_layer
          attrs:
            subnet: 192.168.1.0/24
            gateway: 192.168.1.1
            ip_start: 192.168.1.2
            ip_end: 192.168.1.254
        stitch_info: &id002
          stitch_port:
            name: Cloudlab-Utah
            profile: Utah-Cloudlab-Powder
            provider: fabric
            device_name: Utah-Cloudlab-Powder
            site: UTAH
            peer:
              profile: fabfed-stitch-v2
              provider: cloudlab
              option:
                cluster: urn:publicid:IDN+utah.cloudlab.us+authority+cm
          producer: cloudlab
          consumer: fabric
        site: null
        count: 1
        name_prefix: cnet
      total_count: 1
      failed_count: 0
      created_count: 1
      name_prefix: cnet
    cnode@node:
      resources:
      - ezra-cl-test-cnode-0
      config:
        network:
          dependency_info:
            resource: cnet@network
            attribute: ''
        count: 1
        site: null
        name_prefix: cnode
      total_count: 1
      failed_count: 0
      created_count: 1
      name_prefix: cnode
- !ProviderState
  label: fabric_provider@fabric
  attributes:
    name: ezra-cl-test
  network_states:
  - !NetworkState
    label: fabric_network@network
    attributes:
      name: ezra-cl-test-fabric_network-0
      site: UTAH
      slice_name: ezra-cl-test
      type: L2Bridge
      layer3: !Config
        type: layer3
        name: my_layer-1
        attrs:
          subnet: 192.168.1.0/24
          gateway: 192.168.1.1
          ip_start: 192.168.1.129
          ip_end: 192.168.1.254
      peering: null
      peer_layer3: []
      interface:
      - id: Utah-Cloudlab-Powder-Utah-Cloudlab-Powder-int
        vlan: '3100'
      - id: ezra-cl-test-fabric_node-0-ezra-cl-test-fabric_node-0-stitch_net_iface-p1
        vlan: '2002'
      id: b32b9931-4936-4ebe-ad65-44ec293f2e96
      state: Active
  node_states:
  - !NodeState
    label: fabric_node@node
    attributes:
      name: ezra-cl-test-fabric_node-0
      user: rocky
      host: 2001:1948:417:7:f816:3eff:fe27:dc67
      keyfile: /home/ezra/.ssh/ezra-sliver
      jump_user: kissel_0016967042
      jump_host: bastion-1.fabric-testbed.net
      jump_keyfile: /home/ezra/.ssh/ezra-bastion
      image: default_rocky_8
      site: UTAH
      flavor: '{''cores'': 2, ''ram'': 8, ''disk'': 10}'
      mgmt_ip: 2001:1948:417:7:f816:3eff:fe27:dc67
      nic_model: null
      slice_name: ezra-cl-test
      network_label: fabric_network@network
      username: rocky
      state: active
      dataplane_ipv4: 192.168.1.130
      dataplane_ipv6: fe80::22c5:3b58:f501:278a
      id: d8e45ec9-2c8e-4a54-ad24-121992a11b42
      components:
      - name: ezra-cl-test-fabric_node-0-stitch_net_iface
        model: NIC_Basic
      addr_list:
        lo:
        - 127.0.0.1
        - ::1
        eth0:
        - 10.30.6.41
        - 2001:1948:417:7:f816:3eff:fe27:dc67
        - fe80::f816:3eff:fe27:dc67
        eth1:
        - 192.168.1.130
        - fe80::22c5:3b58:f501:278a
  service_states: []
  pending: []
  pending_internal: []
  failed: {}
  creation_details:
    fabric_node@node:
      resources:
      - ezra-cl-test-fabric_node-0
      config:
        site: UTAH
        image: default_rocky_8
        nic_model: NIC_Basic
        count: 1
        name_prefix: fabric_node
      total_count: 1
      failed_count: 0
      created_count: 1
      name_prefix: fabric_node
    fabric_network@network:
      resources:
      - ezra-cl-test-fabric_network-0
      config:
        layer3: *id001
        interface:
          dependency_info:
            resource: fabric_node@node
            attribute: ''
        stitch_interface:
          dependency_info:
            resource: cnet@network
            attribute: ''
        stitch_info: *id002
        site: UTAH
        count: 1
        name_prefix: fabric_network
      total_count: 1
      failed_count: 0
      created_count: 1
      name_prefix: fabric_network
- !ProviderState
  label: janus_provider@janus
  attributes:
    name: ezra-cl-test
  network_states: []
  node_states: []
  service_states:
  - !ServiceState
    label: dtn_service@service
    attributes:
      name: ezra-cl-test-dtn_service
      image: dtnaas/tools
      created: true
      controller_url: https://192.168.1.130:5000
      controller_host: 2001:1948:417:7:f816:3eff:fe27:dc67
      controller_web: http://localhost:8000
      controller_ssh_tunnel_cmd: ssh rocky@2001:1948:417:7:f816:3eff:fe27:dc67 -J kissel_0016967042@bastion-1.fabric-testbed.net -L 8000:localhost:8000
  pending: []
  pending_internal: []
  failed: {}
  creation_details:
    dtn_service@service:
      resources:
      - ezra-cl-test-dtn_service
      config:
        node:
        - dependency_info:
            resource: cnode@node
            attribute: ''
        - dependency_info:
            resource: fabric_node@node
            attribute: ''
        controller:
          dependency_info:
            resource: fabric_node@node
            attribute: ''
        image: dtnaas/tools
        profile: fabfed
        count: 1
        name_prefix: dtn_service
      total_count: 1
      failed_count: 0
      created_count: 1
      name_prefix: dtn_service
(fabfed) ezra@ezra-dev:~/repos/fabfed/examples/cloudlab$ 
(fabfed) ezra@ezra-dev:~/repos/fabfed/examples/cloudlab$ fabfed workflow -s ezra-cl-test -apply
2024-05-03 15:38:48,220 [controller.py:67] [INFO] loaded local stitching policy.
2024-05-03 15:38:48,221 [policy_helper.py:304] [INFO] Found 3 stitch ports
2024-05-03 15:38:48,221 [policy_helper.py:361] [INFO] Using stitch port based on site=UTAH and providers=['fabric', 'cloudlab']:DetailedStitchInfo(stitch_port={'name': 'Cloudlab-Utah', 'profile': 'Utah-Cloudlab-Powder', 'preference': 200, 'member-of': ['CLOUDLAB'], 'provider': 'fabric', 'device_name': 'Utah-Cloudlab-Powder', 'site': 'UTAH', 'peer': {'name': 'Cloudlab-Utah', 'profile': 'fabfed-stitch-v2', 'member-of': ['CLOUDLAB'], 'provider': 'cloudlab', 'preference': 0, 'option': {'cluster': 'urn:publicid:IDN+utah.cloudlab.us+authority+cm'}}}, producer='cloudlab', consumer='fabric', producer_group={'name': 'CLOUDLAB', 'producer-for': ['fabric'], 'provider': 'cloudlab', 'consumer-for': []}, consumer_group={'name': 'CLOUDLAB', 'consumer-for': ['cloudlab'], 'provider': 'fabric', 'producer-for': []})
2024-05-03 15:38:48,221 [controller.py:154] [INFO] cnet@network: stitch_info=StitchInfo(stitch_port={'name': 'Cloudlab-Utah', 'profile': 'Utah-Cloudlab-Powder', 'provider': 'fabric', 'device_name': 'Utah-Cloudlab-Powder', 'site': 'UTAH', 'peer': {'profile': 'fabfed-stitch-v2', 'provider': 'cloudlab', 'option': {'cluster': 'urn:publicid:IDN+utah.cloudlab.us+authority+cm'}}}, producer='cloudlab', consumer='fabric')
2024-05-03 15:38:48,221 [controller.py:154] [INFO] fabric_network@network: stitch_info=StitchInfo(stitch_port={'name': 'Cloudlab-Utah', 'profile': 'Utah-Cloudlab-Powder', 'provider': 'fabric', 'device_name': 'Utah-Cloudlab-Powder', 'site': 'UTAH', 'peer': {'profile': 'fabfed-stitch-v2', 'provider': 'cloudlab', 'option': {'cluster': 'urn:publicid:IDN+utah.cloudlab.us+authority+cm'}}}, producer='cloudlab', consumer='fabric')
2024-05-03 15:38:48,221 [controller.py:159] [INFO] Starting PLAN_PHASE for 5 resource(s)
2024-05-03 15:38:48,221 [controller.py:228] [INFO] Starting ADD_PHASE: Calling ADD ... for 5 resource(s)
2024-05-03 15:38:48,222 [fabric_provider.py:89] [INFO] Initializing slice ezra-cl-test
2024-05-03 15:38:50,605 [slice.py:449] [INFO] slice.get_slice()
2024-05-03 15:38:50,606 [slice.py:601] [INFO] update_topology: ezra-cl-test, count: 1
2024-05-03 15:38:50,747 [fabric_slice_helper.py:129] [INFO] Found slice ezra-cl-test:state=StableOK
2024-05-03 15:38:50,751 [fabric_provider.py:116] [INFO] Done initializing slice ezra-cl-test
2024-05-03 15:38:50,751 [fabric_provider.py:125] [INFO] Initialized slice ezra-cl-test
2024-05-03 15:38:50,753 [fabric_node.py:17] [INFO]  Node ezra-cl-test-fabric_node-0 construtor called ... 
2024-05-03 15:38:50,755 [provider.py:246] [INFO] Adding fabric_network@network to pending using fabric_provider@fabric
2024-05-03 15:38:50,755 [provider.py:251] [INFO] Handling internal dependencies cnode@node using provider cloudlab_provider@cloudlab
2024-05-03 15:38:50,756 [dependency_reslover.py:70] [INFO] Resolving: Dependency(key='network', resource=cnet@network, attribute='', is_external=False) for cnode@node: value=<fabfed.provider.cloudlab.cloudlab_network.CloudNetwork object at 0x7f7a0d41ef50> using cloudlab_provider@cloudlab
2024-05-03 15:38:50,756 [dependency_reslover.py:100] [INFO] Resolved dependency Dependency(key='network', resource=cnet@network, attribute='', is_external=False) for cnode@node using cloudlab_provider@cloudlab
2024-05-03 15:38:50,756 [dependency_reslover.py:21] [INFO] Checking if all dependencies are resolved for cnode@node using cloudlab_provider@cloudlab
2024-05-03 15:38:50,756 [dependency_reslover.py:41] [INFO] Checking if all dependencies are resolved for cnode@node using cloudlab_provider@cloudlab:ret=True
2024-05-03 15:38:50,756 [dependency_reslover.py:126] [INFO] Extracted Values: [(<fabfed.provider.cloudlab.cloudlab_network.CloudNetwork object at 0x7f7a0d41ef50>,)]:cnode@node:network using cloudlab_provider@cloudlab
2024-05-03 15:38:50,756 [provider.py:246] [INFO] Adding dtn_service@service to pending using janus_provider@janus
2024-05-03 15:38:50,756 [controller.py:264] [INFO] Starting APPLY_PHASE for 5 resource(s)
2024-05-03 15:38:50,756 [provider.py:319] [INFO] Creating cnet@network using cloudlab_provider@cloudlab: ['cnet@network', 'cnode@node']
2024-05-03 15:38:50,756 [cloudlab_provider.py:205] [INFO] Deleting cloudlab resources ....
2024-05-03 15:39:03,971 [cloudlab_network.py:179] [INFO] Still waiting for experiment to be terminated
2024-05-03 15:39:12,625 [cloudlab_network.py:179] [INFO] Still waiting for experiment to be terminated
2024-05-03 15:39:21,276 [cloudlab_network.py:179] [INFO] Still waiting for experiment to be terminated
2024-05-03 15:39:29,905 [cloudlab_network.py:179] [INFO] Still waiting for experiment to be terminated
2024-05-03 15:39:38,595 [cloudlab_network.py:179] [INFO] Still waiting for experiment to be terminated
2024-05-03 15:39:47,255 [cloudlab_network.py:179] [INFO] Still waiting for experiment to be terminated
2024-05-03 15:39:55,931 [cloudlab_provider.py:218] [INFO] Done deleting cloudlab resources ....
2024-05-03 15:39:59,852 [cloudlab_network.py:71] [INFO] Network ezra-cl-test-cne not found, creating... {'profile': 'fabfed,fabfed-stitch-v2', 'proj': 'fabfed', 'name': 'ezra-cl-test-cne', 'asjson': True, 'bindings': '{"cluster": "urn:publicid:IDN+utah.cloudlab.us+authority+cm", "node_count": "1", "ip_subnet": "192.168.1.0/24"}'}
abessiari commented 4 months ago

@disprosium8

I plan to look into this and fix it today.

abessiari commented 4 months ago

@disprosium8

So far I have not seen this behavior. BTW, -plan tells you what is supposed to happen. In this case I changed the config to add the janus service. ...

Keep in mind that cloudlab does not handle modify. So if you change the number of cloudlab nodes, -apply will destroy the cloudlab resources and reprovision them ....

The only providers that handle modify are chameleon and fabric.

(fabfed-dev-beta-1.5.0) cloudlab$fabfed workflow -s clab -plan -summary
summaries:
- label: cnet@network
  attributes:
    to_be_created: 0
    to_be_deleted: 0
- label: fabric_network@network
  attributes:
    to_be_created: 0
    to_be_deleted: 0
- label: fabric_node@node
  attributes:
    to_be_created: 0
    to_be_deleted: 0
- label: cnode@node
  attributes:
    to_be_created: 0
    to_be_deleted: 0
- label: dtn_service@service
  attributes:
    to_be_created: 1
    to_be_deleted: 0
2024-05-06 17:45:23,752 [fabfed.py:219] [WARNING] Applying this plan would create 1 resource(s) and destroy 0 resource(s)
abessiari commented 4 months ago

@disprosium8

It actually worked as "advertised". :-)

  1. I started out with just the fabric and clab networks. No Nodes. -plan + -apply + -apply
  2. Then added the clab node. -plan + -apply (This will reprovision the clab slice) + -apply
  3. Then added the fabric node. -plan + -apply + -apply (clab did not reprovision)
  4. Then added the janus service. -plan + -apply + -apply (clab did not reprovision)
abessiari commented 4 months ago

The previous run used the cloudlab config under jupyter/examples. Now using the config under the examples directory ....

summaries:
- label: cnet@network
  attributes:
    to_be_created: 1
    to_be_deleted: 0
- label: fabric_node@node
  attributes:
    to_be_created: 1
    to_be_deleted: 0
- label: fabric_network@network
  attributes:
    to_be_created: 1
    to_be_deleted: 0
- label: cnode@node
  attributes:
    to_be_created: 1
    to_be_deleted: 0
2024-05-06 18:13:50,052 [fabfed.py:219] [WARNING] Applying this plan would create 4 resource(s) and destroy 0 resource(s)
disprosium8 commented 4 months ago

It seems to be an issue with the length of the session name and truncation in the provider?

Try a longer session name. Compare:

2024-05-06 22:15:02,362 [controller.py:228] [INFO] Starting ADD_PHASE: Calling ADD ... for 4 resource(s)
2024-05-06 22:15:02,362 [fabric_provider.py:89] [INFO] Initializing slice ezra-cl-test
2024-05-06 22:15:16,116 [fabric_slice_helper.py:133] [INFO] Created fresh slice ezra-cl-test:state=None
2024-05-06 22:15:16,117 [fabric_provider.py:116] [INFO] Done initializing slice ezra-cl-test
2024-05-06 22:15:16,117 [fabric_provider.py:125] [INFO] Initialized slice ezra-cl-test
2024-05-06 22:15:16,117 [node.py:193] [INFO] Adding node: ezra-cl-test-fabric_node-0, slice: ezra-cl-test, site: UTAH
2024-05-06 22:15:16,119 [fabric_node.py:17] [INFO]  Node ezra-cl-test-fabric_node-0 construtor called ... 
2024-05-06 22:15:16,120 [provider.py:246] [INFO] Adding fabric_network@network to pending using fabric_provider@fabric
2024-05-06 22:15:16,120 [provider.py:251] [INFO] Handling internal dependencies cnode@node using provider cloudlab_provider@cloudlab
2024-05-06 22:15:16,120 [dependency_reslover.py:70] [INFO] Resolving: Dependency(key='network', resource=cnet@network, attribute='', is_external=False) for cnode@node: value=<fabfed.provider.cloudlab.cloudlab_network.CloudNetwork object at 0x7fd98fabba90> using cloudlab_provider@cloudlab
2024-05-06 22:15:16,120 [dependency_reslover.py:100] [INFO] Resolved dependency Dependency(key='network', resource=cnet@network, attribute='', is_external=False) for cnode@node using cloudlab_provider@cloudlab
2024-05-06 22:15:16,120 [dependency_reslover.py:21] [INFO] Checking if all dependencies are resolved for cnode@node using cloudlab_provider@cloudlab
2024-05-06 22:15:16,120 [dependency_reslover.py:41] [INFO] Checking if all dependencies are resolved for cnode@node using cloudlab_provider@cloudlab:ret=True
2024-05-06 22:15:16,120 [dependency_reslover.py:126] [INFO] Extracted Values: [(<fabfed.provider.cloudlab.cloudlab_network.CloudNetwork object at 0x7fd98fabba90>,)]:cnode@node:network using cloudlab_provider@cloudlab
2024-05-06 22:15:16,121 [controller.py:264] [INFO] Starting APPLY_PHASE for 4 resource(s)
2024-05-06 22:15:16,121 [provider.py:319] [INFO] Creating cnet@network using cloudlab_provider@cloudlab: ['cnet@network', 'cnode@node']
2024-05-06 22:15:16,121 [cloudlab_provider.py:205] [INFO] Deleting cloudlab resources ....
2024-05-06 22:15:20,073 [cloudlab_provider.py:218] [INFO] Done deleting cloudlab resources ....
2024-05-06 22:15:23,981 [cloudlab_network.py:71] [INFO] Network ezra-cl-test-cne not found, creating... {'profile': 'fabfed,fabfed-stitch-v2', 'proj': 'fabfed', 'name': 'ezra-cl-test-cne', 'asjson': True, 'bindings': '{"vlan": "3400", "cluster": "urn:publicid:IDN+utah.cloudlab.us+authority+cm", "node_count": "1", "ip_subnet": "192.168.1.0/24"}'}

with

2024-05-06 22:16:08,095 [controller.py:159] [INFO] Starting PLAN_PHASE for 4 resource(s)
2024-05-06 22:16:08,095 [controller.py:228] [INFO] Starting ADD_PHASE: Calling ADD ... for 4 resource(s)
2024-05-06 22:16:08,096 [fabric_provider.py:89] [INFO] Initializing slice clab2
2024-05-06 22:16:10,360 [fabric_slice_helper.py:133] [INFO] Created fresh slice clab2:state=None
2024-05-06 22:16:10,361 [fabric_provider.py:116] [INFO] Done initializing slice clab2
2024-05-06 22:16:10,361 [fabric_provider.py:125] [INFO] Initialized slice clab2
2024-05-06 22:16:10,361 [node.py:193] [INFO] Adding node: clab2-fabric_node-0, slice: clab2, site: UTAH
2024-05-06 22:16:10,364 [fabric_node.py:17] [INFO]  Node clab2-fabric_node-0 construtor called ... 
2024-05-06 22:16:10,364 [provider.py:246] [INFO] Adding fabric_network@network to pending using fabric_provider@fabric
2024-05-06 22:16:10,364 [provider.py:251] [INFO] Handling internal dependencies cnode@node using provider cloudlab_provider@cloudlab
2024-05-06 22:16:10,365 [dependency_reslover.py:70] [INFO] Resolving: Dependency(key='network', resource=cnet@network, attribute='', is_external=False) for cnode@node: value=<fabfed.provider.cloudlab.cloudlab_network.CloudNetwork object at 0x7f8d5db73c50> using cloudlab_provider@cloudlab
2024-05-06 22:16:10,365 [dependency_reslover.py:100] [INFO] Resolved dependency Dependency(key='network', resource=cnet@network, attribute='', is_external=False) for cnode@node using cloudlab_provider@cloudlab
2024-05-06 22:16:10,365 [dependency_reslover.py:21] [INFO] Checking if all dependencies are resolved for cnode@node using cloudlab_provider@cloudlab
2024-05-06 22:16:10,365 [dependency_reslover.py:41] [INFO] Checking if all dependencies are resolved for cnode@node using cloudlab_provider@cloudlab:ret=True
2024-05-06 22:16:10,365 [dependency_reslover.py:126] [INFO] Extracted Values: [(<fabfed.provider.cloudlab.cloudlab_network.CloudNetwork object at 0x7f8d5db73c50>,)]:cnode@node:network using cloudlab_provider@cloudlab
2024-05-06 22:16:10,365 [controller.py:264] [INFO] Starting APPLY_PHASE for 4 resource(s)
2024-05-06 22:16:10,365 [provider.py:319] [INFO] Creating cnet@network using cloudlab_provider@cloudlab: ['cnet@network', 'cnode@node']
2024-05-06 22:16:14,335 [cloudlab_network.py:78] [INFO] Network already exists, checking status
abessiari commented 4 months ago

@disprosium8

I see. This is a corner case caused by silent truncation. I think we should just throw an exception telling the user that clab cannot handle long session names .... What do you think?

disprosium8 commented 4 months ago

An exception works, or we could create a mapping between a generated cloudlab experiment name and the session name. The important thing is that the user doesn't get unexpected behavior.
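Both options could look roughly like the sketch below. It is only an illustration, not the fix that was eventually committed: the 16-character limit is inferred from the logged name `ezra-cl-test-cne`, and `SessionNameTooLong`, `check_experiment_name`, and `mapped_experiment_name` are hypothetical names that do not exist in FabFed.

```python
import hashlib

# Assumption: 16-character limit inferred from the logs, not from CloudLab docs.
ASSUMED_MAX_LEN = 16


class SessionNameTooLong(ValueError):
    """Hypothetical error raised instead of truncating silently."""


def check_experiment_name(session: str, prefix: str, limit: int = ASSUMED_MAX_LEN) -> str:
    """Option 1: fail fast with a clear message when the name would not fit."""
    name = f"{session}-{prefix}"
    if len(name) > limit:
        raise SessionNameTooLong(
            f"experiment name {name!r} has {len(name)} characters; "
            f"the limit is {limit}. Use a shorter session name."
        )
    return name


def mapped_experiment_name(session: str, prefix: str, limit: int = ASSUMED_MAX_LEN) -> str:
    """Option 2: derive a stable short name so repeated runs find the same experiment."""
    name = f"{session}-{prefix}"
    if len(name) <= limit:
        return name
    # A short digest of the full name keeps the mapping deterministic.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
    keep = limit - len(digest) - 1  # leave room for the "-" separator
    return f"{name[:keep]}-{digest}"
```

Option 1 is simpler and makes the limit visible to the user. Option 2 keeps long session names usable, at the cost of an experiment name that no longer reads exactly like the session name.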

abessiari commented 4 months ago

Fixed.