ansible-collections / community.vmware

Ansible Collection for VMware
GNU General Public License v3.0

vmware_guest does not connect networks #887

Open gred7 opened 3 years ago

gred7 commented 3 years ago
SUMMARY

ansible 2.9.22

Deploying a VM from a template with vmware_guest (full task under STEPS TO REPRODUCE) adds the configured networks but leaves them disconnected.

ISSUE TYPE
COMPONENT NAME

vmware_guest

ANSIBLE VERSION
ansible 2.9.22
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.12 (default, Mar  1 2021, 11:38:31) [GCC 5.4.0 20160609]
CONFIGURATION
empty
OS / ENVIRONMENT

ubuntu linux Linux jenkins 4.4.0-210-generic #242-Ubuntu SMP Fri Apr 16 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

STEPS TO REPRODUCE
 vmware_guest:
      hostname: "some"
      username: "someone"
      password: "somepass"
      template: "{{ vmtemplate }}"
      validate_certs: false
      folder: ""
      datacenter: qarea
      name: "{{ tempname }}"
      state: poweredon
      guest_id: ubuntu64Guest
      cluster: "DRS"
      disk:
        - size_gb: "{{ disksize }}"
          type: thin
          datastore: DSS
      hardware:
        memory_mb: "{{ memsize }}"
        num_cpus: "{{ ncpus }}"
        scsi: paravirtual
      networks:
        - name: Localnet
          start_connected: yes
          type: dhcp
          when: "{{ localnet | bool == true }}"
        - name: AS
          start_connected: yes
          when: "{{ asnet | bool == true }}"
      wait_for_ip_address: true
    delegate_to: localhost
    register: deploy_vm
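As an aside on the task above: `when:` is a task-level keyword and is not a valid key inside a `networks` entry, so the module will likely ignore or reject those conditionals. A sketch of building the list conditionally beforehand instead (variable names taken from the task above; `vm_networks` is a hypothetical fact name):

```yaml
- name: Build the networks list conditionally
  set_fact:
    vm_networks: >-
      {{ ([{'name': 'Localnet', 'start_connected': true, 'type': 'dhcp'}] if localnet | bool else [])
       + ([{'name': 'AS', 'start_connected': true}] if asnet | bool else []) }}

# then pass it to vmware_guest:
#   networks: "{{ vm_networks }}"
```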
EXPECTED RESULTS

when set to true both(or one of) networks are connected on VM poweron

ACTUAL RESULTS

Networks are added but left unconnected. In the VM I see 2 interfaces in NO-CARRIER/DOWN state. If I log into the vSphere web interface, I see both networks in a disconnected state. Manually, I am able to set them to connected; then the VM interfaces come up and get IP addresses.

ansibullbot commented 3 years ago

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot commented 3 years ago

cc @Akasurde @Tomorrow9 @goneri @lparkes @nerzhul @pdellaert @pgbidkar @warthog9 click here for bot help

gred7 commented 3 years ago

Also, I do not see a case with two (or more than one) networks covered in your tests.

alex1989hu commented 3 years ago

I can verify the behaviour that @gred7 reported: networks are added but left unconnected. Additionally, the Connect At Power On option is also not checked. In my case, I faced this issue after upgrading Ansible from 2.9.13 to 4.1.0 - and there was only one network card.

phreakocious commented 3 years ago

This happens when VMware Tools does not report that guest OS customization completed successfully. Unfortunately, it seems to happen even when customization is not requested. I'm also fairly confident that I started seeing this behavior after a recent upgrade.

Udayendu commented 3 years ago

I can verify the behaviour that @gred7 reported: networks are added but left unconnected. Additionally, the Connect At Power On option is also not checked. In my case, I faced this issue after upgrading Ansible from 2.9.13 to 4.1.0 - and there was only one network card.

I am using Ansible 4.1.0 and not facing this at all, deploying 100+ OVAs and templates in my vCenter. If the customization doesn't work, it won't attach the interface.

$ pip3 show ansible
Name: ansible
Version: 4.1.0
Summary: Radically simple IT automation
Home-page: https://ansible.com/
Author: Ansible, Inc.
Author-email: info@ansible.com
License: GPLv3+
Location: /usr/local/lib/python3.8/dist-packages
Requires: ansible-core
Required-by:

Hope this may help you.

GreyArea765 commented 3 years ago

Also seeing the same issue on vSphere 6.7 deploying Linux machines, let me know if any debug output is required.

Ansible version

$ ansible --version
ansible 2.10.11
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/mattb/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.8.5 (default, May 27 2021, 13:30:53) [GCC 9.3.0]

Workstation OS

$ lsb_release -a
LSB Version:    core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:    20.04
Codename:   focal

GreyArea765 commented 3 years ago

As an aside, I worked around this issue by adding an arbitrary wait timeout and then applying the connected states again; I thought it might be useful for anyone finding this issue like I did.

- name: Wait 30 seconds for VM deployment
  wait_for:
    timeout: 30

- name: Fix unconnected network 
  vmware_guest:
    hostname: '{{ vc_ipaddress }}'
    username: '{{ vault_vc_username }}'
    password: '{{ vault_vc_password }}'
    validate_certs: False
    datacenter: '{{ vc_datacenter }}'
    name: '{{ vm_name }}'
    networks:
      - name: '{{ vc_vm_net_name }}'
        connected: yes
        start_connected: yes

Udayendu commented 3 years ago

As an aside, I worked around this issue by adding an arbitrary wait timeout and then applying the connected states again; I thought it might be useful for anyone finding this issue like I did.

- name: Wait 30 seconds for VM deployment
  wait_for:
    timeout: 30

- name: Fix unconnected network 
  vmware_guest:
    hostname: '{{ vc_ipaddress }}'
    username: '{{ vault_vc_username }}'
    password: '{{ vault_vc_password }}'
    validate_certs: False
    datacenter: '{{ vc_datacenter }}'
    name: '{{ vm_name }}'
    networks:
      - name: '{{ vc_vm_net_name }}'
        connected: yes
        start_connected: yes

wait_for is usually not needed if you use "connected: yes" in the main play, as I mentioned earlier.

GreyArea765 commented 3 years ago

@Udayendu not for me; I have connected and start_connected both set to yes in the initial play, and still the interface doesn't get connected. Perhaps that's a difference between your 4.1.0 and my 2.10.11?

        connected: yes
        start_connected: yes

Udayendu commented 3 years ago

@Udayendu not for me; I have connected and start_connected both set to yes in the initial play, and still the interface doesn't get connected. Perhaps that's a difference between your 4.1.0 and my 2.10.11?

        connected: yes
        start_connected: yes

OK. That should not be the case, because my code has been working well since 2.9. Are you facing this issue only with Ubuntu, or with other Linux OSes as well?

GreyArea765 commented 3 years ago

Honestly, I haven't tried running it from another OS. I ran into the issue a couple of days ago, which led me to the OP's post, and I thought I'd add a +1 to it. I don't really have the bandwidth right now to test from other platforms, but if I can provide any debug output from what I have, I'd be happy to help.

sugitk commented 3 years ago

I also faced a similar issue when cloning RHEL 7.9/8.4 VMs. I found errors in /var/log/vmware-imc/toolsDeployPkg.log, for example:

[2021-06-25T22:38:57.739Z] [   error] execv failed to run (/usr/bin/cloud-init), errno=(2), error message:(No such file or directory)
[2021-06-25T22:38:57.743Z] [    info] Process exited normally after 0 seconds, returned 127
[2021-06-25T22:38:57.743Z] [    info] No more output from stdout
[2021-06-25T22:38:57.743Z] [    info] No more output from stderr
[2021-06-25T22:38:57.743Z] [    info] Customization command output:
''.
[2021-06-25T22:38:57.743Z] [   error] Customization command failed with exitcode: 127, stderr: ''.
[2021-06-25T22:38:57.743Z] [    info] cloud-init is not installed.
[2021-06-25T22:38:57.743Z] [    info] Executing traditional GOSC workflow.
[2021-06-25T22:38:57.743Z] [   debug] Command to exec : '/usr/bin/perl'.
[2021-06-25T22:38:57.743Z] [    info] sizeof ProcessInternal is 56
[2021-06-25T22:38:57.744Z] [    info] Returning, pending output from stdout
[2021-06-25T22:38:57.744Z] [    info] Returning, pending output from stderr
[2021-06-25T22:38:57.749Z] [   error] execv failed to run (/usr/bin/perl), errno=(2), error message:(No such file or directory)
[2021-06-25T22:38:57.751Z] [    info] Process exited normally after 0 seconds, returned 127
[2021-06-25T22:38:57.751Z] [    info] No more output from stdout
[2021-06-25T22:38:57.751Z] [    info] No more output from stderr
[2021-06-25T22:38:57.751Z] [    info] Customization command output:

It indicates that the script was not able to find /usr/bin/cloud-init or /usr/bin/perl. Are these packages required by the current vmware_guest module?

After installing the perl and cloud-init packages in the template VM and cloning it again with the vmware_guest module, the VM network connected as expected.
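Judging from the log above, the traditional GOSC workflow execs /usr/bin/cloud-init and /usr/bin/perl inside the guest. A minimal sketch of preinstalling them in the template VM before converting it to a template (module defaults assumed; package names may differ per distribution):

```yaml
- name: Install guest customization prerequisites in the template VM
  package:
    name:
      - cloud-init
      - perl
    state: present
  become: true
```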

ashkuren commented 3 years ago

I have a similar issue. In my case I wanted to clone a VM without any modifications, which should not involve any customization specs.

- name: Create a VM from a Template
  community.vmware.vmware_guest:
    hostname: "{{ vcenter.hostname }}"
    username: "{{ vcenter.username }}"
    password: "{{ vcenter.password }}"
    datacenter: "{{ vm.datacenter }}"
    validate_certs: false
    name: "{{ vm.hostname }}"
    template: "{{ vm.template }}"
    folder: "{{ vm.folder }}"
    cluster: "{{ vm.cluster }}"
    datastore: "{{ vm.datastore }}"
    state: poweredon
    hardware:
      num_cpus: "{{ vm.cpu_cores}}"
      memory_mb: "{{ vm.memory_mb }}"
    networks:
      - name: "{{ vm.n1_name }}"
      - name: "{{ vm.n2_name }}"
      - name: "{{ vm.n3_name }}"
      - name: "{{ vm.n4_name }}"
      - name: "{{ vm.n5_name }}"
  delegate_to: localhost

After launching the ansible playbook with -vvvvv, I noticed this part of the debug output:

...
"networks": [
    {
        "name": "pg1",
        "type": "dhcp"
    },
    {
        "name": "pg2",
        "type": "dhcp"
    },
    {
        "name": "pg3",
        "type": "dhcp"
    },
    {
        "name": "pg4",
        "type": "dhcp"
    },
    {
        "name": "pg5",
        "type": "dhcp"
    }
]
...

This play resulted in the VM being deployed, but all network interfaces were in a disconnected state. Doing some research on the VMware side, I came across this event message:

Reconfigured VMNAME
on ESX_NAME
in DC_NAME
. Modified: 
config.tools.pendingCustomization: <unset> -> "/vmfs/volumes/5e29bffb-42291496-5039-0025b513aa0d/VMNAME/imcf-oRxxAd"; 
config.hardware.device(4002).connectable.startConnected: true -> false; 
config.hardware.device(4001).connectable.startConnected: true -> false; 
config.hardware.device(4004).connectable.startConnected: true -> false; 
config.hardware.device(4003).connectable.startConnected: true -> false; 
config.hardware.device(4000).connectable.startConnected: true -> false; 
Added: config.extraConfig("tools.deployPkg.fileName"): (key = "tools.deployPkg.fileName", value = "imcf-oRxxAd");

So I assume that during customization, VMware disconnects the network interfaces and re-enables them after the custom spec is finalized. In my case, the template is not customizable, so after some time (5-10 minutes) the customization fails and the NICs are never enabled. VMware event log line:

An error occurred while customizing VMNAME. For details reference the log file <No Log> in the guest OS.

I dug a bit into the code to check why customizations are launched even when I do not provide any OS-level parameters. This brought me to these lines: https://github.com/ansible-collections/community.vmware/blob/ae8bcbbecb68999ab9a580a9c725434959e570ba/plugins/modules/vmware_guest.py#L2745-L2755

I have tested this piece of code on my inputs and, sure enough, it resulted in the execution of a custom spec. I think this behavior should be changed so that if type is not provided, there are no modifications. Another idea is to introduce a third default value for the type field, such as null/None.
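As a simplified illustration (not the actual module code) of why even a bare network entry triggers customization: the module defaults a NIC's type to dhcp when it is omitted, so every entry looks like it requests guest customization.

```python
def requires_customization(networks):
    """Simplified sketch of the linked vmware_guest logic (assumed behaviour):
    a NIC whose type is 'dhcp' -- the default when 'type' is omitted --
    or that carries an 'ip' key makes the module build a customization spec."""
    for net in networks:
        net = dict(net)                 # don't mutate the caller's data
        net.setdefault('type', 'dhcp')  # module default for 'type'
        if net['type'] == 'dhcp' or 'ip' in net:
            return True
    return False


# A bare "- name: pg1" entry, as in the play above, already triggers it:
print(requires_customization([{'name': 'pg1'}]))  # True
```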

If you intend to perform OS customization, then you probably need to troubleshoot the custom spec execution at the OS level.

For reference, another similar issue: https://github.com/ansible/ansible/issues/24193

walterrowe commented 3 years ago

We are also experiencing this on Ansible Tower 3.7.5 / Ansible 2.9.18. We create a new VM from a template and include connected: yes and start_connected: yes in the module's network params, and it fails to set them. We added a follow-on step that runs the vmware_guest_network module to force these to be set after the VM is successfully created.

##
## create a VM from template, powered off state, then add disks and tags
##
## THIS FAILS TO SET start_connected / connected to True.
##
- name: create the guest vm using template
  community.vmware.vmware_guest:
    validate_certs: no
    hostname: "{{ vcenter[location|lower].vc }}"
    datacenter: "{{ vcenter[location|lower].dc }}"
    cluster: "{{ vcenter[location|lower].cl }}"
    name: "{{ vm_guest_name | lower }}"
    state: poweredoff
    template: "{{ os_type }}"
    folder:  "{{ esx_folder }}"
    datastore: "{{ vcenter[location|lower].ds }}"
    hardware:
      hotadd_cpu: yes
      hotadd_memory: yes
      memory_mb: "{{ vm_spec[vm_size].ram }}"
      num_cpus:  "{{ vm_spec[vm_size].cpu }}"
    networks:
      - name: "VLAN_{{ vlan }}"
        type: dhcp
        start_connected: yes
        connected: yes
    wait_for_ip_address: no
  delegate_to: localhost
  register: newvm

##
## ensure the network connects on startup
##
## THIS SUCCEEDS TO SET start_connected / connected to True.
##
- name: set the vm network to connect at startup
  community.vmware.vmware_guest_network:
    validate_certs: no
    hostname: "{{ vcenter[location|lower].vc }}"
    datacenter: "{{ vcenter[location|lower].dc }}"
    cluster: "{{ vcenter[location|lower].cl }}"
    name: "{{ vm_guest_name | lower }}"
    mac_address: "{{ newvm.instance.hw_eth0.macaddress }}"
    network_name: "VLAN_{{ vlan }}"
    start_connected: yes
    connected: yes

docandrew commented 1 year ago

I'm also experiencing this issue in playbooks that previously worked fine. I'm attempting the workaround that @walterrowe suggested, but it's a bit more work, since my VMs have multiple NICs depending on their role.

Udayendu commented 1 year ago

I'm also experiencing this issue in playbooks that previously worked fine. I'm attempting the workaround that @walterrowe suggested, but it's a bit more work, since my VMs have multiple NICs depending on their role.

For the last few months I have been doing the same. I build my own templates for both Windows and Linux. As part of our solution we need to add multiple NICs to the VM, so the template has no NIC by default; I add NICs as required and configure them using the vmware_guest_network module. So far I have not seen any failure with this approach.

But the vmware_guest module needs a fix for sure, as it's not able to handle VM deployment with multiple NICs, and sometimes even with a single NIC.
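The NIC-by-NIC approach described above could be sketched roughly like this (the connection variables and the `nic_portgroups` list are assumed names, not from this thread):

```yaml
- name: Add and connect NICs one at a time with vmware_guest_network
  community.vmware.vmware_guest_network:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_username }}"
    password: "{{ vcenter_password }}"
    validate_certs: false
    name: "{{ vm_name }}"
    network_name: "{{ item }}"
    state: present
    connected: true
    start_connected: true
  loop: "{{ nic_portgroups }}"
  delegate_to: localhost
```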

mariolenz commented 1 year ago

I'm also experiencing this issue in playbooks that previously worked fine.

@docandrew What do you mean by "previously"? Could you give the version where it worked and the version where it stopped working for you? Preferably the version of the community.vmware collection, but even the Ansible (community package) version would help. This might make it easier to troubleshoot.

vmware_guest module need some fix for sure as its not able to handle the vm deployment with multiple nics and some time with single nic.

@Udayendu I've tested with Ansible 7.2.0 (community.vmware 3.3.0) today and the issue still exists. It looks like it works fine if you create a new VM, but not if you deploy from a template. I just don't understand why. vmware_guest is a bit... complex :-/

mariolenz commented 1 year ago

Now this is interesting. I deliberately crash the module with self.module.fail_json(msg="Template deployed") directly after deploying from / cloning the template here:

https://github.com/ansible-collections/community.vmware/blob/de8e030efcae4c19bd6b3e6670cdbc81b8656afc/plugins/modules/vmware_guest.py#L3055-L3061

I see the following events in the vCenter:

  1. Deploying VM
  2. Assigned new BIOS UUID
  3. Assign a new instance UUID
  4. Reconfigured virtual machine
  5. Template test-template deployed
  6. Associated storage policy [...] with entity: virtualMachine...
  7. Associated storage policy [...] with entity: virtualDiskId...
  8. Reconfigured virtual machine

The first Reconfigured virtual machine event adds the NICs with startConnected = true and connected = false:

Added: config.hardware.device(4001): (dynamicProperty = <unset>, key = 4001, deviceInfo = (label = "Network adapter 2", summary = "DVSwitch: 50 0a 18 77 60 a4 91 07-02 b9 9f 0b b2 40 58 bd"), backing = (port = (switchUuid = "50 0a 18 77 60 a4 91 07-02 b9 9f 0b b2 40 58 bd", portgroupKey = "dvportgroup-103366", portKey = "2196", connectionCookie = 1778155007)), connectable = (migrateConnect = "unset", startConnected = true, allowGuestControl = true, connected = false, status = "untried"), slotInfo = null, controllerKey = 100, unitNumber = 8, numaNode = <unset>, addressType = "assigned", macAddress = "00:50:56:8a:02:4c", wakeOnLanEnabled = true, resourceAllocation = (reservation = 0, share = (shares = 50, level = "normal"), limit = -1), externalId = <unset>, uptCompatibilityEnabled = true, uptv2Enabled = <unset>); config.hardware.device(4000): (dynamicProperty = <unset>, key = 4000, deviceInfo = (label = "Network adapter 1", summary = "DVSwitch: 50 0a 18 77 60 a4 91 07-02 b9 9f 0b b2 40 58 bd"), backing = (port = (switchUuid = "50 0a 18 77 60 a4 91 07-02 b9 9f 0b b2 40 58 bd", portgroupKey = "dvportgroup-30", portKey = "91", connectionCookie = 1778151995)), connectable = (migrateConnect = "unset", startConnected = true, allowGuestControl = true, connected = false, status = "untried"), slotInfo = null, controllerKey = 100, unitNumber = 7, numaNode = <unset>, addressType = "assigned", macAddress = "00:50:56:8a:ba:c5", wakeOnLanEnabled = true, resourceAllocation = (reservation = 0, share = (shares = 50, level = "normal"), limit = -1), externalId = <unset>, uptCompatibilityEnabled = true, uptv2Enabled = <unset>)

Then the second Reconfigured virtual machine event changes startConnected to false:

config.hardware.device(4001).connectable.startConnected: true -> false; config.hardware.device(4000).connectable.startConnected: true -> false;

docandrew commented 1 year ago

@mariolenz thank you for looking into this - in my case I am updating playbooks that had been written for an Ansible version prior to the splitting off of the community VMware collection. Having done some more investigation, I don't think it's the same issue that others are referencing in this thread. The playbooks used to run against a single ESXi host with all the necessary portgroups attached to a vSwitch on that host; now the hosts are a clustered ESXi setup, and the portgroups were only added to one of the hosts.

I'm not sure whether this is detectable from the vmware module or not. If it is, it might be helpful to get an outright failure to create the VM when the network isn't present on the ESXi host it is about to be created on. In any case, mea culpa with regard to my earlier comment.

rinosh1989 commented 1 year ago

I am facing a similar issue.

$ansible --version
ansible [core 2.15.1]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.17 (main, Jul  4 2023, 06:21:22) [GCC 12.2.0] (/usr/local/bin/python)
  jinja version = 3.1.2
  libyaml = True

I added the following in network section

       networks:
        - connected: true
          name: "{{ network_name }}"
          start_connected: true

Still, whenever the VM is created, the network interface is always disconnected. If I connect the interface from vCenter, the VM connects to the port group. Please advise.