ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Azure Dynamic Inventory plug-in doesn't work in AWX #4180

Closed: giovannifl closed this issue 5 years ago

giovannifl commented 5 years ago

ISSUE TYPE

SUMMARY

After adding the Azure dynamic inventory plugin as the source of an inventory in AWX, the sync process fails when I start it.

ENVIRONMENT

STEPS TO REPRODUCE

1) Add credentials (e.g. a service principal) with the proper permissions on an Azure subscription.
2) Add Azure Resource Manager as the source of an existing inventory.
3) Assign the Azure credentials to the dynamic inventory source.
4) Customize the source variables (e.g. to retrieve all resource groups).
5) Click the Save button.
6) Click the button that starts the sync process.

EXPECTED RESULTS

The inventory synchronizes with the Azure subscription and retrieves information on all VMs hosted in it.

ACTUAL RESULTS

Sync process fails with the following error:

[Screenshot of the sync error]

ADDITIONAL INFORMATION

It seems the sync process is failing because it can't find the "azure_rm.yml" file. This file should be in the awx-task container.

Sample of the source variables:

[Screenshot of the source variables]

ryanpetrello commented 5 years ago

@kdelee @AlanCoding this familiar to either of you?

sturgeonson commented 5 years ago

I have the same issue with a slightly different warning:

[Screenshot of the warning]

AlanCoding commented 5 years ago

The true error is the "no key" one. The source of this is:

https://github.com/ansible/ansible/blob/v2.8.1/lib/ansible/plugins/inventory/__init__.py#L442

This is due to the inventory file entry which looks like:

https://github.com/ansible/awx/blob/73f16b2bee7af9e6a1f6e32f27661315dd37c7fe/awx/main/tests/data/inventory/plugins/azure_rm/files/azure_rm.yml#L22-L24

In short, we told Ansible to group hosts based on their security_group variable, and it ran into a host that did not have a security_group variable.
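A minimal Python sketch of that failure mode, assuming keyed-group construction behaves roughly like a loop that looks up the group key in every host's variables and aborts on the first host that lacks it. The function `build_keyed_groups`, the exception class, and the host names are hypothetical; this is not the actual Ansible implementation.

```python
from typing import Dict, List


class MissingKeyError(Exception):
    """Raised when a host lacks the variable a keyed group is built from."""


def build_keyed_groups(hosts: Dict[str, dict], key: str) -> Dict[str, List[str]]:
    """Group host names by the value of `key` in their hostvars.

    Strict behavior: a single host without `key` aborts the whole run.
    """
    groups: Dict[str, List[str]] = {}
    for name, hostvars in hosts.items():
        if key not in hostvars:
            raise MissingKeyError(f"host {name!r} has no key {key!r}")
        groups.setdefault(str(hostvars[key]), []).append(name)
    return groups


hosts = {
    "vm-with-nsg": {"security_group": "web-nsg"},
    "vm-without-nsg": {},  # NIC has no network security group attached
}

try:
    build_keyed_groups(hosts, "security_group")
except MissingKeyError as exc:
    print(exc)  # host 'vm-without-nsg' has no key 'security_group'
```

One host without the variable is enough to fail the entire sync, which matches the behavior reported above.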

Is the host "gfvmdev1xans01" really a host? Or is it actually a group, which would make this a reappearance of issue #3448 (https://github.com/ansible/awx/issues/3448)?

Does that host have some other special property that causes it to not have a security group? For instance, is it shutting down / terminated?

The fix will be messy, because the azure_rm plugin does not yet provide a strict / non-strict option. If this host state can legitimately occur, then we will probably need to silence the error, which will require a new option in the plugin.
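To illustrate the strict / non-strict distinction, here is a hypothetical grouping loop with a `strict` flag: when it is false, hosts missing the key are skipped instead of failing the sync. The function and flag names are assumptions for illustration only; as noted above, the azure_rm plugin did not offer such an option at the time.

```python
from typing import Dict, List


def build_keyed_groups(
    hosts: Dict[str, dict], key: str, strict: bool = True
) -> Dict[str, List[str]]:
    """Group host names by hostvars[key]; skip hosts lacking the key unless strict."""
    groups: Dict[str, List[str]] = {}
    for name, hostvars in hosts.items():
        if key not in hostvars:
            if strict:
                raise KeyError(f"host {name!r} has no key {key!r}")
            continue  # non-strict: this host simply gets no keyed group
        groups.setdefault(str(hostvars[key]), []).append(name)
    return groups


hosts = {"vm-with-nsg": {"security_group": "web-nsg"}, "vm-without-nsg": {}}
print(build_keyed_groups(hosts, "security_group", strict=False))
# {'web-nsg': ['vm-with-nsg']}
```

The trade-off is visibility: non-strict mode keeps the sync running but silently hides hosts that are missing the variable from those groups.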


The second traceback is probably more straightforward to fix, because it's easy to think of situations where a public_ip is not expected for an instance, and we probably need to return null. Alternatively, this could be fixed by just using a non-strict option.
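A sketch of the "return null" idea for the second traceback, assuming hostvars are assembled from a NIC-like dictionary in which the public IP section may be absent. The dictionary layout and the `public_ip_of` helper are hypothetical and do not reflect the plugin's real data model.

```python
from typing import Optional


def public_ip_of(nic: dict) -> Optional[str]:
    """Return the NIC's public IP address, or None if it has none.

    Using .get() at each level avoids the KeyError that direct indexing
    (nic["public_ip"]["address"]) would raise for private-only VMs.
    """
    public_ip = nic.get("public_ip") or {}
    return public_ip.get("address")


print(public_ip_of({"public_ip": {"address": "203.0.113.10"}}))  # 203.0.113.10
print(public_ip_of({"private_ip": "10.0.0.4"}))  # None
```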

giovannifl commented 5 years ago

Hi, a couple of clarifications:

Could you please explain what "security_group" is? Is it short for "Network Security Group"?

Thanks in advance

AlanCoding commented 5 years ago

Yes, it is short for network security group. There is no clear-cut documentation of the hostvars returned by plugins (something I wanted to work on but have not gotten around to). I can say this because I read the source code for the azure plugin and saw that it gets its value from "networkSecurityGroup" in the properties of an object returned by the API / msrest client.

https://github.com/ansible/ansible/blob/v2.8.1/lib/ansible/plugins/inventory/azure_rm.py#L570-L571
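A rough sketch of that derivation, assuming the NIC properties carry an optional `networkSecurityGroup` object holding a full Azure resource ID, and that the `security_group` hostvar is the last segment of that ID (the NSG name). The helper name and field handling are assumptions; the linked azure_rm.py lines show what the plugin actually does.

```python
from typing import Optional, Tuple


def security_group_vars(nic_properties: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (security_group, security_group_id) for a NIC, or (None, None).

    Azure resource IDs end in .../networkSecurityGroups/<name>, so the
    name is taken as the final path segment of the ID.
    """
    nsg = nic_properties.get("networkSecurityGroup")
    if not nsg or "id" not in nsg:
        return None, None  # no NSG attached to this NIC
    nsg_id = nsg["id"]
    return nsg_id.rstrip("/").split("/")[-1], nsg_id


props = {
    "networkSecurityGroup": {
        "id": "/subscriptions/0000/resourceGroups/qe/providers/"
        "Microsoft.Network/networkSecurityGroups/web-nsg"
    }
}
print(security_group_vars(props)[0])  # web-nsg
print(security_group_vars({}))  # (None, None)
```

Returning None rather than omitting the variable keeps the hostvar present on every host, so a grouping step that reads it does not hit a missing key.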

Do you have any ideas for how we could stand up a similar instance type to reproduce this error?

giovannifl commented 5 years ago

I have just deployed an Ubuntu 18.04 server on Azure and followed the AWX deployment steps for Local Docker. Note: I've also deployed NGINX in front of it. Is there any specific log file or other file that may help with troubleshooting?

giovannifl commented 5 years ago

As mentioned in issue #4272, unless an NSG is attached to the VM's NIC, the import fails:

[Screenshot of the import failure]

I have no clues about the second warning:

[Screenshot of the second warning]

AlanCoding commented 5 years ago

I'm considering a solution of the general form:

diff --git a/awx/main/models/inventory.py b/awx/main/models/inventory.py
index 5c1f4ef8eb..bdae10e47a 100644
--- a/awx/main/models/inventory.py
+++ b/awx/main/models/inventory.py
@@ -1995,7 +1995,7 @@ class azure_rm(PluginFileInjector):
             'location': {'prefix': '', 'separator': '', 'key': 'location'},
             'tag': {'prefix': '', 'separator': '', 'key': 'tags.keys() | list if tags else []'},
             # Introduced with https://github.com/ansible/ansible/pull/53046
-            'security_group': {'prefix': '', 'separator': '', 'key': 'security_group'},
+            'security_group': {'prefix': '', 'separator': '', 'key': 'security_group | default("security_group_null")'},
             'resource_group': {'prefix': '', 'separator': '', 'key': 'resource_group'},
             # Note, os_family was not documented correctly in script, but defaulted to grouping by it
             'os_family': {'prefix': '', 'separator': '', 'key': 'os_disk.operating_system_type'}

This would add a new group with that name, which I would prefer to avoid; the final solution may be to add a non-strict option in Ansible core. We still need to reproduce the problem before we can move on that.

Wicaeed commented 5 years ago

+1 for this issue; we are currently affected by it after upgrading our Tower to 3.5.

We use Azure VMs that do not have either:

AlanCoding commented 5 years ago

This should be addressed by #4319; let me know if you continue to see errors with it.

kdelee commented 5 years ago

I've tested this against an Azure account with a host that has no public IP and no security group, which previously caused the same problems observed by the original poster and the other commenter.

Now this host is coming through just fine:

ansible_host: <redacted>
computer_name: towerqe
id: >-
  <redacted>
image:
  offer: UbuntuServer
  publisher: Canonical
  sku: 18.04-LTS
  version: latest
location: eastus
mac_address: <redacted>
name: <redacted>
network_interface: <redacted>
network_interface_id: >-
 <redacted>
os_disk:
  name: <redacted>
  operating_system_type: linux
os_profile:
  system: linux
plan: null
powerstate: running
private_ip: <redacted>
private_ipv4_addresses:
  - <redacted>
provisioning_state: Succeeded
public_dns_hostnames: []
public_ipv4_addresses: []
resource_group: qe
resource_type: Microsoft.Compute/virtualMachines
security_group: null
security_group_id: null
tags:
  key has spaces: value has spaces
  "key-with-dashes-and-emoji-\U0001F601": "value-with-dashes-and-emoji-\U0001F601"
  peanutbutter: jelly
type: Microsoft.Compute/virtualMachines
virtual_machine_size: Standard_B1ls
vmid: 4f88a4a8-1321-4b5f-8788-faa60cbee68b
vmss: {}

Notice that there is no public IP and the security group is null.

Closing for now; please let us know if you have any more problems. @Wicaeed, this fix should come out in the next patch release, AFAIK.

sturgeonson commented 5 years ago

Works for me as well, thanks!