ansible / terraform-provider-ansible

community terraform provider for ansible
https://registry.terraform.io/providers/ansible/ansible/latest
GNU General Public License v3.0

Provide Ansible tool errors up to the provider #39

Closed lae closed 8 months ago

lae commented 1 year ago

When ansible-playbook et al. fail, their output does not seem to be captured and provided to Terraform (or maybe it is, but the provider crashes), which leads to confusion and troubleshooting that should be unnecessary. The debug output includes "There may be an error within your playbook." as a way of informing the user, I'm sure, but this is very vague and not really helpful during development. I'm not sure whether this is a bug or simply not implemented, but this information should be provided to the user so they know what to fix.

#32 and #20 (#16) are symptoms of this issue and probably wouldn't have been opened if CLI output from the Ansible tooling was appropriately sent up the chain to Terraform, as far as I can tell.

Anyway, the following was one of my reproductions of the plugin crashing whenever ansible-playbook errored out, for whatever reason (inventory issues were masked by this as well).

playbook.tf

terraform {
  required_version = "~> 1.5.0"

  required_providers {
    ansible = {
      version = "~> 1.1.0"
      source  = "ansible/ansible"
    }
  }
}

resource "ansible_playbook" "default" {
  playbook = "playbooks/configure_default.yml"
  name     = "xyz-dev-default-362a135c"
  extra_vars = {
    ansible_host     = "1.2.3.4"
    private_key      = sensitive("123456")
    ansible_ssh_user = "root"
  }
  verbosity = 1
}

playbooks/configure_default.yml:

---
- hosts: all
  tasks:
    - name: Create private key file
      ansible.builtin.copy:
        content: "{{ private_key }}"
        path: /root/private.key

TF_LOG=debug terraform apply

ansible_playbook.default["362a135c"]: Creating...
2023-06-15T10:48:09.681+0900 [INFO]  Starting apply for ansible_playbook.default["362a135c"]
2023-06-15T10:48:09.682+0900 [DEBUG] ansible_playbook.default["362a135c"]: applying the planned Create change
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 [DEBUG] setting computed for "args" from ComputedKeys
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 [ANSIBLE ARGS]:
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 [-v -e hostname=xyz-dev-default-362a135c -e ansible_host=1.2.3.4 -e ansible_ssh_user=root -e private_key=123456 playbooks/configure_default.yml]
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 LOG [ansible-playbook]: playbook = playbooks/configure_default.yml
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 Inventory /tmp/.inventory-253854049.ini was created
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 Temp Inventory File: /tmp/.inventory-253854049.ini
2023-06-15T10:48:09.683+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 [TEMP DIR]: /tmp
2023-06-15T10:48:09.684+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 [INVENTORIES]:
2023-06-15T10:48:09.684+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:09 [/tmp/.inventory-253854049.ini /tmp/.inventory-986786188.ini]
2023-06-15T10:48:11.328+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/06/15 10:48:11 ERROR [ansible-playbook]: couldn't run ansible-playbook
2023-06-15T10:48:11.328+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: playbooks/configure_default.yml! There may be an error within your playbook.
2023-06-15T10:48:11.328+0900 [DEBUG] provider.terraform-provider-ansible_v1.1.0: exit status 2
2023-06-15T10:48:11.330+0900 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/ansible/ansible/1.1.0/linux_amd64/terraform-provider-ansible_v1.1.0 pid=3240133 error="exit status 1"
2023-06-15T10:48:11.330+0900 [ERROR] plugin.(*GRPCProvider).ApplyResourceChange: error="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-06-15T10:48:11.330+0900 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-06-15T10:48:11.335+0900 [DEBUG] State storage *statemgr.Filesystem declined to persist a state snapshot
2023-06-15T10:48:11.335+0900 [ERROR] vertex "ansible_playbook.default[\"362a135c\"]" error: Plugin did not respond
╷
│ Error: Plugin did not respond
│
│   with ansible_playbook.default["362a135c"],
│   on main.tf line 57, in resource "ansible_playbook" "default":
│   57: resource "ansible_playbook" "default" {
│
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may contain more
│ details.
╵
2023-06-15T10:48:11.351+0900 [DEBUG] provider: plugin exited
make: *** [Makefile:75: apply] Error 1

ansible-playbook -v -e hostname=xyz-dev-default-362a135c -e ansible_host=1.2.3.4 -e ansible_ssh_user=root -e private_key=123456 playbooks/configure_default.yml -i /tmp/.inventory-253854049.ini (inventory file left over from failed run)

Using /etc/ansible/ansible.cfg as config file

PLAY [all] ****************************************************************************

TASK [Gathering Facts] ****************************************************************
ok: [xyz-dev-default-362a135c]

TASK [Create private key file] **********************************************
fatal: [xyz-dev-default-362a135c]: FAILED! => {"changed": false, "msg": "dest is required"}

PLAY RECAP ****************************************************************************
xyz-dev-default-362a135c : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

anazobec commented 1 year ago

Hi, it is possible to view a more detailed error log from Ansible using the ansible_playbook_stderr and ansible_playbook_stdout variables. You can view them either by adding an output block with those variables or by viewing the state file terraform.tfstate, which is created upon terraform apply. It is possible that the state file won't contain any details about this resource, since it failed to be created due to an error. However, by using ignore_playbook_failure = true and (if needed) adding ansible_check_mode = true to extra_vars, the resource will be created regardless of the error from running the playbook. So, your Terraform script (the one you provided here) would look something like this.

terraform {
  required_version = "~> 1.5.0"

  required_providers {
    ansible = {
      version = "~> 1.1.0"
      source  = "ansible/ansible"
    }
  }
}

resource "ansible_playbook" "default" {
  playbook = "playbooks/configure_default.yml"
  name     = "xyz-dev-default-362a135c"
  extra_vars = {
    ansible_host     = "1.2.3.4"
    private_key      = sensitive("123456")
    ansible_ssh_user = "root"

    ansible_check_mode = true  # <- here
  }
  verbosity = 1
  ignore_playbook_failure = true  # <- here
}

## You may skip the below blocks and view the terraform.tfstate instead ##
# output stderr
output "playbook_stderr" {
  value = ansible_playbook.default.ansible_playbook_stderr
}

# output stdout
output "playbook_stdout" {
  value = ansible_playbook.default.ansible_playbook_stdout
}

Taking a more detailed look at the code you've provided, I think your error could be resolved by making name and ansible_host the same value. Variable name is a hostname provided for the creation of the temporary inventory file (the kind you've found left over from failed run), so maybe either changing the value of name to the value of ansible_host or the other way around (depending on your needs) may resolve this issue. To give a more detailed explanation... ansible_host (from your example) is set to 1.2.3.4, but the name is set to xyz-dev-default-362a135c. So the temporary inventory file would look something like this:

[default]
xyz-dev-default-362a135c

Upon executing the terraform apply, the playbook (using this temp inventory file) searches for the provided ansible_host which is set to 1.2.3.4, but since it doesn't exist in that inventory, the resource fails to create (the playbook fails).

Hope this helps :)

lae commented 1 year ago

That's useful. I may try out that method later; however, this isn't really a request for how to work around this particular failure. From a user experience perspective, seeing the following error:

The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may contain more details.

with hardly any other information to immediately go off of is very confusing, and leads one to think that the provider itself is broken. IMO the default behaviour for ansible-playbook failures should not error out like this.

Alternatively, if it's expected that one should capture stderr/stdout within their Terraform configuration like you describe, then that should be illustrated and explained in the examples and documentation.

rwblokzijl commented 1 year ago

This is sadly not a solution when ansible fails. In that case terraform stops before changing the output:

Error ![image](https://github.com/ansible/terraform-provider-ansible/assets/446634/c1d71247-a671-40fb-b742-b277a7f6c872)

When setting TF_LOG=DEBUG we only get the following logging relating to the ansible run:

2023-07-10T12:11:17.773Z [DEBUG] provider.terraform-provider-ansible_v1.1.0: 2023/07/10 12:11:17 ERROR [ansible-playbook]: couldn't run ansible-playbook
2023-07-10T12:11:17.773Z [DEBUG] provider.terraform-provider-ansible_v1.1.0: playbook.yml! There may be an error within your playbook.
2023-07-10T12:11:17.773Z [DEBUG] provider.terraform-provider-ansible_v1.1.0: exit status 4
2023-07-10T12:11:17.774Z [ERROR] plugin.(*GRPCProvider).ApplyResourceChange: error="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-07-10T12:11:17.774Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/ansible/ansible/1.1.0/linux_amd64/terraform-provider-ansible_v1.1.0 pid=2148 error="exit status 1"
2023-07-10T12:11:17.774Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-07-10T12:11:17.774Z [DEBUG] State storage *remote.State declined to persist a state snapshot
2023-07-10T12:11:17.774Z [ERROR] vertex "ansible_playbook.playbook" error: Plugin did not respond

I'd suggest printing all ansible output (stdout and stderr):

  1. On fail
  2. Always if some provider flag is set
  3. Always if TF_LOG=INFO is set.

kdpuvvadi commented 1 year ago

facing the same issue here

TF_LOG=INFO

```
[WARN] Provider "registry.terraform.io/ansible/ansible" produced an invalid plan for ansible_playbook.playbook, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .ansible_playbook_binary: planned value cty.StringVal("ansible-playbook") for a non-computed attribute
      - .check_mode: planned value cty.False for a non-computed attribute
      - .diff_mode: planned value cty.False for a non-computed attribute
      - .force_handlers: planned value cty.False for a non-computed attribute
      - .ignore_playbook_failure: planned value cty.False for a non-computed attribute
ansible_playbook.playbook: Creating...
2023-07-21T12:44:55.711+0530 [INFO] Starting apply for ansible_playbook.playbook
2023-07-21T12:44:56.230+0530 [ERROR] plugin.(*GRPCProvider).ApplyResourceChange: error="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-07-21T12:44:56.237+0530 [ERROR] vertex "ansible_playbook.playbook" error: Plugin did not respond
╷
│ Error: Plugin did not respond
│
│   with ansible_playbook.playbook,
│   on ansible.tf line 10, in resource "ansible_playbook" "playbook":
│   10: resource "ansible_playbook" "playbook" {
│
│ The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call. The plugin logs may contain
│ more details.
```

Here's the config

resource "ansible_host" "omada" {
  name   = aws_instance.instance.public_ip
  variables = {
    ansible_user                 = "ubuntu"
    ansible_ssh_private_key_file = "~/.ssh/id_ed25519"
    ansible_python_interpreter  = "/usr/bin/python3"
  }
}

resource "ansible_playbook" "playbook" {
  playbook = "main.yml"
  name     = aws_instance.instance.public_ip
  var_files = [
    "var.yml"
  ]
  replayable = true
  verbosity  = 3
}

smtarslanturk commented 12 months ago

I was actually encountering the following error while creating an Ansible playbook resource:

"The plugin.(*GRPCProvider).ApplyResourceChange request was cancelled."

After incorporating the changes made by @anazobec into my Terraform file, the error got fixed.

Now, I am able to interact with the remote server via the Terraform Ansible provider.

Below is my example ansible_playbook resource, with the lines I added marked:

resource "ansible_playbook" "playbook-ansible" {
  playbook   = "/Users/sametarslanturk/.../ansible/playbook/1-create_folder.yml"
  name       = "jenkins"
  verbosity  = 3
  extra_vars = {
    ansible_host     = "<IP_ADDRESS>"
    private_key      = "/Users/sametarslanturk/.ssh/id_rsa"
    ansible_ssh_user = "<REMOTEINSTANCEHOSTNAME>"
    ansible_check_mode = true  # <- here
  }
  ignore_playbook_failure = true  # <- here
}

# The outputs must reference the resource by the name declared above ("playbook-ansible").
output "playbook_stderr" {
  value = ansible_playbook.playbook-ansible.ansible_playbook_stderr
}

output "playbook_stdout" {
  value = ansible_playbook.playbook-ansible.ansible_playbook_stdout
}

kgopi1 commented 8 months ago

Hi @smtarslanturk, for ansible_ssh_user = "<REMOTEINSTANCEHOSTNAME>", I hope that is the remote administrator username and not the remote host name?

gravesm commented 8 months ago

#67 should address this issue. Playbook output will be incorporated into the standard Terraform error output for failures.

multinegsix commented 6 months ago

@gravesm Thanks for the fix! Do you have a timeline to release a new version of the provider? The last release was from May last year, which is quite some time ago.

lae commented 6 months ago

Coincidentally, I was thinking about asking about a release timeline on Tuesday, but forgot after trying out a development build instead. It's working very nicely so far, so a new release on the registry would be great.