ansible-collections / community.general

Ansible Community General Collection
https://galaxy.ansible.com/ui/repo/published/community/general/
GNU General Public License v3.0

Support proxmoxer local mode #5449

Open BioSehnsucht opened 1 year ago

BioSehnsucht commented 1 year ago

Summary

Currently, proxmox_kvm uses Proxmoxer in API mode. Proxmoxer also supports an SSH mode and (since Proxmoxer 1.3.0) a local mode; the main difference is that local is intended for use when you are already SSH'd into the host, while the SSH mode performs that SSH step itself. Supporting SSH mode is probably not worthwhile, since it can't do anything API mode can't, but it would be handy to support Proxmoxer's local mode, since it means fewer places to handle credentials. Assuming Ansible can already SSH into the Proxmox host, with local mode we don't need to also include API credentials in the playbook (directly or indirectly), since the playbook would already be running on the host.

Notable changes for local mode would be that api_host, api_user, and api_password would no longer be required. There would need to be a way to pass backend='local' to proxmox_kvm so that it can be forwarded to ProxmoxAPI() (which is called from plugins/module_utils/proxmox.py rather than directly in plugins/modules/cloud/misc/proxmox_kvm.py itself).

The only downside I'm aware of is that Ansible would essentially be accessing Proxmox as root rather than as some other limited user, so there may be cases where the current approach is preferable, such as when you want a clear log within Proxmox showing that an action came from Ansible rather than from the root user. However, I would like to at least have the option to trade this reduction in logging clarity for simpler playbooks. For some of mine I need to run actions directly on the Proxmox host anyway (e.g. running qm importdisk ... to import a cloud-init image to a VM disk, since this can't be done via the API), and local mode saves having to either store credentials in playbooks or pass them in somehow (via variables, vaults, etc.).

I propose adding a backend option to proxmox_kvm which defaults to https (for normal API usage) but, when set to local, causes the api_* options to be optional and/or ignored rather than required (api_host and api_user are normally required, the others optional).
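As a rough illustration of the proposed gating, here is a pure-Python sketch of which options would be required per backend. The option name and the helper below are hypothetical; in the actual module this would more likely be expressed through AnsibleModule's required_if mechanism:

```python
# Hypothetical sketch: which api_* options would be required per backend.
# In the real module this would be expressed via AnsibleModule(required_if=...).
REQUIRED_BY_BACKEND = {
    "https": ["api_host", "api_user"],  # current behavior
    "local": [],                        # proposed: no API credentials needed
}

def missing_params(backend, params):
    """Return the required options that are absent for the chosen backend."""
    required = REQUIRED_BY_BACKEND[backend]
    return [name for name in required if params.get(name) is None]

# https mode still demands credentials:
print(missing_params("https", {"name": "spynal", "node": "sabrewulf"}))
# → ['api_host', 'api_user']
# local mode would accept the same task without any api_* options:
print(missing_params("local", {"name": "spynal", "node": "sabrewulf"}))
# → []
```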

Issue Type

Feature Idea

Component Name

proxmox_kvm

Additional Information

The first example from the docs, but using local mode (this assumes the playbook is run on one of the nodes in the cluster; it doesn't necessarily need to be the node the VM is created on):

---
- name: Create VM using local mode
  hosts:
    - "{{ VM_HOST }}"

  tasks:
    - name: Create new VM with minimal options
      community.general.proxmox_kvm:
        backend: local
        name: spynal
        node: sabrewulf

vs. the normal API mode as documented (this could also be run on "{{ VM_HOST }}" or some other host, but you probably don't want to run it on all hosts):

---
- name: Create VM using API
  hosts: 
    - localhost

  tasks:
    - name: Create new VM with minimal options
      community.general.proxmox_kvm:
        api_user: root@pam
        api_password: secret
        api_host: helldorado
        name: spynal
        node: sabrewulf

ansibullbot commented 1 year ago

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot commented 1 year ago

cc @Ajpantuso @Thulium-Drake @helldorado @joshainglis @karmab @tleguern click here for bot help

felixfontein commented 1 year ago

I think @reitermarkus started working on this in #4027, but right now the PR looks abandoned.

reitermarkus commented 1 year ago

PR looks abandoned.

Rebased https://github.com/ansible-collections/community.general/pull/4027. Still waiting for someone to test though.

felixfontein commented 1 year ago

@BioSehnsucht since you're interested in this feature, would you mind testing that PR?

BioSehnsucht commented 1 year ago

@BioSehnsucht since you're interested in this feature, would you mind testing that PR?

What's the easiest/quickest way to test this? Do I build a new execution environment from that PR somehow instead? I've built a custom EE before to include some things, but that was just editing the requirements yaml/txt files.

From looking at PR, it might need more changes since 2 days ago there was a comment about a directory restructure ...

felixfontein commented 1 year ago

You should be able to install @reitermarkus's branch by passing git+https://github.com/reitermarkus/community.general.git,proxmox-openssh to galaxy (see https://docs.ansible.com/ansible/latest/galaxy/user_guide.html#installing-a-collection-from-a-git-repository-at-the-command-line).

From looking at PR, it might need more changes since 2 days ago there was a comment about a directory restructure ...

For just trying out the current version of the branch, this should be fine. The branch should behave more or less like the 5.8.0 release of community.general.

BioSehnsucht commented 1 year ago

@felixfontein @reitermarkus

Removing api_* to use local mode works, but...

If I re-run the otherwise exact same playbook using API mode, it works without specifying the above things.

I have manually tested proxmoxer using the API and local modes, and found that it handles None values differently (or perhaps Proxmox does, depending on how it's accessed). With API mode they get sane defaults; with local mode they don't. So this is really a problem in Proxmoxer and/or the Proxmox APIs, with behavior differing depending on how the calls are made.

In addition to tracking down the root cause further upstream in Proxmoxer and/or Proxmox, I can think of the following "solutions":

A) Document clearly that you must specify those values, and do nothing else, leaving the user to work around the issue (this would be the PR as-is, plus documenting the quirk).

B) Have proxmox_kvm check for kwargs that are None after the various module.params[...] calls, and set them to their current values when updating (by reading the values via Proxmoxer) or to sane defaults when creating (e.g. sockets defaults to 1; I'm not sure what the other defaults would be for anything that wasn't specified and that Proxmox might complain about, since you wouldn't be setting most of them in a create task anyway). This might interact with the compatibility setting, since setting that would already set sockets to 1, for example; otherwise sockets currently has no default (and normally that's not a problem).

C) Kick the issue upstream to Proxmoxer and/or Proxmox to be resolved (this could also be done in conjunction with A or B).

For the moment, I don't have any tasks where I couldn't work around the issue by specifying the values explicitly, so I can use the PR as-is; it just requires some added values in the tasks. I would therefore be fine with any of the A, B, A+C, or B+C scenarios above, but I would value having this functionality sooner rather than later, with or without extra hoops on my end, so I would prefer to avoid scenario C by itself.
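Option B could look roughly like the sketch below. The helper, the current_config argument (standing in for what a GET on the VM's config via Proxmoxer would return), and the default values shown (e.g. sockets=1) are all assumptions for illustration, not the module's actual code:

```python
# Illustrative sketch of option B: on update, backfill None kwargs from the
# VM's existing config; on create, fall back to assumed sane defaults.
CREATE_DEFAULTS = {"sockets": 1}  # assumed defaults, not an exhaustive list

def backfill_kwargs(kwargs, current_config=None, creating=False):
    filled = dict(kwargs)
    for key, value in kwargs.items():
        if value is not None:
            continue
        if not creating and current_config and key in current_config:
            filled[key] = current_config[key]   # keep the VM's current value
        elif creating and key in CREATE_DEFAULTS:
            filled[key] = CREATE_DEFAULTS[key]  # apply an assumed default
    return filled

# Updating: None means "keep what the VM already has"
print(backfill_kwargs({"sockets": None, "cores": 4},
                      current_config={"sockets": 2}))
# → {'sockets': 2, 'cores': 4}
# Creating: None falls back to the assumed default
print(backfill_kwargs({"sockets": None, "cores": 4}, creating=True))
# → {'sockets': 1, 'cores': 4}
```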

kristianheljas commented 1 year ago

I also started testing this feature and ran into the same problem as @BioSehnsucht: None values are not handled well by the local backend. It seems None values are implicitly converted to strings rather than removed.

I'm going to take approach C and investigate whether I can change the local backend's behavior on None values and open a PR there.

If that's not possible, I'd suggest approach D instead: always call the Proxmox API methods with **kwargs, dropping all None values beforehand.

I'll report back when I find the difference between the https and local handling of these values.

kristianheljas commented 1 year ago

Ha, it is actually the requests module removing the values, not proxmoxer.

I don't have high hopes, but I opened an issue in proxmoxer (https://github.com/proxmoxer/proxmoxer/issues/119) asking if they're willing to do the same. They seem pretty quick to respond, so I guess it's worth waiting for their answer.

If they reject, I'd be happy to help with approach D, removing the None values ourselves.

Other than that, I don't think approaches A and B are viable: both would result in duplicating the API's default values.
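Approach D is straightforward to express. Here is a sketch of a wrapper that strips None values before they reach Proxmoxer, mirroring what requests already does for the https backend; the call_api helper and the method name it mentions are illustrative, not actual module code:

```python
def drop_none(kwargs):
    """Remove None-valued options so the local backend never sees them."""
    return {k: v for k, v in kwargs.items() if v is not None}

def call_api(method, **kwargs):
    # `method` stands in for a bound Proxmoxer call such as
    # proxmox.nodes(node).qemu.create; the name is purely illustrative.
    return method(**drop_none(kwargs))

# Without filtering, the local backend stringifies None into "None";
# with it, unspecified options are simply omitted:
print(drop_none({"name": "spynal", "sockets": None, "cores": 2}))
# → {'name': 'spynal', 'cores': 2}
```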

jhollowe commented 1 year ago

Adding functionality to make the API and local backends match would be a good addition to proxmoxer. If there are differences between the backends that have no reason to be there, removing them is great.

Regarding the conversation about permissions and the local backend using root: this isn't strictly true. Any user that has the needed permissions can use the local backend (it uses the pvesh executable); Ansible's SSH connection would just need to connect as that user, or become that user once on the PVE node. Reusing the credentials from SSH-ing into the node does avoid the need for additional credentials, but it reduces logging/auditing to that single user.

Using PVE realm users is a way to use credentials that do not correspond to an actual Linux user on the PVE node, which limits those credentials to API access only. API tokens are an even better authentication method and are simpler to use (they are stateless and so have no timeout limits). They are also much more lightweight and can be provisioned and revoked separately from the user they are associated with.

If you have any other questions on authentication or proxmoxer specifically, ask away!

BioSehnsucht commented 1 year ago

@kristianheljas @jhollowe I see that the relevant PR for Proxmoxer was merged, so I guess we only need to either wait for a release or install the version from the GitHub develop branch specifically? No changes needed on the Ansible side?

kristianheljas commented 1 year ago

@BioSehnsucht Sorry, I forgot to ping here after the merge. I'm not sure about proxmoxer's release schedule, but yes, you can install the development version with pip install proxmoxer@git+https://github.com/proxmoxer/proxmoxer.git@develop to continue.

I see no significant changes on the Ansible side that would affect the testing, and I will test the solution myself in the coming days as well.

BioSehnsucht commented 1 year ago

@kristianheljas @jhollowe @reitermarkus

I tested this today, and conveniently proxmoxer has released 2.0.0, so no need to install the develop branch after all. :smile:

From my limited testing (bringing up a VM and setting cloud-init in a second proxmoxer action), it appears to work as expected, so PR #4027 is probably good to merge on the Ansible side.

BioSehnsucht commented 1 year ago

I might have spoken too soon: I wasn't starting the VM from Ansible yet, and doing so fails when using local mode. It looks like another issue on the Proxmoxer side, I'm guessing, since the command-line output from starting the VM appears to trip it up: it thinks the informative message about creating the cloud-init ISO image is an error, when it isn't.

TASK [Start VM] ********************************************************************************************************************************************************************************************************************************************************************
task path: /root/ansible-test/00-testing.yaml:84
<vm1.REDACTED> ESTABLISH LOCAL CONNECTION FOR USER: root
<vm1.REDACTED> EXEC /bin/sh -c 'echo ~root && sleep 0'
<vm1.REDACTED> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361 `" && echo ansible-tmp-1669611410.316544-2048841-230567367285361="` echo /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361 `" ) && sleep 0'
Using module file /root/.ansible/collections/ansible_collections/community/general/plugins/modules/proxmox_kvm.py
<vm1.REDACTED> PUT /root/.ansible/tmp/ansible-local-2026020jz11stwf/tmprzsmeite TO /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361/AnsiballZ_proxmox_kvm.py
<vm1.REDACTED> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361/ /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361/AnsiballZ_proxmox_kvm.py && sleep 0'
<vm1.REDACTED> EXEC /bin/sh -c '/usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361/AnsiballZ_proxmox_kvm.py && sleep 0'
<vm1.REDACTED> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1669611410.316544-2048841-230567367285361/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
  File "/tmp/ansible_community.general.proxmox_kvm_payload_g5s4qk51/ansible_community.general.proxmox_kvm_payload.zip/ansible_collections/community/general/plugins/modules/proxmox_kvm.py", line 1351, in main
  File "/tmp/ansible_community.general.proxmox_kvm_payload_g5s4qk51/ansible_community.general.proxmox_kvm_payload.zip/ansible_collections/community/general/plugins/modules/proxmox_kvm.py", line 1037, in start_vm
  File "/tmp/ansible_community.general.proxmox_kvm_payload_g5s4qk51/ansible_community.general.proxmox_kvm_payload.zip/ansible_collections/community/general/plugins/modules/proxmox_kvm.py", line 870, in wait_for_task
  File "/tmp/ansible_community.general.proxmox_kvm_payload_g5s4qk51/ansible_community.general.proxmox_kvm_payload.zip/ansible_collections/community/general/plugins/module_utils/proxmox.py", line 157, in api_task_ok
    status = self.proxmox_api.nodes(node).tasks(taskid).status.get()
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 169, in get
    return self(args)._request("GET", params=params)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 158, in _request
    raise ResourceException(
fatal: [vm1.REDACTED]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "acpi": null,
            "agent": null,
            "api_host": null,
            "api_password": null,
            "api_token_id": null,
            "api_token_secret": null,
            "api_user": "root@pam",
            "args": null,
            "autostart": null,
            "balloon": null,
            "bios": null,
            "boot": null,
            "bootdisk": null,
            "cicustom": null,
            "cipassword": null,
            "citype": null,
            "ciuser": null,
            "clone": null,
            "cores": null,
            "cpu": null,
            "cpulimit": null,
            "cpuunits": null,
            "delete": null,
            "description": null,
            "digest": null,
            "efidisk0": null,
            "force": null,
            "format": null,
            "freeze": null,
            "full": true,
            "hostpci": null,
            "hotplug": null,
            "hugepages": null,
            "ide": null,
            "ipconfig": null,
            "keyboard": null,
            "kvm": null,
            "localtime": null,
            "lock": null,
            "machine": null,
            "memory": null,
            "migrate_downtime": null,
            "migrate_speed": null,
            "name": null,
            "nameservers": null,
            "net": null,
            "newid": null,
            "node": null,
            "numa": null,
            "numa_enabled": null,
            "onboot": null,
            "ostype": null,
            "parallel": null,
            "pool": null,
            "protection": null,
            "proxmox_default_behavior": "no_defaults",
            "reboot": null,
            "revert": null,
            "sata": null,
            "scsi": null,
            "scsihw": null,
            "searchdomains": null,
            "serial": null,
            "shares": null,
            "skiplock": null,
            "smbios": null,
            "snapname": null,
            "sockets": null,
            "sshkeys": null,
            "startdate": null,
            "startup": null,
            "state": "started",
            "storage": null,
            "tablet": null,
            "tags": null,
            "target": null,
            "tdf": null,
            "template": null,
            "timeout": 30,
            "update": false,
            "validate_certs": false,
            "vcpus": null,
            "vga": null,
            "virtio": null,
            "vmid": 104,
            "watchdog": null
        }
    },
    "msg": "starting of VM 104 failed with exception: 500 Internal Server Error: No 'get' handler defined for '/nodes/vm1/tasks/{'errors': 'generating cloud-init ISO\\n\"UPID:vm1:001F438F:279CA8FC:63843F95:qmstart:104:root@pam:\"\\n'}/status'",
    "status": "stopped",
    "vmid": 104
}
BioSehnsucht commented 1 year ago

Looks like it is this issue, which was previously reported against the SSH backend and closed as a Proxmox problem: https://github.com/proxmoxer/proxmoxer/issues/117

I guess Proxmox is actually flagging this as an error in its JSON output, when it shouldn't be.
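To illustrate the failure mode: the "task ID" that wait_for_task receives in the error above is a blob mixing the informational message and the real UPID. A regex can recover the UPID from such output (the sample blob below is taken verbatim from the error message above; whether intercepting it at this layer is practical inside the module is another question):

```python
import re

# Sample from the failure above: pvesh's informational line plus the actual
# task UPID, which proxmoxer's local backend wrapped up as an "error".
blob = ('generating cloud-init ISO\n'
        '"UPID:vm1:001F438F:279CA8FC:63843F95:qmstart:104:root@pam:"\n')

def extract_upid(text):
    """Pull the first Proxmox task UPID out of mixed command output."""
    match = re.search(r'UPID:[^"\s]+', text)
    return match.group(0) if match else None

print(extract_upid(blob))
# → UPID:vm1:001F438F:279CA8FC:63843F95:qmstart:104:root@pam:
```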

BioSehnsucht commented 1 year ago

Since the VM actually starts, I guess I'll have to do some failed_when magic (I'd rather not accept ALL errors, just this one error or actual success) so it doesn't fail the rest of the playbook.

BioSehnsucht commented 1 year ago

As I dug into this, I realized that @reitermarkus is already aware of the issue, as I stumbled upon a forum thread discussing the problem: https://forum.proxmox.com/threads/api-bug-pvesh-create-nodes-node-qemu-output-format-json-response-format-broken.111469/

TL;DR this broke in Proxmox after a certain point, and they have yet to fix it.

BioSehnsucht commented 1 year ago

For reference, this seems to work fine to stop the play from failing due to the error. Of course, Ansible still thinks the VM failed to start, even though it started.

    - name: Start VM
      community.general.proxmox_kvm:
        vmid: "{{ qemu.vmid }}"
        state: started
      register: start_status
      failed_when:
        - start_status.msg is not search('generating cloud-init ISO')
ansibullbot commented 1 year ago

cc @UnderGreen click here for bot help

ansibullbot commented 8 months ago

cc @krauthosting click here for bot help

krauthosting commented 8 months ago

@BioSehnsucht @reitermarkus @l00ptr We're currently starting the process of moving the proxmox modules into their own collection, community.proxmox, and would like to resolve this issue before doing so. Looking at the Bugzilla ticket, it seems this is still blocked by an upstream issue? If so, it might be best to close this PR for now and reopen it against community.proxmox once that blocker has been cleared?

Since this PR effectively implements support for a second backend, it would also be good to discuss how we want to handle different backends in the future. Maybe instead of the implicit behavior proposed here, it would be better to have a backend argument that can be toggled between api, ssh, and local, with the corresponding arguments becoming required when switched?

krauthosting commented 7 months ago

@felixfontein Please close this and we can continue in community.proxmox after upstream fix.

felixfontein commented 7 months ago

@krauthosting why not simply move this issue over to the other repo? Also you can close it yourself by writing close_me (see bot help: https://github.com/ansible/ansibullbot/blob/devel/ISSUE_HELP.md#commands).