Closed by PC-Admin 1 year ago
Interestingly, I also got a new error from destroy.yml today:
failed: [localhost] (item=25) => {"ansible_loop_var": "item", "changed": false, "item": 25, "msg": "Reached timeout while waiting for removing VM. Last line in task before timeout: [{'t': \"Could not remove disk 'vm-storage:vm-192-disk-1', check manually: cfs-lock 'storage-vm-storage' error: no quorum!\", 'n': 1}]"}
This seems to be related to the recent networking failures.
Moved the playbooks over to FDI in hopes of making them more reliable. (The network's really being hammered at the moment.)
Ended up seeing this new bug. VM 200 definitely existed, but I'm guessing we might need a poll here as well:
TASK [Start the requested VMs] ***********************************************************************************************************************
failed: [localhost] (item={'changed': True, 'msg': 'VM prod-phos-k8s-w29 with vmid 200 deployed', 'mac': {'net0': 'CA:D6:D2:34:A4:FB'}, 'devices': {'scsi0': 'vm-storage:vm-200-disk-2'}, 'vmid': 200, 'invocation': {'module_args': {'api_user': 'root@pam', 'api_token_id': 'ansible-maas', 'api_token_secret': 'VALUE_SPECIFIED_IN_NO_LOG_PARAMETER', 'api_host': 'sirius.estuary.tech', 'timeout': 60, 'name': 'prod-phos-k8s-w29', 'memory': 65536, 'balloon': 16384, 'cores': 6, 'agent': 'True', 'description': 'Production Phosphophyllite Kubernetes Worker node', 'onboot': False, 'boot': 'nc', 'bootdisk': 'scsi0', 'cpu': 'host', 'node': 'sirius', 'scsihw': 'virtio-scsi-single', 'scsi': {'scsi0': 'vm-storage:100,format=raw,discard=on,ssd=1'}, 'net': {'net0': 'bridge=vmbr0,virtio,mtu=1,firewall=1'}, 'bios': 'ovmf', 'efidisk0': {'format': 'raw', 'efitype': '4m', 'pre_enrolled_keys': False}, 'validate_certs': False, 'full': True, 'state': 'present', 'update': False, 'proxmox_default_behavior': 'no_defaults', 'api_password': None, 'archive': None, 'acpi': None, 'args': None, 'autostart': None, 'cicustom': None, 'cipassword': None, 'citype': None, 'ciuser': None, 'clone': None, 'cpulimit': None, 'cpuunits': None, 'delete': None, 'digest': None, 'force': None, 'format': None, 'freeze': None, 'hostpci': None, 'hotplug': None, 'hugepages': None, 'ide': None, 'ipconfig': None, 'keyboard': None, 'kvm': None, 'localtime': None, 'lock': None, 'machine': None, 'migrate_downtime': None, 'migrate_speed': None, 'newid': None, 'numa': None, 'numa_enabled': None, 'ostype': None, 'parallel': None, 'pool': None, 'protection': None, 'reboot': None, 'revert': None, 'sata': None, 'serial': None, 'shares': None, 'skiplock': None, 'smbios': None, 'snapname': None, 'sockets': None, 'sshkeys': None, 'startdate': None, 'startup': None, 'storage': None, 'tablet': None, 'tags': None, 'target': None, 'tdf': None, 'template': None, 'vcpus': None, 'vga': None, 'virtio': None, 'vmid': None, 'watchdog': None}}, 
'failed': False, 'item': 29, 'ansible_loop_var': 'item'}) => {"ansible_loop_var": "item", "changed": false, "item": {"ansible_loop_var": "item", "changed": true, "devices": {"scsi0": "vm-storage:vm-200-disk-2"}, "failed": false, "invocation": {"module_args": {"acpi": null, "agent": "True", "api_host": "sirius.estuary.tech", "api_password": null, "api_token_id": "ansible-maas", "api_token_secret": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "api_user": "root@pam", "archive": null, "args": null, "autostart": null, "balloon": 16384, "bios": "ovmf", "boot": "nc", "bootdisk": "scsi0", "cicustom": null, "cipassword": null, "citype": null, "ciuser": null, "clone": null, "cores": 6, "cpu": "host", "cpulimit": null, "cpuunits": null, "delete": null, "description": "Production Phosphophyllite Kubernetes Worker node", "digest": null, "efidisk0": {"efitype": "4m", "format": "raw", "pre_enrolled_keys": false}, "force": null, "format": null, "freeze": null, "full": true, "hostpci": null, "hotplug": null, "hugepages": null, "ide": null, "ipconfig": null, "keyboard": null, "kvm": null, "localtime": null, "lock": null, "machine": null, "memory": 65536, "migrate_downtime": null, "migrate_speed": null, "name": "prod-phos-k8s-w29", "net": {"net0": "bridge=vmbr0,virtio,mtu=1,firewall=1"}, "newid": null, "node": "sirius", "numa": null, "numa_enabled": null, "onboot": false, "ostype": null, "parallel": null, "pool": null, "protection": null, "proxmox_default_behavior": "no_defaults", "reboot": null, "revert": null, "sata": null, "scsi": {"scsi0": "vm-storage:100,format=raw,discard=on,ssd=1"}, "scsihw": "virtio-scsi-single", "serial": null, "shares": null, "skiplock": null, "smbios": null, "snapname": null, "sockets": null, "sshkeys": null, "startdate": null, "startup": null, "state": "present", "storage": null, "tablet": null, "tags": null, "target": null, "tdf": null, "template": null, "timeout": 60, "update": false, "validate_certs": false, "vcpus": null, "vga": null, "virtio": null, 
"vmid": null, "watchdog": null}}, "item": 29, "mac": {"net0": "CA:D6:D2:34:A4:FB"}, "msg": "VM prod-phos-k8s-w29 with vmid 200 deployed", "vmid": 200}, "msg": "VM with name = prod-phos-k8s-w29 does not exist in cluster"}
I believe all of these bugs were a result of Apollo being hammered and Proxmox's corosync failing. Since Apollo has had its Proxmox hosts disabled, proxmaas now seems more reliable.
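If the "does not exist in cluster" failure in the start task ever comes back, the poll guessed at earlier could look roughly like the sketch below. This is only an illustration of the idea, not what proxmaas does: `wait_until` and `make_fake_lookup` are hypothetical names, and the fake lookup stands in for whatever call would actually ask the cluster whether the VM is visible yet.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() repeatedly until it returns True or `timeout` seconds pass.

    Returns True if check() succeeded, False if the deadline was reached.
    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def make_fake_lookup(visible_after: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a cluster lookup (assumption, not a real API).

    The returned function reports False until it has been called
    `visible_after` times, imitating a VM that takes a moment to show up.
    """
    calls = {"n": 0}

    def vm_exists() -> bool:
        calls["n"] += 1
        return calls["n"] >= visible_after

    return vm_exists
```

In a playbook the same effect would more likely come from Ansible's own `until`/`retries`/`delay` keywords on the start task, so the task is retried until the newly created VM is visible instead of failing on the first lookup.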
This is a strange one: