ansible-collections / kubernetes.core

The collection includes a variety of Ansible content to help automate the management of applications in Kubernetes and OpenShift clusters, as well as the provisioning and maintenance of clusters themselves.

k8s_info returns successful == true when the api-server is not reachable #508

Closed. wmlynch closed this issue 1 year ago.

wmlynch commented 2 years ago
SUMMARY

The k8s_info module returns successful == true after the resource cache has been established, even during periods when communication with the api-server is not possible. The recreate steps listed below simulate a situation where a playbook contains a series of k8s module tasks: after a handful of k8s tasks have run successfully, communication with the api-server becomes problematic due to temporary/intermittent availability issues. During this "problematic" api-server phase, the k8s_info-based tasks continue to return successful == true with an empty resources list.

If kubectl get ... fails due to an api-server with intermittent availability/communication problems, k8s_info should fail too.
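
Until this is addressed in the module, a playbook-side guard is the only protection. A minimal sketch follows; the registered variable name and the failed_when expression are purely illustrative and simply key off the msg field observed in the actual results below:

```yaml
- name: Check for existing cluster secret, failing if the api-server was unreachable
  kubernetes.core.k8s_info:
    api_version: v1
    kind: Secret
    name: my-secret
    namespace: default
    kubeconfig: /tmp/kind.kubeconfig
  register: _secret_check
  # k8s_info currently reports ok with resources == [] when the connection fails,
  # but the swallowed exception text still shows up in msg, so fail on that instead.
  failed_when: "'Exception' in (_secret_check.msg | default(''))"
```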

ISSUE TYPE
Bug Report
COMPONENT NAME
k8s_info

COLLECTION VERSION
```
ansible-galaxy collection list

# /Users/wmlynch/armada-dev/src/[redacted]/.venv/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    3.2.0  
ansible.netcommon             3.0.1  
ansible.posix                 1.4.0  
ansible.utils                 2.6.1  
ansible.windows               1.10.0 
arista.eos                    5.0.1  
awx.awx                       21.0.0 
azure.azcollection            1.12.0 
check_point.mgmt              2.3.0  
chocolatey.chocolatey         1.2.0  
cisco.aci                     2.2.0  
cisco.asa                     3.0.0  
cisco.dnac                    6.4.0  
cisco.intersight              1.0.19 
cisco.ios                     3.0.0  
cisco.iosxr                   3.0.0  
cisco.ise                     2.4.1  
cisco.meraki                  2.6.2  
cisco.mso                     2.0.0  
cisco.nso                     1.0.3  
cisco.nxos                    3.0.0  
cisco.ucs                     1.8.0  
cloud.common                  2.1.1  
cloudscale_ch.cloud           2.2.2  
community.aws                 3.2.1  
community.azure               1.1.0  
community.ciscosmb            1.0.5  
community.crypto              2.3.2  
community.digitalocean        1.19.0 
community.dns                 2.1.1  
community.docker              2.6.0  
community.fortios             1.0.0  
community.general             5.0.2  
community.google              1.0.0  
community.grafana             1.4.0  
community.hashi_vault         3.0.0  
community.hrobot              1.3.1  
community.libvirt             1.1.0  
community.mongodb             1.4.0  
community.mysql               3.2.1  
community.network             4.0.1  
community.okd                 2.2.0  
community.postgresql          2.1.5  
community.proxysql            1.4.0  
community.rabbitmq            1.2.1  
community.routeros            2.1.0  
community.sap                 1.0.0  
community.sap_libs            1.1.0  
community.skydive             1.0.0  
community.sops                1.2.2  
community.vmware              2.5.0  
community.windows             1.10.0 
community.zabbix              1.7.0  
containers.podman             1.9.3  
cyberark.conjur               1.1.0  
cyberark.pas                  1.0.14 
dellemc.enterprise_sonic      1.1.1  
dellemc.openmanage            5.4.0  
dellemc.os10                  1.1.1  
dellemc.os6                   1.0.7  
dellemc.os9                   1.0.4  
f5networks.f5_modules         1.17.0 
fortinet.fortimanager         2.1.5  
fortinet.fortios              2.1.6  
frr.frr                       2.0.0  
gluster.gluster               1.0.2  
google.cloud                  1.0.2  
hetzner.hcloud                1.6.0  
hpe.nimble                    1.1.4  
ibm.qradar                    2.0.0  
infinidat.infinibox           1.3.3  
infoblox.nios_modules         1.2.2  
inspur.sm                     2.0.0  
junipernetworks.junos         3.0.1  
kubernetes.core               2.3.1  
mellanox.onyx                 1.0.0  
netapp.aws                    21.7.0 
netapp.azure                  21.10.0
netapp.cloudmanager           21.17.0
netapp.elementsw              21.7.0 
netapp.ontap                  21.19.1
netapp.storagegrid            21.10.0
netapp.um_info                21.8.0 
netapp_eseries.santricity     1.3.0  
netbox.netbox                 3.7.1  
ngine_io.cloudstack           2.2.4  
ngine_io.exoscale             1.0.0  
ngine_io.vultr                1.1.1  
openstack.cloud               1.8.0  
openvswitch.openvswitch       2.1.0  
ovirt.ovirt                   2.0.4  
purestorage.flasharray        1.13.0 
purestorage.flashblade        1.9.0  
sensu.sensu_go                1.13.1 
servicenow.servicenow         1.0.6  
splunk.es                     2.0.0  
t_systems_mms.icinga_director 1.29.0 
theforeman.foreman            3.4.0  
vmware.vmware_rest            2.1.5  
vyos.vyos                     3.0.1  
wti.remote                    1.0.3  

# /Users/wmlynch/.ansible/collections/ansible_collections
Collection       Version
---------------- -------
community.docker 2.2.0  
```
CONFIGURATION
ansible-config dump --only-changed
ANSIBLE_PIPELINING(/Users/wmlynch/armada-dev/src/[redacted]/ansible.cfg) = True
CALLBACKS_ENABLED(/Users/wmlynch/armada-dev/src/[redacted]/ansible.cfg) = ['timer', 'profile_roles']
DEFAULT_STDOUT_CALLBACK(/Users/wmlynch/armada-dev/src/[redacted]/ansible.cfg) = yaml
OS / ENVIRONMENT
sw_vers
ProductName:    macOS
ProductVersion: 12.4
BuildVersion:   21F79
STEPS TO REPRODUCE

I used kind to recreate this, but any Kubernetes cluster will work.

1. kind create cluster --name test --kubeconfig /tmp/kind.kubeconfig --image kindest/node:v1.24.4
2. kubectl --kubeconfig /tmp/kind.kubeconfig create secret generic my-secret --from-literal=foo=bar
3. cp /tmp/kind.kubeconfig /tmp/botched.kubeconfig
4. Edit /tmp/botched.kubeconfig and remove the "certificate-authority-data:" line from the file.
5. ansible-playbook recreate-k8s-info-error.yml -vvv
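
As a sanity check, the same botched kubeconfig makes plain kubectl fail immediately (output paraphrased from a typical run; the exact message depends on the kubectl version):

```
$ kubectl --kubeconfig /tmp/botched.kubeconfig get secret my-secret
Unable to connect to the server: x509: certificate signed by unknown authority
```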

# PLAYBOOK recreate-k8s-info-error.yml 

---
- hosts: localhost
  connection: local
  tasks:
  - name: Check for existing cluster secret with good kubeconfig
    k8s_info:
      api_version: v1
      kind: Secret
      name: 'my-secret'
      namespace: 'default'
      kubeconfig: '/tmp/kind.kubeconfig'
    register: _secret_data_a

# The expectation is that this will result in a failed task.
# However, this will return as a successful task.
  - name: Check for existing cluster secret with bad kubeconfig
    k8s_info:
      api_version: v1
      kind: Secret
      name: 'my-secret'
      namespace: 'default'
      kubeconfig: '/tmp/botched.kubeconfig'
    register: _secret_data_b
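
For illustration only, appending an assertion like the following to the recreate playbook makes the false positive visible; under the current behavior the k8s_info task itself stays green and only this assert fails:

```yaml
  - name: Assert that the lookup with the bad kubeconfig actually reached the api-server
    assert:
      that:
        # With the botched kubeconfig, msg carries the SSL exception and
        # resources comes back empty, yet the k8s_info task reports ok.
        - _secret_data_b.msg is not defined
        - _secret_data_b.resources | length > 0
      fail_msg: "k8s_info swallowed an api-server connection error"
```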
EXPECTED RESULTS

I expect the "Check for existing cluster secret with bad kubeconfig" task to fail.

ACTUAL RESULTS
ansible-playbook recreate-k8s-info-error.yml -v
No config file found; using defaults
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************************************************
ok: [localhost]

TASK [Check for existing cluster secret with good kubeconfig] ******************************************************************************************************************************************************
ok: [localhost] => {"api_found": true, "changed": false, "resources": [{"apiVersion": "v1", "data": {"foo": "YmFy"}, "kind": "Secret", "metadata": {"creationTimestamp": "2022-09-07T03:51:50Z", "managedFields": [{"apiVersion": "v1", "fieldsType": "FieldsV1", "fieldsV1": {"f:data": {".": {}, "f:foo": {}}, "f:type": {}}, "manager": "kubectl-create", "operation": "Update", "time": "2022-09-07T03:51:50Z"}], "name": "my-secret", "namespace": "default", "resourceVersion": "1009", "uid": "8b96bebd-8d4d-46b7-9645-5abd85fc25d3"}, "type": "Opaque"}]}

TASK [Check for existing cluster secret with bad kubeconfig] *******************************************************************************************************************************************************
ok: [localhost] => {"api_found": true, "changed": false, "msg": "Exception 'HTTPSConnectionPool(host='127.0.0.1', port=55002): Max retries exceeded with url: /api/v1/namespaces/default/secrets/my-secret?fieldSelector=&labelSelector= (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))' raised while trying to get resource using {'name': 'my-secret', 'namespace': 'default', 'label_selector': '', 'field_selector': ''}", "resources": []}

PLAY RECAP *********************************************************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
gravesm commented 2 years ago

@wmlynch Thanks for reporting this. It should definitely fail if the cluster is unreachable at any point.