BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team (this includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3)
Apache License 2.0

Troubleshoot Inconsistencies running CCM container for playbooks #4523

Open wmhutchison opened 10 months ago

wmhutchison commented 10 months ago

Describe the issue

During a recent CCM push, Ian Watts asked us to run a playbook. After catching up on the new version of the CCM container and where to get it, we used that container on the local VM images that Platform Operations uses to run such playbooks, but hit ansible issues that Ian Watts could not replicate. We need to investigate further to determine whether the problem lies in how we're running the CCM container or on Ian Watts's end.

Additional context

For the tester here (William), ansible is not installed on the VM itself, so our ability to run ansible at all comes from the CCM container.

This is not a major priority for now, since we rarely need to run such playbooks in the first place. It is still best to iron out any inconsistencies, so that anyone who needs to run playbooks or other automation/scripting not native to CCM can use a common image, regardless of who is running it.

Definition of done

wmhutchison commented 10 months ago
[whutchis@centos8 platform-gitops-gen]$ which ansible
/usr/bin/which: no ansible in (/home/whutchis/.local/bin:/home/whutchis/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
[whutchis@centos8 platform-gitops-gen]$ podman run --rm --ulimit nofile=65535:65535   -v="$(pwd):/gen"   -v="$(pwd)/output:/tmp/platform-gitops-gen"   --env-file="$(pwd)/.devcontainer/devcontainer.env"   --workdir=/gen artifacts.developer.gov.bc.ca/plat-util-images/gitops-container:v1.0.0 ansible-playbook -i inventory/silver playbooks/gitops/remove_old_argocd.yaml

PLAY [Remove old argocd namespaces] ********************************************

TASK [Delete old argocd namespaces] ********************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'kubernetes'
failed: [localhost] (item=argocd) => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/libexec/platform-python
  ansible_loop_var: item
  error: No module named 'kubernetes'
  item: argocd
  msg: Failed to import the required Python library (kubernetes) on 2edf37619675's Python /usr/libexec/platform-python. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'kubernetes'
failed: [localhost] (item=argocd-shared) => changed=false
  ansible_loop_var: item
  error: No module named 'kubernetes'
  item: argocd-shared
  msg: Failed to import the required Python library (kubernetes) on 2edf37619675's Python /usr/libexec/platform-python. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter

PLAY RECAP *********************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

wmhutchison commented 1 month ago

Need to re-check this against the newer RHEL9 servers, where we also do a lot of things via podman and which provide a consistent environment across all of our clusters, versus whatever might have gone into an individual's VM build.

wmhutchison commented 4 weeks ago

Took a fresh look at this. Confirmed that using the gitops-container image still doesn't work on the RHEL9 UTIL server. After much hacking I could get the playbook in question to run with our new standard for running ansible playbooks, but we really can't be doing this for a repo that's a source for CCM, and we don't want any hacks/changes on our end to affect CCM.

The appropriate solution is to focus on gitops-container and see if I can find its source repo to check whether the kubernetes collections are installed in the image. If not, I will ask Ian Watts to open a low-priority ticket (our need to run ad hoc playbooks from Ian is quite low) to build the image with support for that module. Otherwise, if the module is baked into the image, I will need to figure out why it's not being recognized. :)
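A rough sketch of how that check could look from inside the container, without needing the image's source repo. This is a hypothetical helper, not something that ships with gitops-container. Only /usr/libexec/platform-python comes from the error output above; the other interpreter paths are guesses about what the image might contain.

```python
#!/usr/bin/env python3
"""Report which Python interpreters can import a given module."""
import os
import subprocess
import sys

# Only the first path is taken from the error output in this issue;
# the others are assumptions about what the image might contain.
CANDIDATES = [
    "/usr/libexec/platform-python",
    "/usr/bin/python3",
    sys.executable,
]


def can_import(interpreter, module):
    """Return True/False for import success, or None if the interpreter is absent."""
    if not os.path.exists(interpreter):
        return None
    result = subprocess.run(
        [interpreter, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe(module, interpreters=CANDIDATES):
    """Map each interpreter path to the result of can_import()."""
    return {path: can_import(path, module) for path in interpreters}


if __name__ == "__main__":
    labels = {True: "importable", False: "NOT importable", None: "interpreter not found"}
    for path, status in probe("kubernetes").items():
        print(f"{path}: kubernetes {labels[status]}")
```

Since the podman command above already mounts the working directory at /gen, a script like this (say, saved under a hypothetical name such as check_module.py) could be run in place of the ansible-playbook command, assuming a python3 binary is on the container's PATH. If one interpreter reports the module as importable and platform-python does not, that would point to an interpreter mismatch rather than a missing library.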

wmhutchison commented 3 weeks ago

Moving to Sprint Backlog since this has taken a lower priority. I took it on thinking it would be quick to confirm, but digging into the matter reveals that the gitops container technically does have the required bits for the k8s module to run, yet the module isn't working, so more troubleshooting time will be required. Lower priority since this use case is rare: at present it is only needed during a specific major upgrade of Gitops where we need to run playbooks managed by Ian Watts.
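One thing worth trying when troubleshooting resumes: the error message itself points at ansible_python_interpreter, so if the library is installed for a different interpreter than the one ansible discovers (/usr/libexec/platform-python in the output above), pinning the interpreter on the command line may be enough. A minimal sketch that builds such a command follows. The /usr/bin/python3 path is an unverified assumption about the image, and the helper name is made up for illustration.

```python
import shlex


def playbook_command(playbook, inventory, interpreter):
    """Build an ansible-playbook argv that pins the Python interpreter.

    The interpreter is passed as an extra var (-e), so nothing in the
    playbook repo or inventory has to change.
    """
    return [
        "ansible-playbook",
        "-i", inventory,
        "-e", f"ansible_python_interpreter={interpreter}",
        playbook,
    ]


if __name__ == "__main__":
    cmd = playbook_command(
        "playbooks/gitops/remove_old_argocd.yaml",
        "inventory/silver",
        "/usr/bin/python3",  # assumption: the interpreter that has the library
    )
    # Print a shell-safe string to append to the existing `podman run ... IMAGE` line.
    print(shlex.join(cmd))
```

Passing the override with -e keeps the change on our side of the command line only, which fits the goal of not touching a repo that is a source for CCM.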

wmhutchison commented 3 days ago

Moving into Backlog for now. This wasn't a quick win and is truly a low priority, since at present it only comes up in the rare case where Ian Watts needs us to run one of his playbooks for a specific major CCM transition.