ansible / ansible-builder

An Ansible execution environment builder

Python packages installed to different Python version called via Ansible #355

Closed stephenhoran closed 1 year ago

stephenhoran commented 2 years ago

I am using the very simple EE definition file below to build an image with boto3:

---
version: 1
dependencies:
  galaxy: requirements.yml
  python: requirements.txt

It appears that the requirements are being installed for Python 3.8:

bash-4.4# python
Python 3.8.12 (default, Sep 21 2021, 00:10:52) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>> 

However, AWX calls /usr/libexec/platform-python by default, which results in an error saying the package is not installed.

bash-4.4# /usr/libexec/platform-python
Python 3.6.8 (default, Jan 19 2022, 23:28:49) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'boto3'

I see no way to set the Python interpreter for an EE via AWX. I am unsure why the interpreter AWX selects is not the one the builder installs packages for, or whether I am doing something incorrectly.

$ ansible-builder --version
1.0.1
{
  "msg": "Failed to import the required Python library (botocore or boto3) on automation-job-21-krjsc's Python /usr/libexec/platform-python. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter",
  "invocation": {
    "module_args": {
      "debug_botocore_endpoint_logs": false,
      "validate_certs": true,
      "vpc_ids": [],
      "filters": {},
      "ec2_url": null,
      "aws_access_key": null,
      "aws_secret_key": null,
      "security_token": null,
      "aws_ca_bundle": null,
      "profile": null,
      "aws_config": null,
      "region": null
    }
  },
  "ansible_facts": {
    "discovered_interpreter_python": "/usr/libexec/platform-python"
  },
  "_ansible_no_log": false,
  "changed": false
}
pabelanger commented 2 years ago

/usr/libexec/platform-python isn't supported, given that it is Python 3.6. You should reference /usr/bin/python3 and things will load properly.
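
One way to act on this suggestion is to pin the interpreter with the ansible_python_interpreter variable at play level. The following is a minimal sketch, not a confirmed fix for this report: the play name, the aws_region variable, and the choice of amazon.aws.ec2_vpc_net_info as the boto3-dependent task are illustrative assumptions.

```yaml
---
- name: Query AWS from inside the execution environment
  hosts: localhost
  gather_facts: false
  vars:
    # Assumption: this path is the Python that ansible-builder installed
    # the requirements into (3.8 in the report above).
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: Gather VPC information (needs boto3 and botocore)
      amazon.aws.ec2_vpc_net_info:
        region: "{{ aws_region }}"  # hypothetical variable
```

Play vars take precedence over inventory variables, so this should apply even when the inventory defines localhost explicitly.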

Shrews commented 2 years ago

I'm not very familiar with AWX. You are able to set the interpreter in the ansible.cfg file, which can be specified in the EE definition file. Not sure if that would fix your issue or not. Pinging @shanemcd for the AWX side of this, but seems like this may be an AWX question vs. an ansible-builder one.

MallocArray commented 2 years ago

I'm not sure why Python 3.6 is even in the ansible-runner containers if it is out of support at this point, but I've been running into the same thing as @stephenhoran

In the playbooks where you are seeing this issue, do you happen to have connection: local at the top of the play? That is my use case for most of the VMware modules and for other modules that don't need Ansible to connect to a host directly, and I have had modules reported as not found in the Execution Environment I built.

I found these two links which resolve the issue: https://github.com/ansible/ansible/issues/16724 http://willthames.github.io/2018/07/01/connection-local-vs-delegate_to-localhost.html

For our playbooks, we landed on connection: local at the top of the play, and on each task we also include delegate_to: localhost, which allows all of my playbooks to work properly in the EE.

simonbirtles commented 2 years ago

I have the same issue, specifically with pyvmomi, which I add to a custom EE build (spec below). The error is ModuleNotFoundError: No module named 'pyVim'. The odd thing is that other modules I specify in requirements.txt for ansible-builder work fine, e.g. dnspython, which the playbook below uses for the dig lookup.

This worked in an older version of awx-ee about a year ago, but I no longer have a reference to the version I used.

Any ideas ?

Build Command

ansible-builder build --container-runtime docker --tag simonbirtles/awx-ee-03:v0.1.0 --verbosity 3 --prune-images

The awx-ee container can be found here: https://hub.docker.com/repository/docker/simonbirtles/awx-ee-03

Build Environment

Using git clone -b 0.6.0 https://github.com/ansible/awx-ee.git

pip3 freeze | grep ansible 
ansible==6.0.0 
ansible-builder==1.1.0 
ansible-core==2.13.1  
ansible-runner==2.2.1 

execution-environment.yml

---
version: 1
dependencies:
  galaxy: _build/requirements.yml
  python: _build/requirements.txt
  system: _build/bindep.txt
additional_build_steps:
  append:
    - RUN alternatives --set python /usr/bin/python3
    - COPY --from=quay.io/project-receptor/receptor:1.0.0a2 /usr/bin/receptor /usr/bin/receptor
    - RUN mkdir -p /var/run/receptor
    - ADD run.sh /run.sh
    - CMD /run.sh
    - RUN git lfs install

requirements.yml

---
collections:
  - name: awx.awx
  - name: azure.azcollection
  - name: amazon.aws
  - name: theforeman.foreman
  - name: google.cloud
  - name: openstack.cloud
  - name: community.vmware
  - name: ovirt.ovirt
  - name: kubernetes.core
  - name: ansible.posix
  - name: ansible.windows
  - name: redhatinsights.insights

requirements.txt

pyvmomi>=7.0.3
git+https://github.com/ansible/ansible-builder.git@devel#egg=ansible-builder
awxkit>=19.0.0
pypsrp
dnspython
netaddr
infoblox-client
urllib3

Partial Playbook

tasks:

  - name: Ansible Version
    debug:
      msg: "Ansible version is {{ ansible_version.full }}"

  - name: Get forward DNS entry for this host
    set_fact:
      dns_lookup: "{{ lookup('community.general.dig', vm_name + '.' + dns_domain + '.' )}}"

  - name: Fail if DNS is not found
    fail:
      msg: "Could not find {{ vm_name }}.{{ dns_domain }} in DNS"
    when: dns_lookup == 'NXDOMAIN'

  - name:  "Clone virtual machine from template: {{ template }}"
    vmware_guest:
      hostname: "{{ vcenter }}"
      username: "{{ vcenter_username }}"
      password: "{{ vcenter_password }}"
      validate_certs: no
      datacenter: "{{ data_center}}"
      datastore: "{{ data_store }}"
      folder: "{{ folder }}"

AWX Job Output

/usr/local/lib/python3.8/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
ansible-playbook [core 2.12.5.post0]
  config file = None
  configured module search path = ['/home/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/site-packages/ansible
  ansible collection location = /runner/requirements_collections:/home/runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible-playbook
  python version = 3.8.13 (default, Jun 24 2022, 15:27:57) [GCC 8.5.0 20210514 (Red Hat 8.5.0-13)]
  jinja version = 2.11.3
  libyaml = True
No config file found; using defaults
SSH password: 
host_list declined parsing /runner/inventory/hosts as it did not pass its verify_file() method
Parsed /runner/inventory/hosts inventory source with script plugin
redirecting (type: modules) ansible.builtin.vmware_guest to community.vmware.vmware_guest
redirecting (type: modules) ansible.builtin.vmware_guest to community.vmware.vmware_guest
redirecting (type: modules) ansible.builtin.vmware_guest_powerstate to community.vmware.vmware_guest_powerstate
redirecting (type: modules) ansible.builtin.redhat_subscription to community.general.redhat_subscription
redirecting (type: modules) community.general.redhat_subscription to community.general.packaging.os.redhat_subscription
redirecting (type: modules) ansible.builtin.authorized_key to ansible.posix.authorized_key
Skipping callback 'awx_display', as we already have a stdout callback.
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.

PLAYBOOK: main.yaml ************************************************************
2 plays in main.yaml

<localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1657657917.856853-31-86451095759156/ /root/.ansible/tmp/ansible-tmp-1657657917.856853-31-86451095759156/AnsiballZ_vmware_guest.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1657657917.856853-31-86451095759156/AnsiballZ_vmware_guest.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1657657917.856853-31-86451095759156/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_vmware_guest_payload_b5awe6jo/ansible_vmware_guest_payload.zip/ansible_collections/community/vmware/plugins/module_utils/vmware.py", line 34, in <module>
    from pyVim import connect
ModuleNotFoundError: No module named 'pyVim'
fatal: [localhost]: FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "invocation": {
        "module_args": {
            "advanced_settings": [],
            "annotation": null,
            "cdrom": [],
            "cluster": "XXXXXX",
            "convert": null,
            "customization": {
                "autologon": null,
                "autologoncount": null,
…
PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0   
MallocArray commented 2 years ago

This is a known situation with the EE when using tasks that run on the Ansible host. I'm guessing you probably have connection: local at the top of the playbook, but in the EE, that behaves differently than what you may experience when running it directly on a Linux host.

If issues are shown about modules not being available and the playbook is using connection: local at the top of the play, please use the following fix:

delegate_to: localhost

This will ensure that the ansible_python_interpreter variable is set correctly, by creating an implicit localhost entry in inventory. https://docs.ansible.com/ansible-core/devel/inventory/implicit_localhost.html

So in your case, the task using the vmware_guest module needs delegate_to: localhost set as a task keyword (at the same level as the module name), and then it should work.
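
As a sketch of where the keyword goes, here is the vmware_guest task from the partial playbook above with delegate_to added. The fully qualified module name is taken from the redirect shown in the job output; the remaining module arguments are left out, as they were in the original post.

```yaml
- name: "Clone virtual machine from template: {{ template }}"
  community.vmware.vmware_guest:
    hostname: "{{ vcenter }}"
    username: "{{ vcenter_username }}"
    password: "{{ vcenter_password }}"
    validate_certs: false
    datacenter: "{{ data_center }}"
    datastore: "{{ data_store }}"
    folder: "{{ folder }}"
    # remaining module arguments omitted, as in the partial playbook
  # Task keyword, indented like the module name, not inside its arguments.
  delegate_to: localhost
```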

simonbirtles commented 2 years ago

I do have connection: local at the play level, as you correctly suggest. I changed the vmware_guest task (and other local tasks) to delegate_to: localhost as you suggested, but the same issue remains. I also rebuilt the container from awx-ee v0.5.0 (git clone -b 0.5.0 https://github.com/ansible/awx-ee.git) and the same issue occurs.

What is odd (and interesting) is that this only affects the pyvmomi module and not the dnspython module: a task using the dig lookup before the vmware_guest task works fine, so I would assume the modules are installed in the correct Python environment (version).

The error is specifically about pyVim which should be installed as a dependency of pyvmomi afaik.

    from pyVim import connect
ModuleNotFoundError: No module named 'pyVim'

I understand that this can conflict with a particular vim module, but I don't see that being installed; at least, I don't specify it in requirements.txt.

pabelanger commented 2 years ago
        "discovered_interpreter_python": "/usr/libexec/platform-python"

That is your issue: you need to be using /usr/bin/python3.8 for your connection.
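
To confirm which interpreter is in play before changing anything, a small diagnostic play along these lines may help. It is only a sketch: the play and task names are illustrative, and it relies on the ansible_playbook_python magic variable and on the discovered_interpreter_python fact seen in the output above.

```yaml
---
- name: Compare the controller and module interpreters
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Run a Python module so interpreter discovery can happen
      ansible.builtin.ping:

    - name: Show the interpreters Ansible knows about
      ansible.builtin.debug:
        msg:
          - "ansible-playbook runs under: {{ ansible_playbook_python }}"
          - "configured for modules: {{ ansible_python_interpreter | default('not set') }}"
          - "discovered for modules: {{ ansible_facts.discovered_interpreter_python | default('not discovered') }}"
```

If the first line reports a Python 3.8 path while the discovered interpreter is /usr/libexec/platform-python, modules are running under a Python that does not have the EE's packages.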

As to why Python 3.6 is in the image: we cannot remove it, as it is the default platform-python on RHEL 8 / CentOS 8.

Realistically, we really should have stuck to python3.6 and compiled everything against that. However, that ship has sailed.

The good news is that RHEL 9 shouldn't have this issue, IIRC.

MallocArray commented 2 years ago

In my requirements.txt I do specifically list pyVim, but the requirements for the community.vmware collection don't list it. You could still add it to your requirements.txt for the EE just to test.

Could you post your entire playbook or at least your modified task to make sure the delegate_to made it to the right place? This is a pretty regular issue with things like the community.vmware collection and typically the delegate_to fixes it, unless it really isn't installed in the EE

simonbirtles commented 2 years ago

Success. I added an ansible.cfg to the playbook root folder containing:

[defaults]
interpreter_python = /usr/bin/python3.8

I had already changed all local tasks to delegate_to: localhost as suggested by @MallocArray, and with the ansible.cfg in place the job ran successfully.

As a test, I then changed all the delegate_to: localhost back to connection: local and ran another job which was also successful. I will move back to delegate_to: localhost anyway. Note: I have always had connection: local set at the play level.

What would be the recommended way to set interpreter_python globally (within the container) rather than in each project? I guess we could modify this in the Dockerfile via additional_build_steps?
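
One possible approach, offered as an untested sketch rather than a recommended practice, is to set the interpreter through an environment variable baked into the image. This assumes ansible-core honours ANSIBLE_PYTHON_INTERPRETER as the environment form of the interpreter_python setting, and it reuses the dependency paths from the EE definition posted earlier in this thread.

```yaml
---
version: 1
dependencies:
  galaxy: _build/requirements.yml
  python: _build/requirements.txt
  system: _build/bindep.txt
additional_build_steps:
  append:
    # Assumption: read by ansible-core as the default module interpreter.
    - ENV ANSIBLE_PYTHON_INTERPRETER=/usr/bin/python3.8
```

Note that a default set this way would apply to every host without its own ansible_python_interpreter, including remote hosts where /usr/bin/python3.8 may not exist, so per-host overrides might still be needed.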

Thank you both for the assistance so far.

sivel commented 1 year ago

A few things to note:

  1. If you have localhost in your inventory, it should be defined to look like this:

    localhost ansible_connection=local ansible_python_interpreter="{{ ansible_playbook_python }}"
  2. It's usually best to not define localhost, and let ansible-core define it for you: https://docs.ansible.com/ansible-core/devel/inventory/implicit_localhost.html
  3. Using connection: local is almost never what you want, as that only changes the connection plugin context, and leaves all other context in place, such as a discovered or configured python interpreter for the target host
  4. With all of the above in order, using delegate_to: localhost will give you the correct configuration.
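
For inventories kept in YAML rather than INI format, the localhost definition from point 1 could be written as below. This is a sketch of the equivalent entry only; per point 2, leaving localhost out of the inventory altogether is usually preferable.

```yaml
all:
  hosts:
    localhost:
      ansible_connection: local
      # Run modules with the same Python that runs ansible-playbook.
      ansible_python_interpreter: "{{ ansible_playbook_python }}"
```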