geerlingguy / ansible-role-docker

Ansible Role - Docker
https://galaxy.ansible.com/geerlingguy/docker/
MIT License

Add retries to dependency install #371

Closed gk-fschubert closed 1 year ago

gk-fschubert commented 2 years ago

Hi there,

We're using this role to install Docker on newly created machines. After a machine is created on DigitalOcean, it sometimes runs apt on its own during its first few minutes.

As a result, the role's install step sometimes fails.

It would be good to add retries (maybe 10) to https://github.com/geerlingguy/ansible-role-docker/blob/master/tasks/setup-Debian.yml to avoid a playbook run failure.
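
For illustration, the requested retries could look roughly like the sketch below. This is not the role's actual task: the package list is taken from the error output later in this thread, while the `docker_deps_result` variable name and the `delay` value are assumptions chosen for the example.

```yaml
# Hypothetical sketch of a retried dependency install (not the role's real task).
- name: Ensure dependencies are installed.
  ansible.builtin.apt:
    name:
      - apt-transport-https
      - ca-certificates
    state: present
  register: docker_deps_result          # hypothetical variable name
  until: docker_deps_result is succeeded  # retry until the apt call succeeds
  retries: 10                           # the count suggested above
  delay: 15                             # seconds between attempts (assumed value)
```

With `until`, Ansible re-runs the task up to `retries` times and only fails the play if every attempt fails, so a short-lived dpkg lock would no longer abort the run.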

geerlingguy commented 2 years ago

I do this quite often (on DigitalOcean as well), and haven't encountered issues like this. What kind of errors are you getting? Is it related to network connectivity not working? Is it an apt cache issue?

nodiscc commented 2 years ago

I think @gk-fschubert refers to the fact that, on first boot, on many Debian-based systems provisioned from a master image, unattended-upgrades will kick in and spend some time upgrading packages which have been updated since the base image was generated. This results in the APT cache being locked and (I think) the task failing.

Though I'm not sure that is still the case - Ansible may have added automatic/implicit retries since I haven't seen this error in a while.

This also used to happen when running the apt module when a normal, daily unattended-upgrade was in progress.
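
On the point about implicit retries: to my knowledge, the `apt` module in ansible-core 2.12 and later has a `lock_timeout` option that waits for the apt lock instead of failing immediately. A minimal sketch, assuming that option is available in the installed Ansible version and using an illustrative timeout value:

```yaml
# Sketch only: relies on the apt module's lock_timeout option
# (believed to exist since ansible-core 2.12; check your version).
- name: Ensure dependencies are installed.
  ansible.builtin.apt:
    name:
      - apt-transport-https
      - ca-certificates
    state: present
    lock_timeout: 300   # seconds to wait for the apt lock (illustrative value)
```

If the lock is held longer than the timeout, for example during a long unattended upgrade, the task would still fail, so this narrows the window rather than closing it.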

gk-fschubert commented 2 years ago

@geerlingguy We're creating the droplets with Terraform, and our playbook starts as soon as SSH is available on the machine. So maybe not everyone falls into this trap, because for most people more time passes between droplet creation and the playbook run.
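
Until the role handles this, one possible workaround on the playbook side is to wait for first-boot provisioning before the role runs. The sketch below assumes the image uses cloud-init and that the process holding the lock belongs to first-boot setup; neither is confirmed in this thread, and a separate unattended upgrade could still hold the lock afterwards. The `geerlingguy.docker` role name is the usual Galaxy name and is also an assumption here.

```yaml
# Hypothetical playbook-side workaround; assumes a cloud-init based image.
- hosts: all
  become: true
  pre_tasks:
    - name: Wait for cloud-init to finish first-boot provisioning.
      ansible.builtin.command: cloud-init status --wait
      changed_when: false   # a pure wait, never reports a change
  roles:
    - geerlingguy.docker    # assumed Galaxy role name
```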

The error I'm getting is:

TASK [docker : Ensure dependencies are installed.] *****************************
ok: [157.245.121.218]
fatal: [157.230.96.191]: FAILED! => {"cache_update_time": 1662360780, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"       install 'apt-transport-https=2.0.9' 'ca-certificates=20211016~20.04.1'' failed: E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 9547 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", "rc": 100, "stderr": "E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 9547 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", "stderr_lines": ["E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 9547 (apt-get)", "E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?"], "stdout": "", "stdout_lines": []}
gk-fschubert commented 1 year ago

Any update, @geerlingguy? Could you add some retries?

stale[bot] commented 1 year ago

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!

Please read this blog post to see the reasons why I mark issues as stale.

stale[bot] commented 1 year ago

This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.