ansible / galaxy


Ansible Galaxy collection install does not work from Czech Republic #2527

Open Smurk opened 4 years ago

Smurk commented 4 years ago

Bug Report

SUMMARY

Installing a collection with the ansible-galaxy command times out, regardless of whether the collection name is listed in a requirements file or passed as an argument. I've tried it from 3 different connections: my 4G mobile internet, my optical fibre connection at home, and the 1 Gbit/s connection at work. I've also asked 5 more people in the Czech Republic (all within ~30 km of Brno), and everyone had the same issue. When one of my friends ran the command through a US VPN, it worked as expected. We've detected this behaviour 2 days ago, when our CI builds started failing - somewhere between 9am - 2pm CET.

STEPS TO REPRODUCE
  1. Run ansible-galaxy collection install community.general from the Czech Republic

EXPECTED RESULTS

The collection is installed successfully

ACTUAL RESULTS

The command ends with an error such as ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/': The read operation timed out (the URL changes from run to run).

 ansible-galaxy -vvv collection install community.general
[DEPRECATION WARNING]: Setting verbosity before the arg sub command is deprecated, set the verbosity after the sub command. This feature will be removed in version 2.13. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.
ansible-galaxy 2.9.10
  config file = /Users/proj/.git/neweb/ansible/ansible.cfg
  configured module search path = ['/Users/proj/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/site-packages/ansible
  executable location = /usr/local/bin/ansible-galaxy
  python version = 3.8.5 (default, Sep  3 2020, 21:04:09) [Clang 11.0.3 (clang-1103.0.32.62)]
Using /Users/proj/.git/neweb/ansible/ansible.cfg as config file
Process install dependency map
Opened /Users/proj/.ansible/galaxy_token
Processing requirement collection 'community.general'
Collection 'community.general' obtained from server default https://galaxy.ansible.com/api/
Processing requirement collection 'google.cloud' - as dependency of community.general
Collection 'google.cloud' obtained from server default https://galaxy.ansible.com/api/
Processing requirement collection 'ansible.posix' - as dependency of community.general
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/ansible/posix/versions/?page=4': The read operation timed out

gundalow commented 4 years ago

We've detected this behaviour 2 days ago, when our CI builds started failing - somewhere between 9am - 2pm CET.

Has ansible-galaxy collection install been failing consistently since then?

Smurk commented 4 years ago

Has ansible-galaxy collection install been failing consistently since then?

Yes, we have been unable to run our CI builds or install collections locally since then.

webknjaz commented 4 years ago

I confirm that I've been hitting timeout errors more often over the past few days (ISP: poda.cz). 3-5 retries usually get the job done, but I only have 1-2 collections to download in my tests. I must say that there's no proper error-handling logic in the ansible-galaxy CLI (ansible/ansible repo) and no internal retries either, but that's a separate issue that may need to be filed against the core repo.
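Until the CLI retries on its own, a CI job could wrap the install command in a small retry loop. This is only a rough sketch of that workaround, not anything ansible-galaxy provides: `run_with_retries`, its parameters, and the backoff values are hypothetical names chosen for illustration.

```python
import subprocess
import time


def run_with_retries(command, attempts=5, base_delay=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `command` until it exits with 0; return the attempt number that succeeded.

    `runner` and `sleep` are injectable so the loop can be exercised
    without calling the real CLI or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"{command!r} still failing after {attempts} attempts")


# Example call (not executed here):
# run_with_retries(["ansible-galaxy", "collection", "install", "community.general"])
```

A wrapper like this only helps when a single run has a reasonable chance of succeeding; it does nothing about the slow responses themselves.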

gundalow commented 4 years ago

See also https://github.com/ansible/galaxy/issues/2302

Smurk commented 4 years ago

Just to make it clear: for us, even multiple retries do not work. The URL in the error message changes on each run (see below). When I try to access the URL from the error message, sometimes it loads quickly, sometimes it takes 15 seconds, and sometimes it takes 30 seconds. We see the same behaviour when running curl -L.

➜  ansible git:(master$) ansible-galaxy collection install community.general
Process install dependency map
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/community/general/versions/': The read operation timed out

➜  ansible git:(master$) ansible-galaxy collection install community.general
Process install dependency map
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/ansible/netcommon/versions/?page=7': The read operation timed out

➜  ansible git:(master$) ansible-galaxy collection install community.general
Process install dependency map
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/community/general/versions/1.2.0/': The read operation timed out

➜  ansible git:(master$) ansible-galaxy collection install community.general
Process install dependency map
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/ansible/posix/versions/': The read operation timed out

➜  ansible git:(master$) ansible-galaxy collection install community.general
Process install dependency map
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/ansible/netcommon/versions/?page=2': The read operation timed out

webknjaz commented 4 years ago

I've opened https://galaxy.ansible.com/api/v2/collections/community/general/versions/ in my browser and it took a few seconds to get a response. It seems like Galaxy is just slow when processing API calls.

Here's the timing shown for such a request in DevTools (screenshot: galaxy-api-devtools-timing).

The concerning part is that the Time To First Byte is 1.27s. This basically means that there's either (1) some latency in the response delivery or (2) some slow DB lookups on the back-end.

webknjaz commented 4 years ago

I've made a few refreshes, and these are TTFB values I've got so far: 866.22ms, 1.09s, 1.02s, 1.07s, 1.19s, 1.12s, 1.09s.

This feels quite slow for just one query. With many requests produced in the process of dependency resolution and installation, I can imagine that some of them would be slow and would cause this many timeouts.
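For anyone who wants to repeat this measurement outside the browser, here is a minimal sketch using only the Python standard library. The helper names `measure_ttfb` and `summarize` are made up for illustration, and the figure it reports includes DNS and TCP/TLS setup, so it will likely read somewhat higher than the DevTools TTFB value.

```python
import time
import urllib.request


def measure_ttfb(url, opener=urllib.request.urlopen,
                 clock=time.perf_counter, timeout=60):
    """Seconds from issuing the request until the response headers have arrived."""
    start = clock()
    # urlopen returns once the status line and headers are read, so the elapsed
    # time covers connection setup plus the server's time to start responding.
    with opener(url, timeout=timeout):
        return clock() - start


def summarize(samples):
    """Return (min, mean, max) of a non-empty list of timings in seconds."""
    return min(samples), sum(samples) / len(samples), max(samples)


# Example call (not executed here):
# url = "https://galaxy.ansible.com/api/v2/collections/community/general/versions/"
# print(summarize([measure_ttfb(url) for _ in range(7)]))
```

Running it from several networks would show whether the slowness depends on where the request comes from.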

Smurk commented 4 years ago

For comparison, here is the output of my curl commands run with time (time curl -q -L https://galaxy.ansible.com/api/v2/collections/google/cloud/versions/\?page\=2 >/dev/null 2>/dev/null):

curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 18.080 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 32.997 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 3.384 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 2% cpu 0.839 total
curl -q -L  > /dev/null 2> /dev/null  0.01s user 0.01s system 1% cpu 1.191 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 1% cpu 1.214 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 1% cpu 1.335 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 1% cpu 1.706 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 2.364 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 1% cpu 1.942 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 2% cpu 1.081 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 31.734 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 1% cpu 2.037 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 1% cpu 1.345 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 18.154 total
curl -q -L  > /dev/null 2> /dev/null  0.02s user 0.01s system 0% cpu 2.772 total
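To put these numbers in perspective, here is a back-of-the-envelope sketch of why retries rarely help. It takes the 16 totals above, counts how many exceed an assumed client read timeout of 10 s (an assumption for illustration, not a value confirmed in this thread), and treats requests as independent, which is also an assumption.

```python
# Total times (seconds) copied from the 16 curl runs above.
timings = [18.080, 32.997, 3.384, 0.839, 1.191, 1.214, 1.335, 1.706,
           2.364, 1.942, 1.081, 31.734, 2.037, 1.345, 18.154, 2.772]

ASSUMED_TIMEOUT = 10.0  # hypothetical read timeout in seconds


def slow_fraction(samples, timeout):
    """Fraction of samples that took longer than `timeout`."""
    return sum(1 for s in samples if s > timeout) / len(samples)


def chance_all_succeed(p_slow, requests):
    """Probability that `requests` independent calls all finish in time."""
    return (1 - p_slow) ** requests


p = slow_fraction(timings, ASSUMED_TIMEOUT)  # 4 of 16 runs are over 10 s
for n in (1, 5, 10, 20):
    print(f"{n:2d} requests: {chance_all_succeed(p, n):.1%} chance of no timeout")
```

Under these assumptions a quarter of the requests are too slow, so an install that needs 10 API calls would complete only about 6% of the time, which matches a different URL failing on every run.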

sodd commented 4 years ago

I don't know what happened, but this issue seems to have disappeared.

sodd commented 4 years ago

Nope, still there :(

ansible-galaxy collection download community.network -p /tmp/shishi

Process install dependency map
ERROR! Unknown error when attempting to call Galaxy at 'https://galaxy.ansible.com/api/v2/collections/fortinet/fortios/versions/': The read operation timed out

webknjaz commented 4 years ago

@cutwater suggested the other day that it may be a problem with Cloudflare... We'll have to wait for somebody with the respective access to check that, I guess.