cloudalchemy / ansible-node-exporter

Provision basic metrics exporter for prometheus monitoring tool
MIT License

Checksum download sometimes fails (status: 400) #171

Closed till closed 4 years ago

till commented 4 years ago

Replacing lookup() with uri:

    - name: Get checksum list from github
      uri:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
        method: GET
        return_content: true
      register: _checksum_result
      until: _checksum_result.status == 200
      retries: 5

    - name: Set _checksums
      set_fact:
        # uri returns the response body in `content`; it has no `stdout_lines`
        _checksums: "{{ _checksum_result.content.splitlines() }}"
      run_once: true

This yields the 400, but I am not sure where the Authorization header is introduced.

<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
--
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126 && echo ansible-tmp-1595343177.2519362-233-56170830947126="` echo /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126 `" ) && sleep 0'
Using module file /usr/local/lib/python3.7/site-packages/ansible/modules/net_tools/basics/uri.py
Pipelining is enabled.
<localhost> EXEC /bin/sh -c '/usr/local/bin/python && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126/ > /dev/null 2>&1 && sleep 0'
FAILED - RETRYING: Get checksum list from github (1 retries left).Result was: {
  "attempts": 5,
  "changed": false,
  "connection": "close",
  "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>Basic **redacted**</ArgumentValue><RequestId>C27DCB3881334C01</RequestId><HostId>FIqOQQeYf2vmrxT1CEeevSvccLgS7KYSLaduFK8FjCsbkHoUDMtcQfE5RLzPE8HWKvjJIj9ozzA=</HostId></Error>",
  "content_type": "application/xml",
  "date": "Tue, 21 Jul 2020 14:52:58 GMT",
  "elapsed": 0,
  "invocation": {
    "module_args": {
      "attributes": null,
      "backup": null,
      "body": null,
      "body_format": "raw",
      "client_cert": null,
      "client_key": null,
      "content": null,
      "creates": null,
      "delimiter": null,
      "dest": null,
      "directory_mode": null,
      "follow": false,
      "follow_redirects": "safe",
      "force": false,
      "force_basic_auth": false,
      "group": null,
      "headers": {},
      "http_agent": "ansible-httpget",
      "method": "GET",
      "mode": null,
      "owner": null,
      "regexp": null,
      "remote_src": null,
      "removes": null,
      "return_content": true,
      "selevel": null,
      "serole": null,
      "setype": null,
      "seuser": null,
      "src": null,
      "status_code": [
        200
      ],
      "timeout": 30,
      "unix_socket": null,
      "unsafe_writes": null,
      "url": "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt",
      "url_password": null,
      "url_username": null,
      "use_proxy": true,
      "validate_certs": true
    }
  },
  "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request",
  "redirected": false,
  "retries": 6,
  "server": "AmazonS3",
  "status": 400,
  "transfer_encoding": "chunked",
  "url": "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt",
  "x_amz_id_2": "FIqOQQeYf2vmrxT1CEeevSvccLgS7KYSLaduFK8FjCsbkHoUDMtcQfE5RLzPE8HWKvjJIj9ozzA=",
  "x_amz_request_id": "C27DCB3881334C01"
}

Originally posted by @till in https://github.com/cloudalchemy/ansible-node-exporter/issues/165#issuecomment-661917330

paulfantom commented 4 years ago

The url lookup and the uri module use very similar code internally. This issue can be worked around by using the get_url module instead of uri.

Ideally, we would use url lookup, but currently, I have no fix.

till commented 4 years ago

@paulfantom that's not it. I tried get_url also, same error, but less verbose.

I think uri had the best error, I would vote for changing it, if you are okay with it. The others didn't get me anywhere.

All these mechanisms (lookup('url', ...), uri and get_url) have one thing in common:

The code seems to use a ~/.netrc file/config. It was mentioned in various tickets on ansible/ansible, all of which related to "400" errors when downloading something from GitHub. Which file gets loaded depends on an environment variable (NETRC):

https://github.com/ansible/ansible/blob/234994fc075222f28943313024c7df5d7010bc37/lib/ansible/module_utils/urls.py#L1220-L1230
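
For anyone who doesn't want to follow the link, here is a minimal Python sketch of the behaviour described above. It is not Ansible's actual code, just my reading of it: look up the target host in the netrc file named by `NETRC` (Python falls back to `~/.netrc` when the variable is unset) and, if an entry exists, turn it into a Basic `Authorization` header. The function name `netrc_auth_header` and the demo credentials are made up for illustration.

```python
import base64
import netrc
import os
import tempfile


def netrc_auth_header(hostname, environ=None):
    """Return a Basic Authorization header value for hostname, or None."""
    environ = os.environ if environ is None else environ
    try:
        # netrc.netrc(None) means "use ~/.netrc"; an unreadable or
        # malformed file is treated here as "no credentials".
        rc = netrc.netrc(environ.get("NETRC"))
        login = rc.authenticators(hostname)
    except (OSError, netrc.NetrcParseError):
        return None
    if not login:
        return None
    username, _account, password = login
    if not (username and password):
        return None
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


# Demo with a throwaway netrc file (hypothetical credentials).
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as fh:
    fh.write("machine github.com login demouser password demotoken\n")
    demo_netrc = fh.name

github_header = netrc_auth_header("github.com", {"NETRC": demo_netrc})
s3_header = netrc_auth_header("s3.amazonaws.com", {"NETRC": demo_netrc})
os.unlink(demo_netrc)

print("github.com gets a header:", github_header is not None)
print("s3.amazonaws.com gets a header:", s3_header is not None)
```

In this sketch a netrc entry for github.com produces nothing for the S3 host, so a header showing up at S3 would have to be carried over from the first request rather than looked up again.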

Still not sure why the configuration for GitHub is appended to a request to s3.amazonaws.com. I think the "follow redirect" in one of the Python libs may be buggy. But I already wasted 3 days on this, so I am not gonna dig deeper now.

Anyhow, I also found netrc in the internals of the CI server we are using: https://github.com/drone/drone/blob/5b6a3d8ff4c37283cf37df20d871cc8dfe439565/core/netrc.go


I am gonna roll back all changes and confirm this today and then close this ticket. Not sure if you want a note in the readme since it's been a repeated problem. Or add a link to this comment to all the tickets that people previously opened and went stale? =)

paulfantom commented 4 years ago

Thanks for the thorough investigation! Great job!

Looks to me that GitHub stores all assets in S3 bucket hence redirections to s3.amazonaws.com.

It seems to me that this is indeed netrc issue. I wonder if we can forcibly omit it in lookup?

Either way, I think describing this in the documentation would be the best approach here. Let's maybe start a TROUBLESHOOTING.md doc and link to it from README.md? It seems like there are at least 2 issues that are related to user environment and are out of scope for the role - netrc and python forking on OSX. I could sync this file across all repos as those issues are common to all. WDYT?

till commented 4 years ago

Yes, that's a good plan. Where do you want me to PR the file?

till commented 4 years ago

> Looks to me that GitHub stores all assets in S3 bucket hence redirections to s3.amazonaws.com.
>
> It seems to me that this is indeed netrc issue. I wonder if we can forcibly omit it in lookup?

I haven't looked but what breaks the redirect is that the code adds the headers from the first request to the second which happens when it follows the location.

Even though the netrc is for "github.com" and not for Amazon's server.

GitHub support said in another ticket to not add the headers. I am "assuming" curl doesn't? To be confirmed (by someone else 🤪).

I haven't checked yet if that's how it's supposed to be or if that's something one can turn off. Seems kinda wild that there's no message that the config is loaded. Maybe I PR that to Ansible.
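
To make the suspected mechanism concrete, here is a small sketch using only Python's standard `urllib`. That is an assumption for illustration: Ansible ships its own redirect handling, which is not reproduced here. As far as I know, the stock `HTTPRedirectHandler.redirect_request` copies the original request headers onto the follow-up request. The hypothetical subclass `DropAuthOnHostChange` removes `Authorization` whenever the redirect leaves the original host. Nothing is sent over the network; the handler is called directly with a placeholder S3 URL.

```python
import urllib.request
from urllib.parse import urlsplit

GITHUB_URL = (
    "https://github.com/prometheus/node_exporter/releases/download/"
    "v1.0.1/sha256sums.txt"
)
# Placeholder target; the real Location header carries a signed query string.
S3_URL = "https://github-production-release-asset-2e65be.s3.amazonaws.com/placeholder"


class DropAuthOnHostChange(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: forget credentials when a redirect changes host."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is None:
            return None
        old_host = urlsplit(req.full_url).hostname
        new_host = urlsplit(newurl).hostname
        if old_host != new_host:
            new_req.remove_header("Authorization")
        return new_req


def auth_survives_redirect(handler, target):
    """Build the follow-up request for a 302 and report if auth is still set."""
    original = urllib.request.Request(
        GITHUB_URL, headers={"Authorization": "Basic placeholder"}
    )
    redirected = handler.redirect_request(original, None, 302, "Found", {}, target)
    return redirected.has_header("Authorization")


stock = auth_survives_redirect(urllib.request.HTTPRedirectHandler(), S3_URL)
patched_cross_host = auth_survives_redirect(DropAuthOnHostChange(), S3_URL)
patched_same_host = auth_survives_redirect(DropAuthOnHostChange(), GITHUB_URL + "?x=1")

print("stock handler keeps Authorization:", stock)
print("patched handler, other host:", patched_cross_host)
print("patched handler, same host:", patched_same_host)
```

If the follow-up request to S3 carries both the signed query string and an `Authorization` header, that would match the "Only one auth mechanism allowed" error in the log.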

paulfantom commented 4 years ago

> Where do you want me to PR the file?

In root of this repo.

> Even though the netrc is for "github.com" and not for Amazon's server.

Yes, but the module follows redirects (which is necessary here), and in the end the request lands in S3.

> I am "assuming" curl doesn't?

It doesn't unless specified with --netrc.

> how it's supposed to be or if that's something one can turn off.

I believe executing export NETRC= before running ansible should solve this problem.
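
A quick way to see why an empty `NETRC` would help, assuming Ansible passes the variable's value straight to Python's `netrc.netrc()` as the linked `urls.py` code appears to do: an empty string is not a readable file, so loading fails and no credentials are found. `load_netrc` is a hypothetical helper, not an Ansible function.

```python
import netrc


def load_netrc(environ):
    """Return a parsed netrc object, or None if the file cannot be read."""
    try:
        # None would mean "use ~/.netrc"; an empty string is treated as a
        # path that does not exist, so open() fails with an OSError subclass.
        return netrc.netrc(environ.get("NETRC"))
    except (OSError, netrc.NetrcParseError):
        return None


# Equivalent of running `export NETRC=` before the playbook.
disabled = load_netrc({"NETRC": ""})
print("netrc loaded with empty NETRC:", disabled is not None)
```

Note that unsetting the variable is not the same thing: with `NETRC` absent, Python falls back to `~/.netrc`.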

till commented 4 years ago

So, check this though (without netrc). This is just to figure out if this is a bug in Ansible or in some library underneath. I am forcing basic auth with cURL: it uses it against GitHub but not against the redirect, which is why it works. Look at the output: it contains Authorization: Basic ... only in the initial request. Or maybe cURL is being smart.

I can't find the RFC, but I think headers etc. are only meant to be used for the first request, not for the redirect. But that's less important here. I'll PR a file about netrc.

❯ curl -v -L --basic https://$GITHUB_TOKEN:x-oauth-basic@github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt
*   Trying 140.82.118.4...
* TCP_NODELAY set
* Connected to github.com (140.82.118.4) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
*  start date: May  5 00:00:00 2020 GMT
*  expire date: May 10 12:00:00 2022 GMT
*  subjectAltName: host "github.com" matched cert's "github.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
* Server auth using Basic with user '57bf82e4bef88274a1d8f4db5b2fc08017b1bcbf'
> GET /prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt HTTP/1.1
> Host: github.com
> Authorization: Basic redacted
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 302 Found
< Date: Wed, 22 Jul 2020 15:14:39 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Server: GitHub.com
< Status: 302 Found
< Vary: X-PJAX
< Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/9524057/333d8080-afed-11ea-87b7-18fcef58bd32?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200722%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200722T151439Z&X-Amz-Expires=300&X-Amz-Signature=3130b8649de1412087cef557dbec349c0574fd6d3968148e0eda1965b34d13d3&X-Amz-SignedHeaders=host&actor_id=0&repo_id=9524057&response-content-disposition=attachment%3B%20filename%3Dsha256sums.txt&response-content-type=application%2Foctet-stream
< Cache-Control: no-cache
< Set-Cookie: logged_in=no; domain=.github.com; path=/; expires=Thu, 22 Jul 2021 15:14:39 GMT; secure; HttpOnly; SameSite=Lax
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Frame-Options: deny
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Expect-CT: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
< Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events wss://alive.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/socket-worker.js gist.github.com/socket-worker.js
< Vary: Accept-Encoding, Accept, X-Requested-With
< Vary: Accept-Encoding
< X-GitHub-Request-Id: E23B:60C0:8665526:C34A05E:5F1857DF
< 
* Ignoring the response-body
* Connection #0 to host github.com left intact
* Issue another request to this URL: 'https://github-production-release-asset-2e65be.s3.amazonaws.com/9524057/333d8080-afed-11ea-87b7-18fcef58bd32?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200722%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200722T151439Z&X-Amz-Expires=300&X-Amz-Signature=3130b8649de1412087cef557dbec349c0574fd6d3968148e0eda1965b34d13d3&X-Amz-SignedHeaders=host&actor_id=0&repo_id=9524057&response-content-disposition=attachment%3B%20filename%3Dsha256sums.txt&response-content-type=application%2Foctet-stream'
*   Trying 52.216.251.92...
* TCP_NODELAY set
* Connected to github-production-release-asset-2e65be.s3.amazonaws.com (52.216.251.92) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=US; ST=Washington; L=Seattle; O=Amazon.com, Inc.; CN=*.s3.amazonaws.com
*  start date: Nov  9 00:00:00 2019 GMT
*  expire date: Mar 12 12:00:00 2021 GMT
*  subjectAltName: host "github-production-release-asset-2e65be.s3.amazonaws.com" matched cert's "*.s3.amazonaws.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert Baltimore CA-2 G2
*  SSL certificate verify ok.
> GET /9524057/333d8080-afed-11ea-87b7-18fcef58bd32?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200722%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200722T151439Z&X-Amz-Expires=300&X-Amz-Signature=3130b8649de1412087cef557dbec349c0574fd6d3968148e0eda1965b34d13d3&X-Amz-SignedHeaders=host&actor_id=0&repo_id=9524057&response-content-disposition=attachment%3B%20filename%3Dsha256sums.txt&response-content-type=application%2Foctet-stream HTTP/1.1
> Host: github-production-release-asset-2e65be.s3.amazonaws.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< x-amz-id-2: 7N+Xf+RQxPFActTLIiC1POm+SzHm4MfX7KopS60gfAeSQPuMo/lyU9Xj+8hklN0lMqkTSNSO1WA=
< x-amz-request-id: 0QFPDGBZ5S6T9R1T
< Date: Wed, 22 Jul 2020 15:14:41 GMT
< Last-Modified: Tue, 16 Jun 2020 13:19:50 GMT
< ETag: "9a23fe5723ccdf8bf897864bc51f6f4a"
< Content-Disposition: attachment; filename=sha256sums.txt
< Accept-Ranges: bytes
< Content-Type: application/octet-stream
< Content-Length: 1789
< Server: AmazonS3
< 
eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791  node_exporter-1.0.1.darwin-386.tar.gz
e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47  node_exporter-1.0.1.darwin-amd64.tar.gz
734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929  node_exporter-1.0.1.linux-386.tar.gz
3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae  node_exporter-1.0.1.linux-amd64.tar.gz
017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb  node_exporter-1.0.1.linux-arm64.tar.gz
38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea  node_exporter-1.0.1.linux-armv5.tar.gz
c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a  node_exporter-1.0.1.linux-armv6.tar.gz
e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b  node_exporter-1.0.1.linux-armv7.tar.gz
43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0  node_exporter-1.0.1.linux-mips.tar.gz
c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78  node_exporter-1.0.1.linux-mips64.tar.gz
bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857  node_exporter-1.0.1.linux-mips64le.tar.gz
85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241  node_exporter-1.0.1.linux-mipsle.tar.gz
43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808  node_exporter-1.0.1.linux-ppc64.tar.gz
5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146  node_exporter-1.0.1.linux-ppc64le.tar.gz
2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06  node_exporter-1.0.1.linux-s390x.tar.gz
7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260  node_exporter-1.0.1.netbsd-386.tar.gz
41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56  node_exporter-1.0.1.netbsd-amd64.tar.gz
* Connection #1 to host github-production-release-asset-2e65be.s3.amazonaws.com left intact
* Closing connection 0
* Closing connection 1

till commented 4 years ago

I made this, to make it a bit more transparent: https://github.com/ansible/ansible/pull/70806