DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0
2.83k stars 1.19k forks source link

[BUG] CURL and APT-GET SSL failures in install_script_agent7.sh #17051

Open Wind010 opened 1 year ago

Wind010 commented 1 year ago

Agent Environment Latest using curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh

Describe what happened: When running the script within a Debian as part of this base image mcr.microsoft.com/azure-functions/python:4-python3.9-slim:

Logs from Docker build:

 => ERROR [7/7] RUN DD_AGENT_MAJOR_VERSION=7 DD_SITE="datadoghq.com" DD_API_KEY=a22af  3.1s
------
 > [7/7] RUN DD_AGENT_MAJOR_VERSION=7 DD_SITE="datadoghq.com" DD_API_KEY=a652ca9d298f7c13360323ddaeb5b361
DD_INSTALL_ONLY=true bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)":
#10 0.410   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#10 0.410                                  Dload  Upload   Total   Spent    Left  Speed
100 37554  100 37554    0     0  69033      0 --:--:-- --:--:-- --:--:-- 69033
#10 0.960
#10 0.960 * Datadog Agent 7 install script v1.15.0
#10 0.960
#10 0.964
#10 0.964 * Installing apt-transport-https, curl and gnupg
#10 0.964
#10 1.142 Hit:1 http://deb.debian.org/debian bullseye InRelease
#10 1.283 Hit:2 https://packages.microsoft.com/debian/10/prod buster InRelease
#10 1.302 Hit:3 http://security.debian.org/debian-security stable-security InRelease
#10 1.302 Hit:4 http://deb.debian.org/debian-security bullseye-security InRelease
#10 1.326 Hit:5 http://deb.debian.org/debian bullseye-updates InRelease
#10 1.562 Reading package lists...
#10 1.974 Reading package lists...
#10 2.335 Building dependency tree...
#10 2.432 Reading state information...
#10 2.532 apt-transport-https is already the newest version (2.2.4).
#10 2.532 curl is already the newest version (7.74.0-1.3+deb11u7).
#10 2.532 gnupg is already the newest version (2.2.27-2+deb11u2).
#10 2.532 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
#10 2.533
#10 2.533 * Installing APT package sources for Datadog
#10 2.533
#10 2.540   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#10 2.540                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
#10 2.778 curl: (60) SSL certificate problem: self signed certificate in certificate chain
#10 2.778 More details here: https://curl.se/docs/sslcerts.html
#10 2.778
#10 2.778 curl failed to verify the legitimacy of the server and therefore could not
#10 2.778 establish a secure connection to it. To learn more about this situation and
#10 2.778 how to fix it, please visit the web page mentioned above.
#10 2.780
#10 2.780 It looks like you hit an issue when trying to install the Datadog Agent.
#10 2.780
#10 2.780 Troubleshooting and basic usage information for the Datadog Agent are available at:
#10 2.780
#10 2.780     https://docs.datadoghq.com/agent/basic_agent_usage/
#10 2.780
#10 3.059 Unable to send telemetry
#10 3.061
#10 3.061 If you are still having problems, please send an email to support@datadoghq.com
#10 3.061 with the contents of ddagent-install.log and any information you think would be
#10 3.061 useful and we will do our very best to help you solve your problem.
------
executor failed running [/bin/sh -c DD_AGENT_MAJOR_VERSION=7 DD_SITE="datadoghq.com" DD_API_KEY=${DD_API_KEY} DD_INSTALL_ONLY=true bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"]: exit code: 1

Testing the curl command within container running the base image:

root@de777adae5e7:~/site/wwwroot# curl -v https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public
*   Trying ::ffff:18.161.6.47:443...
* Connected to keys.datadoghq.com (::ffff:18.161.6.47) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: self signed certificate in certificate chain
* Closing connection 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Curl -V:

root@50e302ad6d0f:~/site/wwwroot# curl -V
curl 7.74.0 (x86_64-pc-linux-gnu) libcurl/7.74.0 OpenSSL/1.1.1n zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0 librtmp/2.3
Release-Date: 2020-12-09
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

Describe what you expected: That the install_script_agent7.sh runs successfully in the base image.

Steps to reproduce the issue: Example Dockerfile:

FROM mcr.microsoft.com/azure-functions/python:4-python3.9-slim
COPY . /home/site/wwwroot
ENV host:logger:consoleLoggingMode=always
ENV AzureFunctionsJobHost__Logging__Console__IsEnabled=true
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /home/site/wwwroot
ARG TRUSTED_PYPI_HOSTS="--trusted-host pypi.org --trusted-host pypi.python.org --trusted-host=files.pythonhosted.org"

RUN curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
    exit && \
    apt-get update --allow-releaseinfo-change update && \
    ACCEPT_EULA=Y apt-get install -y msodbcsql17 && \
    # optional: for bcp and sqlcmd && \
    ACCEPT_EULA=Y apt-get install -y mssql-tools && \
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc && \
    source ~/.bashrc

RUN apt-get update --allow-releaseinfo-change && \
    apt-get install -y --no-install-recommends \
    gcc \
    unixodbc-dev \
    unixodbc \
    ca-certificates \
    libpq-dev && \
    apt-get install --reinstall build-essential -y && \
    apt-get clean

ARG DD_API_KEY
ENV DD_API_KEY=$DD_API_KEY

RUN DD_AGENT_MAJOR_VERSION=7 DD_SITE="datadoghq.com" DD_API_KEY=${DD_API_KEY} DD_INSTALL_ONLY=true bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)" 

docker build --build-arg DD_API_KEY=YOUR_API_KEY

Additional environment details (Operating System, Cloud provider, etc): Windows 11 building the docker image.

Some OS like Debian and Windows may not have the appropriate certificates authorities added to their certificate store.  Commands such as:

curl -v https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public and  apt-get install datadog-agent

Will fail since since the TLS handshake can't be made due to broken certificate chain: curl: (60) ssl certificate problem: self signed certificate in certificate chain

The same failure will occur for apt-get with the https://apt.datadoghq.com package repository.  The workaround would be to include the --insecure flag for curl and -o "Acquire::https::Verify-Peer=false" for apt-get, but this is problematic for the telemetry since it could send the DD_API_KEY and open for any man-in-the-middle interception due.

The safer option is to have the install_script_agent7.sh include population of the expected certificates from https://*.datadoghq.com:

 apt-get install-y openssl

# Download cert from datadoghq
RUN openssl s_client -showcerts -servername datadoghq.com  -connect datadoghq.com:443 > datadoghq.pem

# Populate with X509 details.
openssl x509 -inform PEM -in datadoghq.pem -text -out certdata-datadoghq.com.txt

# Move the file to certificate store directory.
mv datadoghq.pem /usr/local/share/ca-certificates/cacert-datadoghq.com.crt

# Updates /etc/ssl/certs
/usr/sbin/update-ca-certificates
ian28223 commented 1 year ago

Does updating the local cert store prior to installation help? e.g. run below then attempt to run the install script again.

sudo apt-get install -y ca-certificates

Additional comments:

Wind010 commented 1 year ago

That was one of the steps I tried. I had pulled down install_script_agent7.sh locally and updated it to install ca-certificates:

    if [ -z "$sudo_cmd" ]; then
        # if $sudo_cmd is empty, doing `$sudo_cmd X=Y command` fails with
        # `X=Y: command not found`; therefore we don't prefix the command with
        # $sudo_cmd at all in this case
        DEBIAN_FRONTEND=noninteractive apt-get install -y apt-transport-https curl gnupg ca-certificates
    else
        $sudo_cmd DEBIAN_FRONTEND=noninteractive apt-get install -y apt-transport-https curl gnupg ca-certificates
    fi

    ...
   printf "\033[34m\n* UPDATE STORE BEGIN\n\033[0m\n"
    update-ca-certificates
   printf "\033[34m\n* UPDATE STORE END\n\033[0m\n"

No errors on the update-ca-certificates however, same SSL error.

I agree with point 1, however, it's better than having the people workaround for with -k in general. The certificate install, however, was validated in Chrome visually before installation. An additional check would be good for the certificate install outlined above.

I'll double check the API key, but from what I remember, it was just an MD5 hash for my testing.

ian28223 commented 1 year ago

I tried using your Dockerfile but mine built successfully; I ran it around Fri May 12 02:30:00 UTC 2023. Logs here for your reference: https://gist.github.com/ian28223/b8f432325909c704d56a00427db45c3e

I also tried to build it within an Ubuntu VM as well as a Centos VM

Wind010 commented 1 year ago

Very odd since using the same Docker file, I get the logs outlined above. I just tried again and it reproduces locally, but on a newly setup Azure DevOps Build pipeline, the image build succeeds. I would assume that image building would be consistent and the differences is network related...

I am using Docker Desktop version 4.17.0 (99724).

ian28223 commented 1 year ago

Below is what I had in mine installed via curl -sSL get.docker.com | sh

Client: Docker Engine - Community
 Version:           23.0.6
 API version:       1.42
 Go version:        go1.19.9
 Git commit:        ef23cbc
 Built:             Fri May  5 21:18:28 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.6
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.9
  Git commit:       9dbdbd4
  Built:            Fri May  5 21:18:28 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

OSes tried:

There might have been a recent change in the base image (https://github.com/Azure/azure-functions-docker/pull/895/files#diff-fbc1520a90e119f1edbd6abd0554505c982a79092f5aa2923cffe6c60bdf80b1)

Can you try docker builder prune and/or docker image prune -a locally and rebuild?

Wind010 commented 1 year ago

I had actually run docker prune --all and double checked that none of my existing images use that base image. I didn't want to run docker image prune -a since I have other pending work. I did try on a different host and the issue still reproduces on my home and work network with Docker Desktop version 4.17.0 (99724). I'll try updating Docker Desktop.

I was also able to build the Dockerfile successfully through an M1 mac running Rancher version 1.8.1.

cheilamanJHA commented 5 months ago

@Wind010 I know this is an old issue, but did you find a resolution? I'm currently struggling with something similar.

Wind010 commented 5 months ago

@cheilamanJHA I suspect this was an issue with corporate firewall and client proxy on the host, since it will build through Azure DevOps and on my personal machine.