fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Bug? http: TLS handshake error from ... local error: tls: bad record MAC #6085

Closed benatsb closed 4 months ago

benatsb commented 2 years ago

Fleet version: 4.15.0

Operating system: Windows 11

Web browser: Edge and Chrome (latest)


🧑‍💻  Expected behavior

Build a Windows agent, deploy it, and connect to the Fleet server with no errors.

💥  Actual behavior

Windows 11 device will connect to the Fleet server, but only after I build the agent using the "insecure" flag. The server logs show the following:

level=info ts=2022-06-03T18:56:16.880740187Z component=http path=/api/latest/fleet/device/e7c7dac7-df3e-41dc-91ff-83d6317d2b40 internal="authentication error: invalid device authentication token" err=": Authentication required"
2022/06/03 18:57:11 http: TLS handshake error from local_ip:46872: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46878: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46882: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46884: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46886: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46890: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46892: local error: tls: bad record MAC
2022/06/03 18:59:21 http: TLS handshake error from local_ip:46894: local error: tls: bad record MAC
2022/06/03 18:59:21 http: TLS handshake error from local_ip:46896: local error: tls: bad record MAC
2022/06/03 18:59:22 http: TLS handshake error from local_ip:46898: local error: tls: bad record MAC
2022/06/03 18:59:22 http: TLS handshake error from local_ip:46900: local error: tls: bad record MAC
2022/06/03 18:59:26 http: TLS handshake error from local_ip:46902: local error: tls: bad record MAC
2022/06/03 18:59:26 http: TLS handshake error from local_ip:46904: local error: tls: bad record MAC
2022/06/03 18:59:36 http: TLS handshake error from local_ip:46906: local error: tls: bad record MAC
2022/06/03 18:59:36 http: TLS handshake error from local_ip:46910: local error: tls: bad record MAC
2022/06/03 19:01:50 http: TLS handshake error from local_ip:46930: remote error: tls: bad certificate
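When triaging logs like these, it can help to tally handshake failures by side and reason. In Go's TLS error text, "local error" means the server itself raised the alert, while "remote error" means the client sent it. Below is a minimal sketch; the function name `tally_tls_errors` and the regular expression are illustrative assumptions, not part of Fleet:

```python
import re
from collections import Counter

# Matches Go net/http log lines such as:
#   http: TLS handshake error from 10.0.0.5:46872: local error: tls: bad record MAC
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from (?P<peer>\S+): "
    r"(?P<side>local|remote) error: tls: (?P<reason>.+)$"
)


def tally_tls_errors(lines):
    """Count handshake errors by (side, reason); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line.strip())
        if match:
            counts[(match["side"], match["reason"])] += 1
    return counts
```

Feeding it the server log (for example `tally_tls_errors(open(path))`, where `path` is wherever the output was saved) separates the repeated "bad record MAC" entries from the single "bad certificate" entry.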

More info

The Fleet server is on a fresh Ubuntu Server 22.04 machine. I used certbot with the "certonly" subcommand to generate a Let's Encrypt certificate for the server, then copied the certificates to the Fleet installation directory at /etc/fleetdm/

Set permissions for the .key to 600.

Running server for testing with /etc/fleetdm/fleet serve --config /etc/fleetdm/fleet.yml

fleet.yml

mysql:
  address: 127.0.0.1:3306
  database: fleet
  username: fleetadmin
  password: 'password'
redis:
  address: 127.0.0.1:6379
server:
  address: 0.0.0.0:443
  tls_compatibility: modern
  cert: /etc/fleetdm/server.cert
  key: /etc/fleetdm/server.key
  keepalive: true
logging:
  json: true
vulnerabilities:
  current_instance_checks: yes
  databases_path: /etc/fleetdm/vulns
  periodicity: 1h
  #https://nvd.nist.gov/vuln/data-feeds
  #cve_database_url:
logging:
    error_retention_period: 168h
osquery:
    detail_update_interval: 30m
    status_log_plugin: filesystem
#filesystem:
#    status_log_file: /var/log/osquery/status.log
#    result_log_file: /var/log/osquery/result.log
#    enable_log_rotation: true

Built the installer for Windows using the 4.15.0 fleetctl on the same Windows Machine with no osquery or orbit installed. Docker is installed though.

.\fleetctl.exe package --type=msi --fleet-desktop --fleet-url=https://fleettest --enroll-secret=SECRET --insecure

I tried without the "--insecure" flag but that never connected. After a reboot and installing the package built with the flag it connects, but the TLS error still occurs server-side.


QA notes

To QA this you will need the certificates being added here: https://github.com/fleetdm/fleet/pull/20390

Using fullchain in Fleet server and root CA only client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/fullchain.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem https://localhost:8080

Using fullchain in Fleet server and root+intermediate bundle client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/fullchain.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem https://localhost:8080

Using leaf cert in Fleet server and root+intermediate bundle client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/leaf.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/intermediate-ca/intermediate-and-root.cert.pem https://localhost:8080

Using leaf cert + intermediate bundle in Fleet server and root CA only client side (should succeed)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/leaf-and-intermediate.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem https://localhost:8080

Using leaf cert in Fleet server and root CA only client side (should fail)

  1. Run fleet serve with --server_cert ./tools/test-certs/server/leaf.cert.pem --server_key ./tools/test-certs/server/server.key.pem
  2. Generate fleetctl package with --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem and install fleetd.
  3. Test fleetctl debug connection --fleet-certificate ./tools/test-certs/root-ca/root-ca.cert.pem https://localhost:8080
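
The five scenarios above follow one rule: verification succeeds only if every link from the leaf to a certificate the client trusts is available, either sent by the server or already in the client bundle. Here is a toy model of that rule, assuming a three-level chain with hypothetical names and treating every certificate in the client bundle as a trust anchor (real verifiers also check signatures, validity dates, and hostnames):

```python
# Toy model of the QA matrix: each certificate is just a name, and ISSUER
# maps a certificate to the one that signed it (hypothetical names).
ISSUER = {"leaf": "intermediate", "intermediate": "root", "root": "root"}


def chain_verifies(presented, trusted, leaf="leaf"):
    """Return True if a path from the leaf reaches a client-trusted certificate.

    presented: certificates the server sends (the --server_cert file).
    trusted:   certificates in the client bundle (the --fleet-certificate file).
    """
    current = leaf
    while True:
        issuer = ISSUER[current]
        if issuer in trusted:
            return True  # reached a trust anchor
        if issuer == current or issuer not in presented:
            return False  # self-signed dead end, or a missing link
        current = issuer


# The five scenarios, in the order listed above.
SCENARIOS = [
    ({"leaf", "intermediate", "root"}, {"root"}),
    ({"leaf", "intermediate", "root"}, {"intermediate", "root"}),
    ({"leaf"}, {"intermediate", "root"}),
    ({"leaf", "intermediate"}, {"root"}),
    ({"leaf"}, {"root"}),
]
RESULTS = [chain_verifies(presented, trusted) for presented, trusted in SCENARIOS]
```

In this model only the last scenario fails, because the intermediate is neither sent by the server nor present in the client bundle.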

noahtalerman commented 2 years ago

Hey @benatsb sorry you're experiencing this issue.

I'm bringing this issue to the Fleet team. This way, the team can provide follow-up questions and potential next steps to resolve the issue.

noahtalerman commented 2 years ago

Hey @benatsb the following "Why aren't my osquery agents connecting to Fleet?" section of the docs includes a "Common problems" section: https://fleetdm.com/docs/deploying/faq#why-arent-my-osquery-agents-connecting-to-fleet

bad record MAC: When generating your certificate for your Fleet server, ensure you set the hostname to the FQDN or the IP of the server. This error is common when setting up Fleet servers and accepting defaults when generating certificates using openssl.

I pulled the above from the docs because it looks like you're seeing bad record MAC entries in your logs.

Please let me know if these instructions don't help in successfully resolving your issue.

xpkoala commented 2 years ago

@benatsb I'm going to close this issue for now. If you are still encountering issues please feel free to re-open this ticket with any new information about the problem. Thank you!

xastherion commented 1 year ago

Hi, I'm running into the same problem described in this thread.

Server: CentOS Stream 9, Fleet version 4.38.1

Clients: macOS 13 Ventura and macOS 12 Monterey

Certificate: Let's Encrypt, renewed with Dehydrated

Browsers: Firefox 115 ESR and Chrome 117

My client repeatedly logs this:

W1025 15:16:03.459451 1334582912 tls_enroll.cpp:101] Failed enrollment request to https://my-fleet-server.com:8080/api/v1/osquery/enroll (Request error: certificate verify failed) retrying...

and my server logs this:

Oct 25 15:18:38 my-fleet-server fleet[1062]: 2023/10/25 15:18:38 http: TLS handshake error from 129.13.171.194:50805: local error: tls: bad record MAC

Despite these logs, my Fleet client runs and appears in the Fleet server UI, but only with its hostname and serial number, nothing more. The client shows as online for a short time, then goes offline and never succeeds again.

Last fetched almost 54 years ago (that is a lot of time!)

If I run the client "add host" command with --insecure, everything works. But the errors still appear in the server logs.

N0rthg4t3 commented 7 months ago

I encountered this issue with Windows clients while setting up a test environment on Ubuntu 22.04 LTS with Fleet version 4.49.2, following (or rather adapting) the installation guide for CentOS. What made my deployment unusual is that I used a TLS certificate issued by an internal certification authority that is part of a PKI dedicated to testing. Although I maintained proper full-chain certificates and keys on the server side, the server log showed these errors, which point to client-side TLS validation failures, starting right after client installation and continuing indefinitely. Meanwhile, the clients were registered but displayed as "offline". I took a closer look at the installed Orbit client and found that its root directory contains a collection of Base64-encoded root CA certificates in a file called "certs.pem", with the comment title "Bundle of CA Root Certificates" from Mozilla.

So I tried inserting my own CA certificate into this file and restarted the Orbit client. The error immediately disappeared from the logs and the client was displayed as "online" in the web UI. Data could be fetched, and so far I see no functional restrictions within the free version. I therefore think the Orbit client does not pick up custom CAs that are installed system-wide. So far I can only speculate about this, but on my Windows devices the CA certificate was installed in the machine-wide Windows certificate store.
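
The manual edit described above can be sketched as a small helper that appends a CA certificate to a PEM bundle only if it is not already there. This is an illustration under assumptions: the function names are hypothetical, the bundle is taken to be a plain concatenation of PEM blocks, and nothing here confirms that Orbit preserves manual changes to certs.pem across updates.

```python
import re
from pathlib import Path

PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def _normalize(pem_block):
    """Strip all whitespace so one certificate compares equal however it is wrapped."""
    return "".join(pem_block.split())


def append_ca_if_missing(bundle_path, ca_pem_text):
    """Append each certificate in ca_pem_text to the bundle unless already present.

    Returns the number of certificates appended.
    """
    bundle = Path(bundle_path)
    existing = {_normalize(b) for b in PEM_CERT_RE.findall(bundle.read_text())}
    added = 0
    with bundle.open("a") as handle:
        for block in PEM_CERT_RE.findall(ca_pem_text):
            key = _normalize(block)
            if key not in existing:
                handle.write("\n" + block + "\n")
                existing.add(key)
                added += 1
    return added
```

Running it twice with the same CA appends the certificate once and then reports zero additions, so it is safe to repeat.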

One could speculate that this might also happen when using self-signed certificates.

@noahtalerman I have some followup questions:

  1. Is this expected behaviour? Is there any workaround or fix?
  2. Is there a client-side configuration option that would be more appropriate than editing certs.pem?
  3. If not, can clients built by fleetctl be configured to automatically include custom CAs? So far I have not found a parameter for this.

N0rthg4t3 commented 7 months ago

I was able to reproduce it, this time with a TLS certificate that should be publicly trusted through verifiable intermediate CAs. However, the error persisted until the root CA and all intermediate CAs were added either to the certificate on the server (effectively full-chaining it) or to the client's certs.pem file.

noahtalerman commented 6 months ago

Thanks @N0rthg4t3!

Heads up @xpkoala, re-opening this issue now that we have a lead on repro.

xpkoala commented 6 months ago

Thanks! It's on my radar.

sharon-fdm commented 6 months ago

Estimating at 5 as it may be hard to reproduce.

DasFaultier commented 6 months ago

@N0rthg4t3 Can you give details on full-chaining the server certificate? Does order matter? I'm currently using a certificate file that contains the actual server certificate, then the intermediates below it and the root cert at the very bottom. However, I'm still seeing the errors. Do I maybe need to call fleet prepare with a specific argument in order for Fleet to accept it?

N0rthg4t3 commented 6 months ago

@DasFaultier When full-chaining certificates, order does matter, specifically where the server certificate goes relative to the intermediate and root CA certificates. Depending on the system, I have been fine with the order Server Certificate > Intermediate 1 > Intermediate 2 > [...] > Root CA certificate. At least that is how I understand RFC 5280, which details the X.509 profile, section 3.2 (https://datatracker.ietf.org/doc/html/rfc5280#section-3.2).
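
The ordering described above can be sketched as a helper that assembles a full-chain file from separate PEM inputs. The function name `build_fullchain` is hypothetical, and the sketch only fixes the order of the blocks. It does not check that each certificate actually signed the one before it.

```python
import re

PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def build_fullchain(leaf_pem, intermediate_pems, root_pem=None):
    """Concatenate PEM certificates as leaf > intermediates (signing order) > optional root."""
    ordered = [leaf_pem, *intermediate_pems]
    if root_pem is not None:
        ordered.append(root_pem)
    blocks = []
    for text in ordered:
        found = PEM_CERT_RE.findall(text)
        if len(found) != 1:
            raise ValueError("each input must contain exactly one certificate")
        blocks.append(found[0])
    return "\n".join(blocks) + "\n"
```

The caller supplies the intermediates already sorted from the one that signed the server certificate up to the one signed by the root.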

sharon-fdm commented 6 months ago

@xpkoala I unassigned you so we do not miss this when we have capacity. Still need to reproduce.

lucasmrod commented 5 months ago

Hi folks!

> I was able to reproduce it, this time with a TLS certificate that should be publicly trusted through verifiable intermediate CAs. However, the error persisted until the root CA and all intermediate CAs were added either to the certificate on the server (effectively full-chaining it) or to the client's certs.pem file.

I performed the following tests with fake certificates and can confirm the above.

Tests

Dummy test certificates:

They were generated using the following guide.

Using fullchain in Fleet server and root CA only client side

Using fullchain in Fleet server and root+intermediate bundle client side

Using leaf cert in Fleet server and root+intermediate bundle client side

Using leaf cert + intermediate bundle in Fleet server and root CA only client side

Using leaf cert in Fleet server and root CA only client side

Next steps

  1. Document that root CA + intermediates must be present in the bundled certificate in fleetd. A default bundle is embedded in fleetctl (when built) and may not contain intermediate certificates present in your server certificate.
  2. Discuss with the product team whether we can do a TLS connection check to the provided --fleet-url using the certificate (default or provided) during the fleetctl package execution. This will help everyone catch issues during package generation instead of during deployment. We have an existing command, fleetctl debug connection, to do connection checks to a Fleet URL, but users may not be aware of it (e.g. fleetctl debug connection --fleet-certificate /opt/orbit/certs.pem https://fleet.example.com). /cc @noahtalerman @rachaelshaw.

For (2) I've created https://github.com/fleetdm/fleet/issues/20142.
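
To illustrate the kind of pre-flight check proposed in (2), here is a minimal sketch of a verified TLS handshake against a Fleet URL using a given CA bundle. It is not how fleetctl implements `debug connection`; the function names `parse_target` and `check_tls` are hypothetical, and only the Python standard library is used.

```python
import socket
import ssl
from urllib.parse import urlsplit


def parse_target(fleet_url):
    """Return (host, port) for an https URL, defaulting to port 443."""
    parts = urlsplit(fleet_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https URL, got {fleet_url!r}")
    return parts.hostname, parts.port or 443


def check_tls(fleet_url, ca_bundle=None, timeout=5.0):
    """Attempt a verified TLS handshake and return (ok, detail).

    ca_bundle: path to a PEM file of trusted certificates; when None,
    the system default trust store is used instead.
    """
    host, port = parse_target(fleet_url)
    try:
        context = ssl.create_default_context(cafile=ca_bundle)
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except (ssl.SSLError, OSError) as exc:
        # Covers verification failures, refused connections, and a missing bundle file.
        return False, str(exc)
```

For example, `check_tls("https://fleet.example.com", ca_bundle="/opt/orbit/certs.pem")` would return a failure with the verification error as detail if the bundle cannot validate the server's chain.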

lucasmrod commented 5 months ago

I forgot to thank @N0rthg4t3 for your feedback here! (it helped me reproduce the issue)

lucasmrod commented 4 months ago

@xpkoala @PezHub I've added QA notes to the description.

xpkoala commented 4 months ago

The above scenarios were run with the certs provided and I received the expected success / fail states outlined in the steps.

fleet-release commented 4 months ago

In a secure cloud city, TLS handshake finds harmony, Fleet's code, more trustworthy.