deviantony / docker-elk

The Elastic stack (ELK) powered by Docker and Compose.
MIT License

Linking the tls to wildcard certificate #846

Closed crimson-med closed 1 year ago

crimson-med commented 1 year ago

Problem description

I'm trying to deploy the stack on a domain: watch.test.com. We already have a wildcard certificate for that domain.

I have changed the Kibana configuration to reflect the following:

server.publicBaseUrl: https://watch.test.com:5601
server.ssl.enabled: true
# The following cert is the wildcard and its key
server.ssl.certificate: config/kibana.crt 
server.ssl.key: config/kibana.key
xpack.fleet.agents.fleet_server.hosts: [ https://watch.test.com:8220 ]

xpack.fleet.outputs:
  - id: fleet-default-output
    name: default
    type: elasticsearch
    hosts: [ https://watch.test.com:9200 ]
    # Set to output of 'docker-compose up tls'. Example:
    #ca_trusted_fingerprint: bd66954aefe4f89c7f9eaae2222aaa54cde39bd32bb6445d73d397a12119dea8
    is_default: true
    is_default_monitoring: true

I'm not sure here whether the ca_trusted_fingerprint should still be the one from the tls command, or whether I should leave it commented out.

When I try to connect the Fleet Server, I get the following:

Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:Y
{"log.level":"info","@timestamp":"2023-03-29T10:58:46.026Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":407},"message":"Generating self-signed certificate for Fleet Server","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-03-29T10:58:48.663Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":803},"message":"Fleet Server - Error - x509: certificate is valid for localhost, elasticsearch, not watch.test.com","ecs.version":"1.6.0"}
Error: fleet-server failed: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html
Error: enroll command failed with exit code: 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html

Do I simply need to replace tls/ca/ca.crt, tls/ca/ca.key, tls/fleet-server/fleet-server.crt, and tls/fleet-server/fleet-server.key with the wildcard certificate?

Agents will need to reach watch.test.com, as they will be spread across various networks.

antoineco commented 1 year ago

If you have a wildcard certificate for the domain, you need to skip the generation of self-signed certificates entirely (i.e. do not run docker-compose up tls) and, for example, create the following file tree:

tls
├── ca
│   └── my-ca.crt
└── certs
    ├── my-wildcard.crt
    └── my-wildcard.key
  1. Whether you obtained the certificate from a public authority such as Let's Encrypt, or from a private authority such as a corporate PKI, my-ca.crt is the CA certificate of that authority.

    The files under certs/ are the certificate and its private key, respectively.

    You do NOT need to keep the CA key in the repository. This key is only needed to sign new certificates, which is irrelevant here since certificates aren't signed by docker-elk.

  2. You then need to update the paths to those certificates and key in all your Compose files, e.g. - ./tls/ca/my-ca.crt:/usr/share/elastic-agent/ca.crt:ro (see the sketch after this list).

  3. For Elastic Agent, please be aware that you need to provide the fingerprint (via Kibana) of the CA certificate used by Elasticsearch, not the CA certificate itself, unlike with other components. It can be obtained with the command openssl x509 -fingerprint -sha256 -noout -in tls/ca/my-ca.crt. You need to remove all colon characters (:) from the output, as in the one-liner below.
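For illustration, here is a minimal sketch of points 2 and 3, assuming the file names from the tree above (the fleet-server service name is just an example and may differ in your setup; the container path is the one from point 2):

# docker-compose.yml (excerpt): mount the custom CA into the Elastic Agent container
fleet-server:
  volumes:
    - ./tls/ca/my-ca.crt:/usr/share/elastic-agent/ca.crt:ro

# Print the SHA-256 fingerprint of the CA certificate with the colons stripped,
# ready to paste into ca_trusted_fingerprint in kibana.yml
openssl x509 -fingerprint -sha256 -noout -in tls/ca/my-ca.crt | cut -d= -f2 | tr -d :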

crimson-med commented 1 year ago

Hey Antoine, thanks for clearing everything up!

I have tried what you mentioned and updated the URLs:

logstash/config/logstash.yml

monitoring.elasticsearch.hosts: https://watch.test.com

logstash/pipeline/logstash.conf

output {
    elasticsearch {
        hosts => "watch.test.com:9200"
        user => "logstash_internal"
        password => "${LOGSTASH_INTERNAL_PASSWORD}"
    }
}

kibana/config/kibana.yml

server.publicBaseUrl: https://watch.test.com:5601
elasticsearch.hosts: [https://watch.test.com:9200]
xpack.fleet.agents.fleet_server.hosts: [https://watch.test.com:8220]

However, Kibana gets stuck while getting ready.

Kibana container log:

[2023-03-30T03:18:26.716+00:00][WARN ][plugins.reporting.config] Found 'server.host: "0.0.0.0"' in Kibana configuration. Reporting is not able to use this as the Kibana server hostname. To enable PNG/PDF Reporting to work, 'xpack.reporting.kibanaServer.hostname: localhost' is automatically set in the configuration. You can prevent this message by adding 'xpack.reporting.kibanaServer.hostname: localhost' in kibana.yml.
[2023-03-30T03:18:26.728+00:00][WARN ][plugins.alerting] APIs are disabled because the Encrypted Saved Objects plugin is missing encryption key. Please set xpack.encryptedSavedObjects.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command.
[2023-03-30T03:18:26.962+00:00][INFO ][plugins.ruleRegistry] Installing common resources shared between all indices
[2023-03-30T03:18:27.125+00:00][INFO ][plugins.cloudSecurityPosture] Registered task successfully [Task: cloud_security_posture-stats_task]
[2023-03-30T03:18:30.146+00:00][INFO ][plugins.screenshotting.config] Chromium sandbox provides an additional layer of protection, and is supported for Linux Ubuntu 20.04 OS. Automatically enabling Chromium sandbox.
[2023-03-30T03:18:30.324+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. connect ECONNREFUSED 54.254.XXX.XXX:9200
[2023-03-30T03:18:34.284+00:00][INFO ][plugins.screenshotting.chromium] Browser executable: /usr/share/kibana/x-pack/plugins/screenshotting/chromium/headless_shell-linux_x64/headless_shell
[2023-03-30T03:18:53.570+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. unable to get issuer certificate

For some reason it tries to reach the IP instead of the watch.test.com domain.

However, if I check the certificate with curl, nothing seems wrong:

curl -v -u elastic https://watch.test.com:9200
* Server certificate:
*  subject: CN=*.test.com
*  start date: May XX 00:00:00 2022 GMT
*  expire date: Jun XX 23:59:59 2023 GMT
*  subjectAltName: host "watch.test.com" matched cert's "*.test.com"
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.

Just in case the logs from the other containers:

logstash

[2023-03-30T03:18:46,866][INFO ][logstash.outputs.elasticsearch][main] Failed to perform request {:message=>"Connect to watch.test.com:9200 [watch.test.com/54.254.XXX.XXX] failed: Connection refused", :exception=>Manticore::SocketException, :cause=>#<Java::OrgApacheHttpConn::HttpHostConnectException: Connect to watch.test.com:9200 [watch.test.com/54.254.XXX.XXX] failed: Connection refused>}
[2023-03-30T03:18:46,877][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://logstash_internal:xxxxxx@watch.test.com:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [https://watch.test.com:9200/][Manticore::SocketException] Connect to watch.test.com:9200 [watch.test.com/54.254.XXX.XXX] failed: Connection refused"}
[2023-03-30T03:18:52,494][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://logstash_internal:xxxxxx@watch.test.com:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :message=>"Got response code '401' contacting Elasticsearch at URL 'https://watch.test.com:9200/'"}
[2023-03-30T03:18:58,659][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://logstash_internal:xxxxxx@watch.test.com:9200/"}
[2023-03-30T03:18:58,710][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch version determined (8.6.2) {:es_version=>8}
[2023-03-30T03:18:58,711][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>8}

elasticsearch container

{"@timestamp":"2023-03-30T03:18:58.014Z", "log.level": "INFO",  "current.health":"GREEN","message":"Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.kibana-event-log-8.6.2-000001][0]]]).","previous.health":"RED","reason":"shards started [[.kibana-event-log-8.6.2-000001][0]]" , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.routing.allocation.AllocationService","elasticsearch.cluster.uuid":"XXXX","elasticsearch.node.id":"XXXX","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-03-30T03:18:58.689Z", "log.level": "INFO", "message":"successfully loaded geoip database file [GeoLite2-ASN.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][generic][T#4]","log.logger":"org.elasticsearch.ingest.geoip.DatabaseNodeService","elasticsearch.cluster.uuid":"XXXX","elasticsearch.node.id":"XXXX","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-03-30T03:18:59.927Z", "log.level": "INFO", "message":"successfully loaded geoip database file [GeoLite2-City.mmdb]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][generic][T#3]","log.logger":"org.elasticsearch.ingest.geoip.DatabaseNodeService","elasticsearch.cluster.uuid":"XXXX","elasticsearch.node.id":"XXXX","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
antoineco commented 1 year ago

That's surprising indeed. I see that the error is ECONNREFUSED, with the message unable to get issuer certificate two lines below. Could it be a firewall issue?

The fact that Kibana prints the IP and not the hostname in its logs is not an indicator of a problem in my experience. It does correctly check that the hostname presented by the certificate matches. In this case, the issue is probably elsewhere.

crimson-med commented 1 year ago

I have all the proper ports opened on the firewall:

Screenshot 2023-03-30 at 6 02 35 PM

If I manually go to https://watch.test.com:9200 in my browser, I'm prompted for the username and password, and if they are correct I do reach the landing endpoint:

Screenshot 2023-03-30 at 6 05 05 PM

antoineco commented 1 year ago

Possibly a misconfiguration of the CA certificate then?

https://github.com/deviantony/docker-elk/blob/ff28dec137f889f7159762d6d005a389cc44cdc8/kibana/config/kibana.yml#L22-L25

I can't think of anything else right now.

crimson-med commented 1 year ago

For the CA, just to confirm:

We have our certificate, then the Sectigo entity in the UK, then the final one in the US. Our chain certificate basically has three in one. Does the CA file need both certificates above ours, or just the very root one?

antoineco commented 1 year ago

The CA certificate file should contain the entire chain, unless some of those certificates are already part of the Kibana container image.

If all CAs in the chain are well known public CAs, you don't even need to provide a CA certificate for Kibana to trust Elasticsearch.
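If in doubt, one way to build such a chain file is simply to concatenate the certificates; a sketch with hypothetical file names for the intermediate and root certificates:

cat sectigo-intermediate.crt sectigo-root.crt > tls/ca/my-ca.crt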

crimson-med commented 1 year ago

I've tried the CA file with all the certs from the chain, with just the root, and with just the intermediate; it makes no difference.

The weird thing is that I can curl https://watch.test.com:9200 and access it through the browser.

I don't understand why Logstash isn't able to. I'm even able to curl from within the Docker container.

I have posted on the Elastic forum in case someone comes across the same issue:

https://discuss.elastic.co/t/logstash-cant-talk-to-elasticsearch-but-i-can-access-from-the-browser/328970

antoineco commented 1 year ago

Logstash is getting a permission error, unrelated to TLS.

Got response code '401' contacting Elasticsearch

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401

antoineco commented 1 year ago

Closing because a working solution to the original problem was provided (how to configure docker-elk with custom TLS certificates).

For the resolution of authorization issues, I recommend double-checking the effective permissions of the affected users. For example, ensure that the user configured in the Logstash pipeline has the required privileges on custom Elasticsearch indices, etc.
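As a quick sketch of how to inspect this (assuming the built-in elastic superuser), the Security API can list the roles assigned to a user:

curl -u elastic https://watch.test.com:9200/_security/user/logstash_internal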

The resolution of network issues is off-topic because docker-elk is a Compose configuration, which is single-host by nature, whereas the original issue involves components interconnected over a public network. Most likely the instance is trying to address itself over 54.254.XXX.XXX (hairpin NAT), and this is improperly configured at the cloud provider. As a result, connections from "outside" work (e.g. from your workstation/browser), but hairpinned connections don't (instance to itself over a gateway).
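One way to test the hairpin hypothesis from the instance itself, assuming Elasticsearch is listening locally on that instance, is to force curl to connect over loopback while keeping the TLS hostname check intact:

curl -v -u elastic --resolve watch.test.com:9200:127.0.0.1 https://watch.test.com:9200

If this succeeds while a plain curl of the same URL fails from the instance, the problem is the hairpinned network path rather than TLS.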