Closed: @resoglas closed this issue 2 years ago.
Thanks for the issue @resoglas - we've seen that error a few times as well. In your case, is it just noise in the logs, or is it causing the connection from FusionAuth to Elasticsearch to fail altogether?
It seems the connection fails altogether. The Users list gives an error:
Thanks @resoglas .
When testing with Docker locally and the new "sniffer" config, I had to tell Elasticsearch how to publish its host and port so it didn't use the internal Docker IP and port. Otherwise, the client would connect over my specified connection, and then the node would tell the client about its internal IP address, which was not visible to FusionAuth.
For example, the old start command:

docker run -p 9021:9200 -e 'discovery.type=single-node' docker.elastic.co/elasticsearch/elasticsearch:7.6.1

New command, adding -e 'http.publish_host=localhost' and -e 'http.publish_port=9021':

docker run -p 9021:9200 -e 'discovery.type=single-node' -e 'http.publish_host=localhost' -e 'http.publish_port=9021' docker.elastic.co/elasticsearch/elasticsearch:7.6.1
When running in Docker Compose, I didn't seem to need this when using the bridge network, which makes sense I suppose. I don't know for sure how this translates to K8s.
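For reference, the same single-node setup as a minimal Docker Compose sketch (the service name and the commented-out publish settings are illustrative, not taken from this thread); on the default bridge network the publish overrides were not needed:

```yaml
# Hypothetical docker-compose.yml sketch for a single-node ES 7.6.1.
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.1
    environment:
      - discovery.type=single-node
      # Uncomment only if a client outside this compose network sniffs the node:
      # - http.publish_host=localhost
      # - http.publish_port=9021
    ports:
      - "9021:9200"
```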
Thanks @robotdan. I've dug a bit deeper and found the following message repeating over and over in the ES logs: received plaintext http traffic on an https channel. Those "plaintext" messages are coming from the FusionAuth pod, although as you can see in my configuration I use search.servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200. Could it be that the "sniffer" is ignoring https://?
Also getting this after upgrading from 1.17.3 to 1.19.3 #857
I think this is due to the publish addresses of Elasticsearch. Here is a good article on the issue: https://www.elastic.co/blog/elasticsearch-sniffing-best-practices-what-when-why-how
See the "But we can fix that" section.
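In short, the fix described there is to have each node advertise an address the client can actually reach, via the node's HTTP publish settings in elasticsearch.yml (the hostname below is a placeholder):

```yaml
# Advertise a client-reachable name instead of the internal/container IP.
http.publish_host: es-node-1.example.internal
http.publish_port: 9200
```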
So if that fixes the issue, we can document this much better, or perhaps look into making this new sniffer configuration optional, as it doesn't play nicely with Docker.
So I have changed http.publish_host to ${POD_NAME}.elasticsearch-es-default.elasticsearch.svc.cluster.local and tried connecting to the ES cluster from another pod using the following code:
'use strict'

const { Client } = require('@elastic/elasticsearch')
const { URL } = require('url')
const fs = require('fs')

const client = new Client({
  node: {
    url: new URL('https://user:password@elasticsearch-es-http.elasticsearch.svc.cluster.local:9200'),
  },
  ssl: {
    ca: fs.readFileSync('../app/ca.crt')
  },
  sniffOnStart: true,
  sniffInterval: 1000,
})

client.on('sniff', (err, result) => {
  console.log(result.body.nodes)
})
I have got a successful sniff response containing 3 nodes:
{
  LrIsZybiRQuDi4ab_a0QeQ: {
    name: 'elasticsearch-es-default-1',
    transport_address: '10.2.2.34:9300',
    host: '10.2.2.34',
    ip: '10.2.2.34',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'ml.max_open_jobs': '20',
      'xpack.installed': 'true',
      'transform.node': 'true'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.2.34:9200',
      max_content_length_in_bytes: ...
    }
  },
  rXYyTgCJSQmlHwbZ32257A: {
    name: 'elasticsearch-es-default-0',
    transport_address: '10.2.0.213:9300',
    host: '10.2.0.213',
    ip: '10.2.0.213',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'xpack.installed': 'true',
      'transform.node': 'true',
      'ml.max_open_jobs': '20'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-0.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.0.213:9200',
      max_content_length_in_bytes: ...
    }
  },
  zVUnh9VuRYy4mwb8RbDzDQ: {
    name: 'elasticsearch-es-default-2',
    transport_address: '10.2.0.212:9300',
    host: '10.2.0.212',
    ip: '10.2.0.212',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'ml.max_open_jobs': '20',
      'xpack.installed': 'true',
      'transform.node': 'true'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-2.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.0.212:9200',
      max_content_length_in_bytes: ...
    }
  }
}
And I am able to curl --insecure https://username:password@elasticsearch-es-default-2.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200 successfully from within the same pod.
FusionAuth still seems to fail with the same error though... Is there something else I am missing, or are there perhaps more detailed logs I could look at? Thanks!
P. S. Maybe this https://github.com/elastic/cloud-on-k8s/issues/3182 is somewhat related
Yes, thanks for the link - that looks to be the same issue for sure.
FusionAuth v1.18.8 seems to have no problem at all sniffing the ES cluster using the following config (which is basically the same as with v1.19+):
database.url=jdbc:postgresql://{{postgresql_host}}:{{postgresql_port}}/{{postgresql_database}}
database.username={{postgresql_user}}
database.password={{postgresql_password}}
fusionauth-app.search-engine-type=elasticsearch
fusionauth-app.search-servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200
fusionauth-app.management-port=9010
fusionauth-app.http-port=9011
fusionauth-app.https-port=9013
fusionauth-app.ajp-port=9019
fusionauth-app.memory=512M
fusionauth-app.additional-java-args=
fusionauth-app.cookie-same-site-policy=Lax
fusionauth.runtime-mode=production
Elasticsearch v7.8.1 nodes immediately respond with a successful index creation/update message for fusionauth_user.
The question being - were FusionAuth versions prior to 1.19 not sniffing for ES cluster nodes?
I have disabled the TLS configuration in the ES cluster just to verify that this is not a network error, and now I am getting:
Sep 18, 2020 1:43:21 PM org.elasticsearch.client.sniff.Sniffer run
SEVERE: error while sniffing nodes
org.elasticsearch.client.ResponseException: method [GET], host [http://elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200], URI [/_nodes/http?timeout=1000ms], status line [HTTP/1.1 401 Unauthorized]
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/_nodes/http?timeout=1000ms]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/_nodes/http?timeout=1000ms]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.sniff.ElasticsearchNodesSniffer.sniff(ElasticsearchNodesSniffer.java:105)
at org.elasticsearch.client.sniff.Sniffer.sniff(Sniffer.java:209)
at org.elasticsearch.client.sniff.Sniffer$Task.run(Sniffer.java:139)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
I now strongly believe that the sniffer is missing the appropriate configuration for the scheme (HTTP or HTTPS) and for username/password authentication.
The question being - were FusionAuth versions prior to 1.19 not sniffing for ES cluster nodes?
This is new in version 1.19.x.
I now strongly believe that the sniffer is missing the appropriate configuration for the scheme (HTTP or HTTPS) and for username/password authentication.
Interesting, we can take a look at this.
The sniffer config takes the rest client which we have already configured with credentials, so it seems this should be ok. We'll have to try to recreate.
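One thing worth checking: in the Elasticsearch low-level REST client, ElasticsearchNodesSniffer defaults to Scheme.HTTP unless a scheme is passed explicitly, and sniff requests authenticate with whatever is configured on the RestClient itself. That default would explain both the "received plaintext http traffic on an https channel" messages and the 401 above. A rough sketch of an HTTPS-aware setup (host, credentials, and class names here assume the elasticsearch-rest-client and elasticsearch-rest-client-sniffer artifacts are on the classpath; values are placeholders, not FusionAuth's actual code):

```java
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.ElasticsearchNodesSniffer;
import org.elasticsearch.client.sniff.Sniffer;

public class SnifferConfigSketch {
  public static Sniffer build() {
    // Credentials set on the RestClient also apply to sniff requests,
    // since the sniffer reuses this client for /_nodes/http calls.
    BasicCredentialsProvider credentials = new BasicCredentialsProvider();
    credentials.setCredentials(AuthScope.ANY,
        new UsernamePasswordCredentials("user", "password"));

    RestClient restClient = RestClient.builder(
            new HttpHost("elasticsearch-es-http.elasticsearch", 9200, "https"))
        .setHttpClientConfigCallback(b -> b.setDefaultCredentialsProvider(credentials))
        .build();

    // Without an explicit scheme, the nodes sniffer assumes HTTP and will
    // speak plaintext to the sniffed publish addresses of a TLS cluster.
    ElasticsearchNodesSniffer nodesSniffer = new ElasticsearchNodesSniffer(
        restClient,
        ElasticsearchNodesSniffer.DEFAULT_SNIFF_REQUEST_TIMEOUT,
        ElasticsearchNodesSniffer.Scheme.HTTPS);

    return Sniffer.builder(restClient)
        .setNodesSniffer(nodesSniffer)
        .build();
  }
}
```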
Maybe related: https://github.com/elastic/kibana/issues/42224
In 1.19.8 (https://github.com/FusionAuth/fusionauth-issues/issues/893) the sniffer is off by default. This should resolve the issue for you.
Please re-open if you encounter an error with the sniffer disabled.
Thank you @robotdan :)
Closing, please re-open if this is still an issue.
error while sniffing nodes
Description
A fresh installation of fusionauth-app:1.19.2 with a fresh PostgreSQL 11 and a fresh ES 7.8.1 cluster (7.6.2 was also tried) fails.
Steps to reproduce
kubectl apply -f https://download.elastic.co/downloads/eck/1.2.1/all-in-one.yaml