1.19.2 error while sniffing nodes

resoglas commented 4 years ago

error while sniffing nodes

Description

A fresh installation of fusionauth-app:1.19.2 with a fresh PostgreSQL 11 and a fresh ES 7.8.1 cluster (7.6.2 was also tried) fails.

Affects versions

FusionAuth 1.19.0-1.19.2 with Elasticsearch 7.6.2 and 7.8.1 clusters and PostgreSQL 11

Steps to reproduce

kubectl apply -f https://download.elastic.co/downloads/eck/1.2.1/all-in-one.yaml
Deploy an example from https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-elasticsearch.html with 3 nodes
Deploy the FusionAuth app
Watch the logs of the FA app: it successfully connects to PostgreSQL and to the ES cluster, finishes kickstarting, and after a short time throws the error.

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

fusionauth-deploy.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: fusionauth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fusionauth-cert
  namespace: fusionauth
spec:
  secretName: fusionauth-self-signed-tls
  dnsNames:
    - fusionauth.fusionauth.svc.cluster.local
    - fusionauth.fusionauth
    - fusionauth
  issuerRef:
    name: ca-issuer
    kind: ClusterIssuer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fusionauth-cm
  namespace: fusionauth
data:
  fusionauth.properties: |-
    database.url=jdbc:postgresql://{{postgresql_host}}:{{postgresql_port}}/{{postgresql_database}}
    database.username={{postgresql_user}}
    database.password={{postgresql_password}}

    search.type=elasticsearch
    search.servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200

    fusionauth-app.management-port=9010
    fusionauth-app.http-port=9011
    fusionauth-app.https-port=9013
    fusionauth-app.ajp-port=9019
    fusionauth-app.memory=512M
    fusionauth-app.cookie-same-site-policy=Lax
    fusionauth-app.runtime-mode=production
---
kind: Service
apiVersion: v1
metadata:
  namespace: fusionauth
  name: fusionauth-client
  labels:
    app: fusionauth
    type: ClusterIP
spec:
  type: ClusterIP
  ports:
    - port: 443
      targetPort: 9013
      protocol: TCP
      name: https
    - port: 80
      targetPort: 9011
      protocol: TCP
      name: http
  selector:
    app: fusionauth
---
apiVersion: v1
kind: Secret
metadata:
  name: fusionauth-kickstart
  namespace: fusionauth
data:
  kickstart.init: {{kickstart_init}}
  kickstart.application: {{kickstart_application}}
  kickstart.admin: {{kickstart_admin}}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fusionauth
  namespace: fusionauth
spec:
  selector:
    matchLabels:
      app: "fusionauth"
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      name: fusionauth
      namespace: fusionauth
      labels:
        app: fusionauth
    spec:
      volumes:
        - name: ca-crt
          secret:
            secretName: fusionauth-self-signed-tls
            items:
              - key: ca.crt
                path: ca.crt
        - name: openjdk-security
          emptyDir: {}
        - name: fusionauth-config
          emptyDir: {}
        - name: config-volume
          configMap:
            name: fusionauth-cm
            optional: false
            items:
              - key: fusionauth.properties
                path: fusionauth.properties
        - name: kickstart-volume
          secret:
            secretName: fusionauth-kickstart
            items:
              - key: kickstart.init
                path: kickstart.json
              - key: kickstart.application
                path: requests/application-ihp.json
              - key: kickstart.admin
                path: requests/user-admin.json
      initContainers:
        - name: fusionauth-config
          image: {{image_fusionauth}}
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: ca-crt
              mountPath: /tmp/certs
            - name: openjdk-security
              mountPath: /tmp/security
            - name: fusionauth-config
              mountPath: /tmp/fa-config-merged
            - name: config-volume
              mountPath: /tmp/fa-config
          command:
            - sh
            - -c
            - keytool -importcert -noprompt -keystore /opt/openjdk/lib/security/cacerts -storepass changeit -file /tmp/certs/ca.crt;
              cp -R /opt/openjdk/lib/security/. /tmp/security/;
              cp -R /usr/local/fusionauth/config/. /tmp/fa-config-merged/;
              rm /tmp/fa-config-merged/fusionauth.properties;
              cp /tmp/fa-config/fusionauth.properties /tmp/fa-config-merged/fusionauth.properties
      containers:
        - name: fusionauth
          image: {{image_fusionauth}}
          volumeMounts:
            - name: openjdk-security
              mountPath: /opt/openjdk/lib/security
            - name: kickstart-volume
              mountPath: /usr/local/fusionauth/kickstart
            - name: fusionauth-config
              mountPath: /usr/local/fusionauth/config
          ports:
            - containerPort: 9011
              name: http
            - containerPort: 9013
              name: https
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: FUSIONAUTH_APP_URL
              value: http://$(POD_IP):9011
            - name: FUSIONAUTH_KICKSTART
              value: /usr/local/fusionauth/kickstart/kickstart.json
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /
              port: http
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /
              port: http
          startupProbe:
            httpGet:
              path: /
              port: http
            failureThreshold: 20
            periodSeconds: 10

Platform

Kubernetes

Additional context

org.elasticsearch.client.sniff.Sniffer run
SEVERE: error while sniffing nodes
org.apache.http.ConnectionClosedException: Connection is closed
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
    at org.elasticsearch.client.sniff.ElasticsearchNodesSniffer.sniff(ElasticsearchNodesSniffer.java:105)
    at org.elasticsearch.client.sniff.Sniffer.sniff(Sniffer.java:209)
    at org.elasticsearch.client.sniff.Sniffer$Task.run(Sniffer.java:139)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: org.apache.http.ConnectionClosedException: Connection is closed
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput(HttpAsyncRequestExecutor.java:356)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:261)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
    ... 1 more

robotdan commented 4 years ago

Thanks for the issue @resoglas - we've seen that error a few times as well. In your case, is it just noise in the logs, or is it causing the connection to Elasticsearch from FusionAuth to fail altogether?

resoglas commented 4 years ago

It seems that the connection fails altogether. The Users list gives an error:

Screenshot from 2020-09-08 21-52-13

robotdan commented 4 years ago

Thanks @resoglas .

When testing with Docker locally and the new "sniffer" config, I had to tell Elasticsearch how to publish the port and host so it didn't use the internal Docker IP and port. Otherwise, the client would connect over my specified connection, and then the node would tell the client about its IP address, which was not visible to FusionAuth.

For example, old start command:

docker run -p 9021:9200 -e 'discovery.type=single-node' docker.elastic.co/elasticsearch/elasticsearch:7.6.1

New command, adding -e 'http.publish_host=localhost' and -e 'http.publish_port=9021':

docker run -p 9021:9200 -e 'discovery.type=single-node' -e 'http.publish_host=localhost' -e 'http.publish_port=9021' docker.elastic.co/elasticsearch/elasticsearch:7.6.1

When running in Docker Compose, I didn't seem to need this when using the bridge network which makes sense I suppose. I don't know for sure how this translates to K8s.
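To see which addresses a sniffing client would be handed, one option is to look at the `http.publish_address` values returned by `GET /_nodes/http`. The following is only a minimal sketch: `listPublishAddresses` is a hypothetical helper, and the sample response is made up to resemble a single-node container started without `http.publish_host`.

```javascript
'use strict'

// Hypothetical helper (not part of FusionAuth or any Elasticsearch client):
// collect the HTTP publish address of every node from a parsed
// `GET /_nodes/http` response body.
function listPublishAddresses(nodesResponse) {
  return Object.values(nodesResponse.nodes || {}).map((node) => ({
    name: node.name,
    publishAddress: node.http ? node.http.publish_address : undefined,
  }))
}

// Made-up sample: the node advertises its container-internal address.
const sample = {
  nodes: {
    abc123: { name: 'es01', http: { publish_address: '172.17.0.2:9200' } },
  },
}

console.log(listPublishAddresses(sample))
```

If the addresses printed here are not reachable from where FusionAuth runs, sniffing would be expected to fail even though the initial connection works.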

resoglas commented 4 years ago

Thanks @robotdan. I dug a bit deeper and found the following message repeating over and over in the ES logs: received plaintext http traffic on an https channel. Those "plaintext messages" are coming from the FusionAuth pod, even though, as you can see in my configuration, I use search.servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200. Could it be that the "sniffer" is ignoring https://?

ceefour commented 4 years ago

Also getting this after upgrading from 1.17.3 to 1.19.3 #857

robotdan commented 4 years ago

I think this is due to the publish addresses of Elasticsearch. Here is a good article on the issue: https://www.elastic.co/blog/elasticsearch-sniffing-best-practices-what-when-why-how

See the "But we can fix that" section.

So if that fixes the issue, we can document this much better, or perhaps look into making this new Sniffer configuration optional as it doesn't play real nice with Docker.

resoglas commented 4 years ago

So I have changed the http.publish_host to ${POD_NAME}.elasticsearch-es-default.elasticsearch.svc.cluster.local and tried connecting to the ES Cluster from another Pod using the following code:

'use strict'

const { Client } = require('@elastic/elasticsearch')
const {URL} = require('url');
const fs = require('fs')

const client = new Client({
    node: {
        url: new URL('https://user:password@elasticsearch-es-http.elasticsearch.svc.cluster.local:9200'),
    },
    ssl: {
        ca: fs.readFileSync('../app/ca.crt')
    },
    sniffOnStart: true,
    sniffInterval: 1000,
})

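client.on('sniff', (err, result) => {
    // Report sniff failures instead of dereferencing a missing result
    if (err) {
        console.error(err)
        return
    }
    console.log(result.body.nodes)
})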

I have got a successful sniff response containing 3 nodes:

{
  LrIsZybiRQuDi4ab_a0QeQ: {
    name: 'elasticsearch-es-default-1',
    transport_address: '10.2.2.34:9300',
    host: '10.2.2.34',
    ip: '10.2.2.34',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'ml.max_open_jobs': '20',
      'xpack.installed': 'true',
      'transform.node': 'true'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.2.34:9200',
      max_content_length_in_bytes: ...
    }
  },
  rXYyTgCJSQmlHwbZ32257A: {
    name: 'elasticsearch-es-default-0',
    transport_address: '10.2.0.213:9300',
    host: '10.2.0.213',
    ip: '10.2.0.213',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'xpack.installed': 'true',
      'transform.node': 'true',
      'ml.max_open_jobs': '20'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-0.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.0.213:9200',
      max_content_length_in_bytes: ...
    }
  },
  zVUnh9VuRYy4mwb8RbDzDQ: {
    name: 'elasticsearch-es-default-2',
    transport_address: '10.2.0.212:9300',
    host: '10.2.0.212',
    ip: '10.2.0.212',
    version: '7.8.1',
    build_flavor: 'default',
    build_type: 'docker',
    build_hash: '...',
    roles: [
      'data',
      'ingest',
      'master',
      'ml',
      'remote_cluster_client',
      'transform'
    ],
    attributes: {
      'ml.machine_memory': '...',
      'ml.max_open_jobs': '20',
      'xpack.installed': 'true',
      'transform.node': 'true'
    },
    http: {
      bound_address: [Array],
      publish_address: 'elasticsearch-es-default-2.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.0.212:9200',
      max_content_length_in_bytes: ...
    }
  }
}

And I am able to curl --insecure https://username:password@elasticsearch-es-default-2.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200 successfully from within the same Pod.

FusionAuth still seems to fail with the same error though... Is there something else I am missing or are there maybe more detailed logs I could find? Thanks!

P. S. Maybe this https://github.com/elastic/cloud-on-k8s/issues/3182 is somewhat related
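Note that the `publish_address` values in the sniff response carry no scheme, only `hostname/ip:port`, so whatever client sniffs them has to decide between http and https on its own. As far as I know, the Java low-level REST client's `ElasticsearchNodesSniffer` takes a scheme argument for this and falls back to http when none is given. The sketch below only illustrates the effect; `nodeUrlFromPublishAddress` is a hypothetical helper, not FusionAuth or Elasticsearch client code.

```javascript
'use strict'

// Hypothetical helper: turn a publish_address into a node URL.
// Accepts "hostname/ip:port" (as in the sniff response) or plain "ip:port".
function nodeUrlFromPublishAddress(publishAddress, scheme) {
  const slash = publishAddress.indexOf('/')
  const hostname = slash === -1 ? null : publishAddress.slice(0, slash)
  const ipAndPort = slash === -1 ? publishAddress : publishAddress.slice(slash + 1)
  // Split on the last colon so a bracketed IPv6 address keeps its colons
  const colon = ipAndPort.lastIndexOf(':')
  const ip = ipAndPort.slice(0, colon)
  const port = ipAndPort.slice(colon + 1)
  // Prefer the published hostname when there is one, else the IP
  return `${scheme}://${hostname || ip}:${port}`
}

const published =
  'elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local/10.2.2.34:9200'

// Same node, two different URLs depending on the scheme the client assumes
console.log(nodeUrlFromPublishAddress(published, 'http'))
console.log(nodeUrlFromPublishAddress(published, 'https'))
```

A client that assumes http here would send plaintext requests to a TLS-only port, which would match the "received plaintext http traffic on an https channel" messages in the ES logs.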

robotdan commented 4 years ago

P. S. Maybe this elastic/cloud-on-k8s#3182 is somewhat related

Yes, thanks for the link - that looks to be the same issue for sure.

resoglas commented 4 years ago

FusionAuth v1.18.8 seems to have no problem at all sniffing ES cluster using the following config (which is basically the same as with v1.19+):

    database.url=jdbc:postgresql://{{postgresql_host}}:{{postgresql_port}}/{{postgresql_database}}
    database.username={{postgresql_user}}
    database.password={{postgresql_password}}

    fusionauth-app.search-engine-type=elasticsearch
    fusionauth-app.search-servers=https://{{elasticsearch_user}}:{{elasticsearch_password}}@elasticsearch-es-http.elasticsearch:9200

    fusionauth-app.management-port=9010
    fusionauth-app.http-port=9011
    fusionauth-app.https-port=9013
    fusionauth-app.ajp-port=9019
    fusionauth-app.memory=512M
    fusionauth-app.additional-java-args=
    fusionauth-app.cookie-same-site-policy=Lax
    fusionauth.runtime-mode=production

Elasticsearch v7.8.1 nodes immediately respond with a successful index creation/updating message of fusionauth_user.

So the question is: were FusionAuth versions prior to 1.19 not sniffing for ES cluster nodes?

resoglas commented 4 years ago

I have disabled the TLS configuration in the ES cluster to rule out a network error, and now I am getting:

Sep 18, 2020 1:43:21 PM org.elasticsearch.client.sniff.Sniffer run
SEVERE: error while sniffing nodes
org.elasticsearch.client.ResponseException: method [GET], host [http://elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200], URI [/_nodes/http?timeout=1000ms], status line [HTTP/1.1 401 Unauthorized]
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/_nodes/http?timeout=1000ms]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/_nodes/http?timeout=1000ms]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
    at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
    at org.elasticsearch.client.sniff.ElasticsearchNodesSniffer.sniff(ElasticsearchNodesSniffer.java:105)
    at org.elasticsearch.client.sniff.Sniffer.sniff(Sniffer.java:209)
    at org.elasticsearch.client.sniff.Sniffer$Task.run(Sniffer.java:139)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.base/java.lang.Thread.run(Thread.java:832)

I now strongly believe that the Sniffer is missing appropriate configurations for Sniffer Scheme (HTTP or HTTPS) and Authentication for Username and Password.
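To illustrate the authentication half of this: the configured `search.servers` URL carries credentials as userinfo, while a URL rebuilt from a sniffed address has none, so nothing can be derived from it. This is a minimal sketch with a hypothetical helper, `basicAuthHeaderFromUrl`, and placeholder credentials; it does not claim to show how FusionAuth configures its client.

```javascript
'use strict'

// Hypothetical helper: derive a Basic Authorization header from the userinfo
// part of a server URL, or return null when the URL has no credentials.
function basicAuthHeaderFromUrl(serverUrl) {
  const { username, password } = new URL(serverUrl)
  if (!username) return null
  // URL keeps userinfo percent-encoded, so decode before building the header
  const user = decodeURIComponent(username)
  const pass = decodeURIComponent(password)
  return 'Basic ' + Buffer.from(`${user}:${pass}`).toString('base64')
}

// Configured URL (placeholder credentials) versus a URL rebuilt from a sniffed node
const configured = 'https://user:password@elasticsearch-es-http.elasticsearch:9200'
const sniffed =
  'http://elasticsearch-es-default-1.elasticsearch-es-default.elasticsearch.svc.cluster.local:9200'

console.log(basicAuthHeaderFromUrl(configured))
console.log(basicAuthHeaderFromUrl(sniffed))
```

If credentials are attached per URL rather than to the client as a whole, requests to sniffed nodes would go out unauthenticated, which would be consistent with the 401 above.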

robotdan commented 4 years ago

So the question is: were FusionAuth versions prior to 1.19 not sniffing for ES cluster nodes?

This is new in version 1.19.x.

I now strongly believe that the Sniffer is missing appropriate configurations for Sniffer Scheme (HTTP or HTTPS) and Authentication for Username and Password.

Interesting, we can take a look at this.

robotdan commented 4 years ago

The sniffer config takes the rest client which we have already configured with credentials, so it seems this should be ok. We'll have to try to recreate.

robotdan commented 3 years ago

In 1.19.8 (https://github.com/FusionAuth/fusionauth-issues/issues/893) the sniffer is off by default. This should resolve the issue for you.

Please re-open if you encounter an error with the sniffer disabled.

ceefour commented 3 years ago

Thank you @robotdan :)

robotdan commented 2 years ago

Closing, please re-open if this is still an issue.
