elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

[Istio 1.8.1] Kibana can't connect to ElasticSearch #4027

Closed: XBeg9 closed this 5 months ago

XBeg9 commented 3 years ago

Bug Report

What did you do?

Trying to run Kibana + Elasticsearch on a private Azure AKS cluster.

What did you expect to see?

Kibana should connect to Elasticsearch

What did you see instead? Under which circumstances?

Kibana can't connect to Elasticsearch

{"type":"log","@timestamp":"2020-12-09T12:54:23Z","tags":["error","elasticsearch","data"],"pid":6,"message":"[ConnectionError]: getaddrinfo ENOTFOUND elastic-es-http.c-ns.svc elastic-es-http.c-ns.svc:9200"}

Environment

Istio 1.8.1

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:09:16Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.10", GitCommit:"62876fc6d93e891aa7fbe19771e6a6c03773b0f7", GitTreeState:"clean", BuildDate:"2020-10-16T20:43:34Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Annotations set for elasticsearch pods:

traffic.sidecar.istio.io/includeInboundPorts: "*"
traffic.sidecar.istio.io/excludeOutboundPorts: "9300,443"
traffic.sidecar.istio.io/excludeInboundPorts: "9300"

mTLS has been enabled for Istio.
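(Note: port 9300 is the Elasticsearch transport port. It is excluded from sidecar interception because ECK secures node-to-node transport traffic with its own X.509 certificates, which Istio's mTLS must not intercept.)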

charith-elastic commented 3 years ago

I don't think this is an Istio issue. It looks like a DNS problem within the cluster: getaddrinfo ENOTFOUND elastic-es-http.c-ns.svc

Did you try kubectl exec-ing into the Kibana pod and accessing the Elasticsearch service via curl? That could give you more information about the problem.
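For example, a sketch along these lines (the pod name is a placeholder; the plain-http URL assumes TLS is disabled on the Elasticsearch HTTP layer, as in your spec, and even a 401 response would prove that DNS resolution and connectivity work):

kubectl exec -n c-ns <kibana-pod-name> -c kibana -- curl -sv http://elastic-es-http.c-ns.svc:9200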

XBeg9 commented 3 years ago

@charith-elastic yes, sorry, totally forgot to give input on that. I am able to get curl working inside the Kibana pod.

XBeg9 commented 3 years ago
[Screenshot: Lens - The Kubernetes IDE, 2020-12-09 16-06-46]
XBeg9 commented 3 years ago

kubectl describe peerauthentication/default -n istio-system

Namespace:    istio-system
Labels:       app.kubernetes.io/managed-by=pulumi
Annotations:  <none>
API Version:  security.istio.io/v1beta1
Kind:         PeerAuthentication
Metadata:
  Self Link:         /apis/security.istio.io/v1beta1/namespaces/istio-system/peerauthentications/default
Spec:
  Mtls:
    Mode:  STRICT
Events:    <none>

I am seeing this error inside the proxy sidecar (Kibana):

2020-12-09T14:00:20.683725Z info    Envoy proxy is ready
2020-12-09T14:00:30.620939Z error   Request to probe app failed: Get "http://localhost:5601/login": dial tcp 127.0.0.1:5601: connect: connection refused, original URL path = /app-health/kibana/readyz, app URL path = /login
XBeg9 commented 3 years ago

OK, now I know what the problem is. I've enabled Istio's DNS proxying (https://istio.io/latest/blog/2020/dns-proxy/) on my mesh, and it works perfectly, except for Kibana.

// meshConfig: {
//   defaultConfig: {
//     proxyMetadata: {
//       ISTIO_META_DNS_CAPTURE: 'true'
//     }
//   }
// }

After disabling ISTIO_META_DNS_CAPTURE, so that the sidecar no longer captures DNS, Kibana works without any issues. Any ideas?
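(A narrower variant of this workaround, assuming the per-pod proxy.istio.io/config annotation is available in your Istio version, would be to disable DNS capture for just the Kibana pod instead of mesh-wide, e.g. in the Kibana podTemplate:)

  podTemplate:
    metadata:
      annotations:
        # Per-pod proxy config override: disables Envoy DNS capture
        # for this pod only, leaving the rest of the mesh unchanged.
        proxy.istio.io/config: |
          proxyMetadata:
            ISTIO_META_DNS_CAPTURE: "false"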

charith-elastic commented 3 years ago

Based on the Envoy log line you pasted, my hunch is that the healthcheck is failing for the Kibana pod, and because of that the required config is not getting pushed to the sidecar. Try adding the following lines to the Kibana manifest and see if that helps.

  podTemplate:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "true"
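(For context: this annotation makes the Istio sidecar injector rewrite the kubelet HTTP probes so they pass through the sidecar agent; the /app-health/kibana/readyz path in the Envoy log above is exactly such a rewritten probe.)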
XBeg9 commented 3 years ago

@charith-elastic it's there already and doesn't make any difference :( Kibana can't resolve the address (DNS) of Elasticsearch.

charith-elastic commented 3 years ago

I just noticed that Kibana is trying to connect to elastic-es-http.c-ns.svc:9200 whereas the curl you executed is for elastic-es-http.cityu-ns.svc:9200. That's a different namespace, isn't it? (c-ns vs. cityu-ns)

I tried to recreate your problem on GKE and had no luck. I was wondering whether it was due to environmental differences between AKS and GKE but this namespace mix-up seems like a more promising avenue to investigate.
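An illustrative way to check which namespaces actually contain a service with that name (field selectors on metadata.name work across namespaces):

kubectl get svc --all-namespaces --field-selector metadata.name=elastic-es-http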

XBeg9 commented 3 years ago

@charith-elastic sorry, that was me dumping logs from two different tries :) the namespace is correct. Have you seen my message? https://github.com/elastic/cloud-on-k8s/issues/4027#issuecomment-741815655

Basically, if I disable the Istio sidecar DNS capture, everything works without any issues.

charith-elastic commented 3 years ago

I have DNS capture enabled as well. Kibana still works.

What is the version of Kibana you're trying to deploy?

XBeg9 commented 3 years ago

@charith-elastic both versions are the same (7.10.0):

export const version = '7.10.0';

(I am using Pulumi to deploy the app, hence the TypeScript syntax.)

Elasticsearch

export const elasticsearch = new eck.elasticsearch.v1.Elasticsearch(
  "elastic",
  {
    metadata: { name: "elastic", namespace: namespace.metadata.name },
    spec: {
      version,
      nodeSets: [
        {
          name: `default`,
          count: 1,
          podTemplate: {
            metadata: {
              annotations: {
                "traffic.sidecar.istio.io/includeInboundPorts": "*",
                "traffic.sidecar.istio.io/excludeOutboundPorts": "9300,443",
                "traffic.sidecar.istio.io/excludeInboundPorts": "9300"
              }
            },
            spec: {
              initContainers: [
                {
                  name: "sysctl",
                  securityContext: {
                    privileged: true
                  },
                  command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]
                },
                {
                  name: "install-plugins",
                  command: [
                    "sh",
                    "-c",
                    "bin/elasticsearch-plugin install --batch repository-azure"
                  ]
                }
              ],
              containers: [
                {
                  name: "elasticsearch",
                  readinessProbe: {
                    exec: {
                      command: [
                        "bash",
                        "-c",
                        "/mnt/elastic-internal/scripts/readiness-probe-script.sh"
                      ]
                    },
                    failureThreshold: 3,
                    initialDelaySeconds: 10,
                    periodSeconds: 12,
                    successThreshold: 1,
                    timeoutSeconds: 12
                  },
                  env: [
                    {
                      name: "ES_JAVA_OPTS",
                      value: "-Xms512m -Xmx512m"
                    },
                    {
                      name: "READINESS_PROBE_TIMEOUT",
                      value: "10"
                    }
                  ],
                  resources: {
                    requests: {
                      memory: "512Mi",
                      cpu: "250m"
                    },
                    limits: {
                      memory: "1Gi"
                    }
                  }
                }
              ]
            }
          }
        }
      ],
      http: {
        tls: { selfSignedCertificate: { disabled: true } },
        service: {
          spec: {
            type: "ClusterIP"
          }
        }
      }
    }
  },
  { parent: namespace, dependsOn: [namespace], provider }
);

Kibana

export const kibana = new eck.kibana.v1.Kibana(
  'kibana',
  {
    metadata: { name: 'kibana', namespace: namespace.metadata.name },
    spec: {
      version: es.version,
      count: 1,
      elasticsearchRef: {
        name: es.elasticsearch.metadata.apply(m => m!.name!.toString()),
        namespace: namespace.metadata.name
      },
      config: {
        'logging.verbose': true
      },
      podTemplate: {
        metadata: {
          annotations: {
            'sidecar.istio.io/rewriteAppHTTPProbers': 'true'
          }
        },
        spec: {
          containers: [
            {
              name: 'kibana',
              resources: {
                limits: { memory: `512Mi`, cpu: 0.25 * 2 },
                requests: { memory: `256Mi`, cpu: 0.25 }
              },
              env: [
                {
                  name: 'ES_JAVA_OPTS',
                  value: `-Xms256m -Xmx256m`
                }
              ]
            }
          ]
        }
      },
      http: {
        tls: { selfSignedCertificate: { disabled: true } },
        service: {
          spec: { type: 'ClusterIP' }
        }
      }
    }
  },
  { parent: namespace, dependsOn: [es.elasticsearch], provider }
);
charith-elastic commented 3 years ago

Sorry about the delay getting back to you. Have you managed to get things working? I still can't reproduce the problem on my end.

cdmurph32 commented 3 years ago

I ran into the same issue and can confirm that disabling the DNS proxy fixes it.
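(For reference, a minimal sketch of that mesh-wide opt-out, mirroring the IstioOperator meshConfig shown below with the capture flag flipped:)

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Disable Envoy DNS proxying mesh-wide.
        ISTIO_META_DNS_CAPTURE: "false"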

naemono commented 2 years ago

I cannot replicate this issue when sidecar.istio.io/rewriteAppHTTPProbers: 'true' is not used.

AKS/Kubernetes version:

Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"ece9ecf2f9aecbd86d3eba31f0be62e4b6353a5a", GitTreeState:"clean", BuildDate:"2022-07-28T23:33:17Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

Istio Version

❯ istioctl version
client version: 1.15.0
control plane version: 1.15.0
data plane version: 1.15.0 (7 proxies)

Istio Configuration

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    accessLogFile: /dev/stdout
    defaultConfig:
      proxyMetadata:
        # Enable basic DNS proxying
        ISTIO_META_DNS_CAPTURE: "true"
        # Enable automatic address allocation, optional
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
    extensionProviders:
      - name: otel
        envoyOtelAls:
          service: opentelemetry-collector.istio-system.svc.cluster.local
          port: 4317
  components:
    egressGateways:
      - name: istio-egressgateway
        enabled: true
        k8s:
          resources:
            requests:
              cpu: 10m
              memory: 40Mi
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          resources:
            requests:
              cpu: 10m
              memory: 40Mi
          service:
            ports:
## You can add custom gateway ports in user values overrides, but the list must include these ports, since Helm replaces the list rather than merging it.
              # Note that AWS ELB will by default perform health checks on the first port
              # on this list. Setting this to the health check port will ensure that health
              # checks always work. https://github.com/istio/istio/issues/12503
              - port: 15021
                targetPort: 15021
                name: status-port
              - port: 80
                targetPort: 8080
                name: http2
              - port: 443
                targetPort: 8443
                name: https
              - port: 31400
                targetPort: 31400
                name: tcp
                # This is the port where sni routing happens
              - port: 15443
                targetPort: 15443
                name: tls
    pilot:
      k8s:
        env:
          - name: PILOT_TRACE_SAMPLING
            value: "100"
        resources:
          requests:
            cpu: 10m
            memory: 100Mi
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 10m
            memory: 40Mi
    pilot:
      autoscaleEnabled: false
    gateways:
      istio-egressgateway:
        autoscaleEnabled: false
      istio-ingressgateway:
        autoscaleEnabled: false

Istio Peer Authentication Setup

kind: PeerAuthentication
apiVersion: security.istio.io/v1beta1
metadata:
  name: istio-enabled
  namespace: istio-enabled
spec:
  mtls:
    mode: STRICT

Resources (TLS disabled, as mTLS is enabled in the istio-enabled namespace)

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: testing
  namespace: istio-enabled
  labels:
    app: elasticsearch
    version: 0.0.1
spec:
  version: 8.4.0
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: masters
    count: 3
    podTemplate:
      metadata:
        annotations:
          traffic.sidecar.istio.io/includeInboundPorts: "*"
          traffic.sidecar.istio.io/excludeOutboundPorts: "9300" 
          traffic.sidecar.istio.io/excludeInboundPorts: "9300"
        labels:
          app: elasticsearch
          version: 0.0.1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: default
    config:
      node.roles: ["master", "data", "ingest"]
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: testing
  namespace: istio-enabled
  labels:
    app: kibana
    version: 0.0.1
spec:
  podTemplate:
    metadata:
      labels:
        app: kibana
        version: 0.0.1
  version: 8.4.0
  count: 1
  elasticsearchRef:
    name: "testing"
  http:
    tls:
      selfSignedCertificate:
        disabled: true
---

Pods all become ready

❯ kc get pods -n istio-enabled
NAME                          READY   STATUS    RESTARTS   AGE
testing-es-masters-0          2/2     Running   0          13m
testing-es-masters-1          2/2     Running   0          13m
testing-es-masters-2          2/2     Running   0          13m
testing-kb-67cfc688f4-gwnlq   2/2     Running   0          13m
pebrc commented 5 months ago

Closing this, as it seems to be related to DNS capture in Istio.