
ECK to create/manage services by Node type #2572

Open · basigabri opened 4 years ago

basigabri commented 4 years ago

Currently ECK creates only one service inside k8s, exposed on port 9200 and given a ClusterIP.

This service, named {cluster_name}-es-http, includes as endpoints all node types (master, data, ingest).

The other services, created one per node type (ingest, data, master), are not exposed: they don't have a ClusterIP.

Kibana, for example, uses the exposed service to query Elasticsearch, which means it hits all endpoints, including masters. We need ECK to create exposed ClusterIP services per node type, for example a service for ingest nodes only, so that only those pods are hit.
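To illustrate (assuming the quickstart cluster name from the docs), the endpoints behind the default service and the labels ECK puts on the Pods can be inspected with:

kubectl get endpoints quickstart-es-http
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=quickstart --show-labels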

sebgl commented 4 years ago

Relates https://github.com/elastic/cloud-on-k8s/issues/2161.

It's fairly easy for users to create their own services targeting any Pods they want using label selectors. We could automatically create X services, but it's hard to know in advance which subset of Pods users are interested in. Hence we pre-create only the default one, for an easy quickstart experience.

You can easily set up something like this:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.6.0
  nodeSets:
  - name: ingest-nodes
    count: 3
    config:
      node.ingest: true
      node.data: false
      node.master: false
  - name: master-nodes
    count: 3
    config:
      node.ingest: false
      node.data: false
      node.master: true
  - name: data-nodes
    count: 3
    config:
      node.ingest: false
      node.data: true
      node.master: false

---

apiVersion: v1
kind: Service
metadata:
  name: es-ingest-nodes
spec:
  selector:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: quickstart
    elasticsearch.k8s.elastic.co/node-ingest: "true"
  ports:
    - name: https 
      protocol: TCP
      port: 9200
      targetPort: 9200
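
Once applied, a quick check that the service only picks up the ingest Pods (illustrative):

kubectl get endpoints es-ingest-nodes
kubectl get pods -l elasticsearch.k8s.elastic.co/node-ingest=true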

@basigabri do you think that's a good alternative?

basigabri commented 4 years ago

Thank you very much @sebgl for your quick response.

I am already doing this. But then it is not a service managed by ECK; it is a service managed by me. The request is for ECK itself to be able to manage services by node type.

Additionally, I tried to configure kibana to hit this custom ingest-nodes service without success.

{"type":"log","@timestamp":"2020-02-17T14:30:19Z","tags":["error","elasticsearch","data"],"pid":7,"message":"Request error, retrying\nGET https://elasticsearch-ingest-nodes:9200/_xpack => Client network socket disconnected before secure TLS connection was established"} {"type":"log","@timestamp":"2020-02-17T14:30:19Z","tags":["warning","legacy-plugins"],"pid":7,"path":"/usr/share/kibana/src/legacy/core_plugins/visualizations","message":"Skipping non-plugin directory at /usr/share/kibana/src/legacy/core_plugins/visualizations"} {"type":"log","@timestamp":"2020-02-17T14:30:20Z","tags":["warning","plugins","licensing"],"pid":7,"message":"License information could not be obtained from Elasticsearch for the [data] cluster. Error: Request Timeout after 30000ms"} {"type":"log","@timestamp":"2020-02-17T14:30:20Z","tags":["warning","elasticsearch","data"],"pid":7,"message":"Unable to revive connection: https://elasticsearch-ingest-nodes:9200/"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["info","plugins-system"],"pid":7,"message":"Starting [8] plugins: [security,licensing,code,timelion,features,spaces,translations,data]"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["warning","elasticsearch","data"],"pid":7,"message":"Unable to revive connection: https://elasticsearch-ingest-nodes:9200/"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["warning","elasticsearch","data"],"pid":7,"message":"No living connections"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["warning","plugins","licensing"],"pid":7,"message":"License information could not be obtained from Elasticsearch for the [data] cluster. Error: No Living connections"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["error","elasticsearch","admin"],"pid":7,"message":"Request error, retrying\nGET https://elasticsearch-ingest-nodes:9200/.kibana_task_manager => self signed certificate in certificate chain"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["error","elasticsearch","admin"],"pid":7,"message":"Request error, retrying\nGET https://elasticsearch-ingest-nodes:9200/.kibana => self signed certificate in certificate chain"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["warning","elasticsearch","admin"],"pid":7,"message":"Unable to revive connection: https://elasticsearch-ingest-nodes:9200/"}

sebgl commented 4 years ago

Request error, retrying\nGET https://elasticsearch-ingest-nodes:9200/.kibana => self signed certificate in certificate chain"

Something looks wrong in the way the certificate is set up in the Kibana configuration. Can you share how you specified this configuration?

A more general comment: ingest nodes are mostly useful to pre-process documents before they are indexed. It does not make much sense to route Kibana traffic to ingest nodes: Kibana is not ingesting much data, and I assume you are not pre-processing Kibana data in Elasticsearch through your own ingest pipeline?

basigabri commented 4 years ago

Something looks wrong in the way the certificate is setup in the Kibana configuration. Can you share how you specified this configuration?

---
# Source: kibana/templates/kibana.yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: elastic
  labels:
    app.kubernetes.io/name: "release-name-kibana"
    app.kubernetes.io/managed-by: "Tiller"
    app.kubernetes.io/release: "release-name"
    helm.sh/chart: "kibana-7.5"
spec:
  version: 7.5.0
  count: 1
  config:
    elasticsearch.hosts:
      - https://elasticsearch-ingest-nodes:9200
    elasticsearch.username: elastic
    #elasticsearch.ssl.certificateAuthorities: /mnt/usr/ca.crt
  secureSettings:
    - secretName: elastic-kibana-kibana-user
  podTemplate:
    spec:
      containers:
        - name: kibana
          volumeMounts:
            - name: elasticsearch-certs
              mountPath: /mnt/usr
              readOnly: true
      volumes:
        - name: elasticsearch-certs
          secret:
            secretName: elasticsearch-es-http-certs-internal
  http:
    service:
      spec:
        type: ClusterIP
    tls:
      selfSignedCertificate:
        subjectAltNames:
        - dns: kibana.mydns.net
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            memory: 1Gi
The secret volume is not mounted at mountPath: /mnt/usr. It seems like the CA is not consumed correctly.

10:53 $ k describe pod kibana-kb-7656485d54-5tdtt
Name:           kibana-kb-7656485d54-5tdtt
Namespace:      elastic
Priority:       0
Node:           aks-agentpool-93582542-vmss000000/10.240.0.4
Start Time:     Mon, 17 Feb 2020 19:08:05 +0200
Labels:         common.k8s.elastic.co/type=kibana
                kibana.k8s.elastic.co/config-checksum=e3ac7c53717f3c34a526e1c961af3d3a3f1c3715422cc0a326b11240
                kibana.k8s.elastic.co/name=kibana
                kibana.k8s.elastic.co/version=7.5.0
                pod-template-hash=7656485d54
Annotations:    <none>
Status:         Running
IP:             10.244.0.121
IPs:            <none>
Controlled By:  ReplicaSet/kibana-kb-7656485d54
Containers:
  kibana:
    Container ID:   docker://f1c278635efae5bc6854863c1b7a146c37bd2294245085111aa46faa62dbe4fa
    Image:          docker.elastic.co/kibana/kibana:7.5.0
    Image ID:       docker-pullable://docker.elastic.co/kibana/kibana@sha256:0dfe7c796a7702556cd7e9bb7e2d56be335ec22260ce569038b3aaf663afa90b
    Port:           5601/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 17 Feb 2020 19:08:07 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1Gi
    Requests:
      memory:     1Gi
    Readiness:    http-get https://:5601/login delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /mnt/elastic-internal/http-certs from elastic-internal-http-certificates (ro)
      /usr/share/kibana/config from config (ro)
      /usr/share/kibana/data from kibana-data (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kibana-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kibana-kb-config
    Optional:    false
  elastic-internal-http-certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kibana-kb-http-certs-internal
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                                        Message
  ----     ------     ----                    ----                                        -------
  Warning  Unhealthy  4m53s (x5638 over 15h)  kubelet, aks-agentpool-93582542-vmss000000  Readiness probe failed: HTTP probe failed with statuscode: 503
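
One thing worth noting about the manifest above: spec contains two podTemplate keys. Most YAML parsers keep only the last occurrence of a duplicate key, so the second podTemplate (with the memory limit) would silently replace the first one (with the elasticsearch-certs volume), which matches the describe output showing no /mnt/usr mount. A merged single podTemplate (a sketch, assuming this is indeed the cause) would be:

  podTemplate:
    spec:
      containers:
        - name: kibana
          resources:
            limits:
              memory: 1Gi
          volumeMounts:
            - name: elasticsearch-certs
              mountPath: /mnt/usr
              readOnly: true
      volumes:
        - name: elasticsearch-certs
          secret:
            secretName: elasticsearch-es-http-certs-internal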

A more general comment: ingest nodes are mostly useful to pre-process documents before they are indexed. It does not make much sense to route Kibana traffic to ingest nodes: Kibana is not ingesting much data, and I assume you are not pre-processing Kibana data in Elasticsearch through your own ingest pipeline?

There was some confusion: I want Kibana to hit only what used to be called client nodes, which are now the coordinating nodes...
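
For reference, a coordinating-only tier maps to a node with all three roles disabled, and a dedicated service can then target it. A rough sketch in the style of the example above (the node-master and node-data labels are assumed to follow the same pattern as node-ingest):

Extra nodeSet entry in the Elasticsearch spec:

  - name: coordinating-nodes
    count: 2
    config:
      node.master: false
      node.data: false
      node.ingest: false

apiVersion: v1
kind: Service
metadata:
  name: es-coordinating-nodes
spec:
  selector:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: quickstart
    elasticsearch.k8s.elastic.co/node-master: "false"
    elasticsearch.k8s.elastic.co/node-data: "false"
    elasticsearch.k8s.elastic.co/node-ingest: "false"
  ports:
    - name: https
      protocol: TCP
      port: 9200
      targetPort: 9200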

anyasabo commented 4 years ago

Does the error persist if #elasticsearch.ssl.certificateAuthorities: /mnt/usr/ca.crt is uncommented?

basigabri commented 4 years ago

Yes, the error persists. The comment is a copy-paste error; I don't actually have it commented out. It's set like this: elasticsearch.ssl.certificateAuthorities: /mnt/usr/ca.crt

Same error though, as pasted above:

"pid":7,"message":"Request error, retrying\nGET https://elasticsearch-ingest-nodes:9200/.kibana => self signed certificate in certificate chain"} {"type":"log","@timestamp":"2020-02-17T14:30:21Z","tags":["warning","elasticsearch","admin"],"pid":7,"message":"Unable to revive connection: https://elasticsearch-ingest-nodes:9200/"} 

The main purpose of this request, though, is for the operator to be able to manage services by node type.

Thanks a lot !

anyasabo commented 4 years ago

@basigabri as @sebgl mentioned, it's pretty similar to the existing issue https://github.com/elastic/cloud-on-k8s/issues/2161 (which is to at least document how to do this, if not manage the services in the operator). I think it makes sense to close this ticket and use that other issue to discuss extra services. We can continue trying to troubleshoot your specific configuration here, though, if you'd like.

steve21168 commented 4 years ago

There should really be an easy and reliable way to control where the data is sent. Having the default route to all nodes, including master nodes, seems like it's setting less attentive users up for failure.

Creating an additional service to go to ingest/client/data nodes is relatively simple.

The issue I see is that it makes it difficult to use the ECK-managed Kibana / APM Server.

Unless I'm missing something, you need to create your own password secret with an elasticsearch.password key, because the existing Kibana secret uses a different key.

And without elasticsearchRef, I would assume the config-change triggers (for certs, users, etc.) that cause a rolling restart of the APM Server or Kibana would never fire.

I would propose that you be able to keep elasticsearchRef and also set config.elasticsearch.hosts. As of right now it merges your host with the generated one, so you end up with 2 hosts set. If it simply didn't merge, everything would work fine: ECK would fully manage your APM Server / Kibana except for you overriding where it points. See the sketch below.
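
For illustration, the proposal would look something like this (a sketch only; it assumes the user-supplied entry would replace, rather than be merged with, the ECK-generated hosts):

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.6.0
  count: 1
  elasticsearchRef:
    name: quickstart   # ECK keeps managing certs, users and restarts
  config:
    # proposed behavior: override, not merge, the generated hosts
    elasticsearch.hosts:
      - https://es-ingest-nodes:9200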

Update: my proposal above works on the APM Server but not on Kibana... not sure if that's a bug or intended behavior.

pebrc commented 4 years ago

I wonder if we could try to be smart and use the knowledge ECK has about the current topology of the cluster to create two different services.

Creating any other dedicated services (I cannot think of a good use case for those right now) would be the responsibility of the user.