bitwalker / libcluster

Automatic cluster formation/healing for Elixir applications
MIT License

[error] ** System NOT running to use fully qualified hostnames ** Kubernetes DNSSRV Strategy #121

Open paltaa opened 4 years ago

paltaa commented 4 years ago

So I've followed this tutorial:

https://tech.xing.com/creating-an-erlang-elixir-cluster-on-kubernetes-d53ef89758f6

On the logs getting this errors:

14:02:03.126 [error] ** System NOT running to use fully qualified hostnames **
** Hostname 192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local is illegal **
14:02:03.325 [warn] [libcluster:k8s_excalibur] unable to connect to :"excalibur@192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local"

root@ueuropea-excalibur-74df5dddbc-kjfql:/excalibur# nslookup -q=srv ueuropea-excalibur-headless-service.default.svc.cluster.local
Server:  10.100.0.10
Address: 10.100.0.10#53

Non-authoritative answer:
ueuropea-excalibur-headless-service.default.svc.cluster.local service = 0 33 4000 192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local.
ueuropea-excalibur-headless-service.default.svc.cluster.local service = 0 33 4000 192-168-53-92.ueuropea-excalibur-headless-service.default.svc.cluster.local.
ueuropea-excalibur-headless-service.default.svc.cluster.local service = 0 33 4000 192-168-65-161.ueuropea-excalibur-headless-service.default.svc.cluster.local.

Authoritative answers can be found from:
192-168-65-161.ueuropea-excalibur-headless-service.default.svc.cluster.local internet address = 192.168.65.161
192-168-53-92.ueuropea-excalibur-headless-service.default.svc.cluster.local internet address = 192.168.53.92
192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local internet address = 192.168.4.107

It seems it's missing a `.` at the end of the returned DNS name, right?

Is there something wrong with the tutorial? Could something be missing?
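(As an aside on the trailing dot: in DNS answers, a trailing dot just marks an absolute, fully qualified name, and resolvers normally strip it before the name is used elsewhere. A small illustration of that normalization - this is not libcluster's actual code, just a sketch:)

```python
def normalize_dns_name(name: str) -> str:
    """Strip the trailing dot that marks an absolute (fully qualified) DNS name."""
    return name.rstrip(".")

# The SRV answer above returns names with a trailing dot:
print(normalize_dns_name(
    "192-168-4-107.ueuropea-excalibur-headless-service.default.svc.cluster.local."
))
# prints the same name without the trailing dot
```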

Entrypoint:

mix release

elixir -S mix phx.server --name excalibur@${MY_POD_IP} --cookie "secret"

_build/prod-kubernetes/rel/excalibur/bin/excalibur start

Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    io.kompose.service: excalibur
  name: ueuropea-excalibur
spec:
  progressDeadlineSeconds: 90
  replicas: 3
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        io.kompose.service: excalibur
    spec:
      containers:

Headless service:

apiVersion: v1
kind: Service
metadata:
  labels:
    io.kompose.service: excalibur
  name: ueuropea-excalibur-headless-service
spec:
  type: ClusterIP
  clusterIP: None
  ports:

Let me know if any more information is needed

bryanhuntesl commented 4 years ago

Hi @paltaa - I'm the author of the Kubernetes DNSSRV Strategy - and I've just discovered the same problem in some training material I'm supposed to be teaching tomorrow πŸ˜‚!!!!

This is highly annoying - it looks like an arbitrary change in statefulset DNS names. I'll investigate (if only for my own selfish reasons) and report back.

Thanks for the bug report

bryanhuntesl commented 4 years ago

@bitwalker - assign this to me if you like

bryanhuntesl commented 4 years ago

Reason for breakage:

Google Cloud has stopped using CoreDNS as the DNS resolver as of GKE 1.1.3 - they are instead using kube-dns, which doesn't provide service discovery via SRV record resolution. (Obviously they want everyone using the k8s API for everything.)

Lame.

Release notes (1.1.3), where they switched: https://cloud.google.com/kubernetes-engine/docs/release-notes#new_features_17

Someone trying to debug the issue: https://github.com/kubernetes/kubernetes/issues/85759

Stack Overflow thread: https://stackoverflow.com/questions/55122234/installing-coredns-on-gke

The kubernetes/docker ecosystem is just like that - stuff arbitrarily breaks all the time - so I recommend you try using Paul's strategy/kubernetes instead.

I'm away but will set a reminder to create a docs PR - maybe changing the name to strategy/k8s-coredns-srv or something that makes it clear you need coredns.

Man - that Google - always causing problems ! Sorry !

paltaa commented 4 years ago

Thanks a lot for the reply!! I've been trying to make this work for a couple of days and followed about 4 different tutorials, hahaha. Glad to know I helped by posting this issue! Let me know if you manage to fix this, and good luck tomorrow.

paltaa commented 4 years ago

@bryanhuntesl Also, this is currently happening on AWS EKS, which still uses CoreDNS. With some tweaks I got past the illegal hostname and "system not running to use fully qualified hostnames" errors, but it still won't connect.

mrchypark commented 4 years ago

I have the same problem.

My environment is AKS (Azure Kubernetes) v1.15.7. I had the same problem using the Kubernetes.DNS strategy.

paltaa commented 4 years ago

@mrchypark Hey, what worked for me is Elixir.Cluster.Strategy.Kubernetes

My example:

topologies = [
  k8s_excalibur: [
    strategy: Elixir.Cluster.Strategy.Kubernetes,
    config: [
      service: System.get_env("ERLANG_CLUSTER_SERVICE_NAME"),
      application_name: "excalibur",
      kubernetes_node_basename: "excalibur",
      kubernetes_namespace: "default",
      kubernetes_selector: "io.kompose.service=excalibur"
    ]
  ]
]
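For completeness, a topology list like this only takes effect once libcluster's supervisor is started in the application's supervision tree. A minimal sketch following libcluster's documented usage (the `Excalibur.*` module names here are illustrative):

```elixir
defmodule Excalibur.Application do
  use Application

  def start(_type, _args) do
    # Read the topologies configured under the :libcluster app key
    topologies = Application.get_env(:libcluster, :topologies, [])

    children = [
      # Cluster.Supervisor takes the topologies plus its own supervisor options
      {Cluster.Supervisor, [topologies, [name: Excalibur.ClusterSupervisor]]}
      # ... the rest of the application's children
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Excalibur.Supervisor)
  end
end
```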

mrchypark commented 4 years ago

@paltaa Thank you for your reply! I have question.

What does service mean? Is it a Service resource on Kubernetes?

paltaa commented 4 years ago

Yes, the Service for the Kubernetes deployment.

mrchypark commented 4 years ago

@paltaa Thank you! I'll try this :)

mrchypark commented 4 years ago

@paltaa I have one more question T.T

What is your node name now?

Mine is like hydra@hydraapp-799457d75f-948qc.

I get no errors, but the node list is empty too.

paltaa commented 4 years ago

You need to use Distillery: set up a pre-start hook that runs before the Erlang VM is up and sets the env var ERLANG_NAME to the pod's local IP.

mrchypark commented 4 years ago

local ip means pod ip?

paltaa commented 4 years ago

For example.

Deployment part:

        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP

Then the pre hook:

#!/bin/sh

echo 'Setting ERLANG_NAME...'
export ERLANG_NAME=$MY_POD_IP
echo $ERLANG_NAME
export ERLANG_COOKIE=****
echo $ERLANG_COOKIE

vm.args:

-name excalibur@${ERLANG_NAME}

-setcookie ${ERLANG_COOKIE}
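Note that with Distillery releases, `${VAR}` substitution in vm.args only happens when `REPLACE_OS_VARS` is set in the runtime environment. A sketch of what the container environment needs, assuming Distillery's documented substitution behavior (values are placeholders):

```shell
# Without this, Distillery leaves ${ERLANG_NAME} and ${ERLANG_COOKIE}
# in vm.args unsubstituted and the node starts with a literal, broken name.
export REPLACE_OS_VARS=true
export ERLANG_NAME=$MY_POD_IP       # MY_POD_IP injected via the Deployment's fieldRef
export ERLANG_COOKIE=some-secret    # placeholder cookie value
```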

mrchypark commented 4 years ago

I may have skipped setting the cookie.

I'll try this.

paltaa commented 4 years ago

And yes, by local IP I meant pod IP.

seivan commented 4 years ago

I recommend using KubernetesDNS but it requires a headless service.

bitwalker commented 4 years ago

@bryanhuntesl I've assigned this to you as requested, but my question to everyone participating in this thread is whether or not the KubernetesDNS strategy is suitable as a replacement, allowing us to deprecate the DNSSRV strategy if there are issues that make it difficult to use.

I'm fine with not deprecating it, but someone from the community has to speak up and take the lead on updating the strategy so that it works out of the box. I'm also fine with deprecating it here in libcluster but handing off the implementation to someone to maintain as a third-party plugin - just let me know. Suffice it to say, I won't have time to maintain it myself in the immediate future, and I don't like keeping things around that are broken, so I'll have to make the call soon.

seivan commented 4 years ago

Just to clarify, is this only an issue where CoreDNS isn't available? Afaik CoreDNS is now on EKS since .12, not sure about GKE.

bryanhuntesl commented 4 years ago

Just to clarify, is this only an issue where CoreDNS isn't available? Afaik CoreDNS is now on EKS since .12, not sure about GKE.

This is an issue because Google Kubernetes Engine removed CoreDNS - see:

https://github.com/bitwalker/libcluster/issues/121#issuecomment-594207646

seivan commented 4 years ago

@bryanhuntesl Right, so when I implemented the first k8s service discovery strategy (KubernetesDNS), I assumed it was just a "standardized" implementation based on some k8s spec, per the first paragraph of https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services

Which makes it extra scary when it's documented under K8 but is implementation specific!

However I wonder if this is related https://github.com/kubernetes/dns/issues/339#issuecomment-594798682

Which means it's an issue when using hostnames for pods rather than the actual addresses. That's probably why endpoint_pod_names doesn't exist on kube-dns - but I'm just guessing here.

I don't have access to GKE/kube-dns - could you test whether KubernetesDNS works, since it just returns addresses?

That's an alternative (and the one I use), unless IPs aren't good enough or you're using shared hostnames on the pods - which isn't necessary if your intention is just to set up an Erlang cluster.

@bitwalker I don't suggest removing it, but renaming it to something like CoreDNSSRV or whatever @bryanhuntesl suggested, but definitely not merging them now.

But in general, if KubernetesDNS works on kube-dns, I really suggest it should be the default or recommended approach when the intention is to form an Erlang cluster. There is no need to complicate things with SRV lookups and shared hostnames if the intention is just to join nodes together into a cluster.
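For reference, a minimal KubernetesDNS topology looks something like this - the service and application names below are placeholders, not values from this thread:

```elixir
config :libcluster,
  topologies: [
    my_app: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        # name of the headless service fronting the pods (placeholder)
        service: "my-app-headless",
        # node basename, i.e. the part before the @ in the node name (placeholder)
        application_name: "my_app",
        polling_interval: 10_000
      ]
    ]
  ]
```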

bryanhuntesl commented 4 years ago

I don't have access to GKE/kube-dns, could you test to see if KubernetesDNS works since it just returns addresses.

@seivan sorry I just don't have bandwidth right now, I'm assigned to client work.

@bitwalker I don't suggest removing it, but renaming it to something like CoreDNSSRV or whatever @bryanhuntesl suggested, but definitely not merging them now.

@bitwalker if renaming to Cluster.Strategy.CoreDNSSRV is acceptable - I can create a PR in the evening and update the documentation.

seivan commented 4 years ago

@bryanhuntesl No worries, thanks for all the input. I recall testing locally with Minikube a year or so ago, and that in turn runs kube-dns; unless anything has changed, I'm thinking it's fine, actually.

Besides, if it didn't work with kube-dns, then headless services wouldn't really serve a purpose at all, not to mention it would be a pretty blatant flaw in the k8s docs.

bitwalker commented 4 years ago

@bryanhuntesl Sorry for the delay, haven't had a chance to get back to this in a while. I'm good with renaming the strategy, and documenting the caveats. I'll have to bump the major version for the release, but that's fine, we're probably due for that.

michaelst commented 4 years ago

I am getting this same error using this strategy. I have tried both :ip and :dns modes.

** System NOT running to use fully qualified hostnames **
** Hostname 10.32.9.22 is illegal **


config :libcluster,
  topologies: [
    k8s: [
      strategy: Elixir.Cluster.Strategy.Kubernetes,
      config: [
        mode: :ip,
        kubernetes_node_basename: "server",
        kubernetes_selector: "app.kubernetes.io/instance=server",
        kubernetes_namespace: "backend"
      ]
    ]
  ]

darwin67 commented 3 years ago

I'm getting similar errors as @michaelst

09:08:47.379 [warn] [libcluster:k8s] unable to connect to :"community_service@10.0.1.110"
09:08:47.379 [error] ** System NOT running to use fully qualified hostnames **
** Hostname 10.0.2.165 is illegal **

Elixir version: 1.11.1

config :libcluster,
  topologies: [
    k8s: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        kubernetes_node_basename: "community_service",
        kubernetes_selector: "app=community-service,role=api",
        kubernetes_namespace: System.get_env("NAMESPACE", "community-service"),
        polling_interval: 15_000
      ]
    ]
  ]

~~All tutorials I find use distillery. Is distillery some kind of implicit hard requirement for libcluster? I'm currently just using the built-in mix release, and following the documentation doesn't get me a working example regardless of which Kubernetes strategy I use. They all emit the same errors.~~

~~What am I missing here?~~

Issue resolved

Found out that I can get the templates for env.sh.eex and the other release scripts with mix release.init, and after tweaking the files it seems to be working. Not sure how to verify, though.

sleipnir commented 3 years ago

I am facing a similar problem. I'm using Kind to test my service and I get the error below

2021-04-02 01:52:34.708 [massa_proxy@massa-proxy-c9885df8-ffbhz]:[pid=<0.2378.0> ]:[error]:** System NOT running to use fully qualified hostnames **
** Hostname 10.244.0.27 is illegal **

kind version 0.7.0

Headless Service:

apiVersion: v1
kind: Service
metadata:
  name: proxy-headless-svc
  namespace: default
spec:
  selector:
    app: massa-proxy
  clusterIP: None

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: massa-proxy
  name: massa-proxy
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: massa-proxy
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "9001"
        prometheus.io/scrape: "true"
      labels:
        app: massa-proxy
    spec:
      containers:
      - name: massa-proxy
        image: docker.io/eigr/massa-proxy:0.1.0
        ports:
        - containerPort: 9001
        imagePullPolicy: Always
        env:
        - name: PROXY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 9001
            scheme: HTTP
          initialDelaySeconds: 300
          periodSeconds: 3600
          successThreshold: 1
          timeoutSeconds: 1200
        resources:
          limits:
            memory: 1024Mi
          requests:
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        envFrom:
        - configMapRef:
            name: proxy-cm
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

My topology:

[
  proxy: [
    strategy: Cluster.Strategy.Kubernetes.DNS,
    config: [
      service: "proxy-headless-svc",
      application_name: "massa-proxy",
      polling_interval: 3000
    ]
  ]
]

Dockerfile:

FROM elixir:1.10-alpine as builder

ENV MIX_ENV=prod

RUN mkdir -p /app/massa_proxy
WORKDIR /app/massa_proxy

RUN apk add --no-cache --update git build-base zstd

COPY . /app/massa_proxy

RUN rm -rf /app/massa_proxy/apps/massa_proxy/mix.exs \
    && mv /app/massa_proxy/apps/massa_proxy/mix-bakeware.exs \
          /app/massa_proxy/apps/massa_proxy/mix.exs

RUN mix local.rebar --force \
    && mix local.hex --force \
    && mix deps.get 

RUN echo "-name massa_proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex \
      && echo "-setcookie ${NODE_COOKIE}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex

RUN rm -fr /app/massa_proxy/_build \
    && cd /app/massa_proxy/apps/massa_proxy \
    && mix deps.get \
    && mix release.init \
    && mix release

# ---- Application Stage ----
FROM alpine:3
RUN apk add --no-cache --update bash openssl

WORKDIR /home/app
COPY --from=builder /app/massa_proxy/_build/prod/rel/bakeware/ .
COPY apps/massa_proxy/priv /home/app/

RUN adduser app --disabled-password --home app

RUN mkdir -p /home/app/cache
RUN chown -R app: .

USER app

ENV MIX_ENV=prod
ENV REPLACE_OS_VARS=true
ENV BAKEWARE_CACHE=/home/app/cache
ENV PROXY_TEMPLATES_PATH=/home/app/templates

ENTRYPOINT ["./massa_proxy"]

I am already losing hope that I can resolve this. Does anyone know what it could be?

bryanhuntesl commented 3 years ago

@adriano

RUN echo "-name massa_proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex \

have you tried this instead:

RUN echo "-sname massa_proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex \

An IP address is not a 'name' as far as Erlang is concerned; a 'name' is an FQDN (fully qualified domain name), such as node-0...cluster.local.


sleipnir commented 3 years ago

Hi @bryanhuntesl, thanks for the quick response. Yes, I tried -sname to no avail too. In my logs I print Node.self() and can see that the name is different from the one configured. Look:

2021-04-02 12:50:53.342 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1759.0> ]:[info]: Starting HTTP Server on port 9001
2021-04-02 12:50:53.342 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1759.0> ]:[info]: Cluster Strategy kubernetes-dns
2021-04-02 12:50:53.342 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1759.0> ]:[debug]:Cluster topology [proxy: [strategy: Cluster.Strategy.Kubernetes.DNS, config: [service: "proxy-headless-svc", application_name: "massa-proxy", polling_interval: 3000]]]
2021-04-02 12:50:53.364 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1867.0> ]:[error]:** System NOT running to use fully qualified hostnames **
** Hostname 10.244.0.36 is illegal **

2021-04-02 12:50:53.364 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1864.0> ]:[warn]: [libcluster:proxy] unable to connect to :"massa-proxy@10.244.0.36"
2021-04-02 12:50:53.364 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1872.0> ]:[error]:** System NOT running to use fully qualified hostnames **
** Hostname 10.244.0.36 is illegal **

2021-04-02 12:50:53.364 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1864.0> ]:[warn]: [libcluster:proxy] unable to connect to :"massa-proxy@10.244.0.36"
2021-04-02 12:50:53.364 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1873.0> ]:[info]: Starting Horde.RegistryImpl with name MassaProxy.GlobalRegistry
2021-04-02 12:50:53.365 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1876.0> ]:[info]: Starting Horde.DynamicSupervisorImpl with name MassaProxy.GlobalSupervisor
2021-04-02 12:50:53.365 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1880.0> ]:[info]: Starting Proxy Cluster...
2021-04-02 12:50:53.365 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1880.0> ]:[info]: [massa proxy on :"massa_proxy@massa-proxy-7fbc5b4889-6p5hk"]: Connecting Horde to :"massa_proxy@massa-proxy-7fbc5b4889-6p5hk"
2021-04-02 12:50:53.365 [massa_proxy@massa-proxy-7fbc5b4889-6p5hk]:[pid=<0.1880.0> ]:[info]: [massa proxy on :"massa_proxy@massa-proxy-7fbc5b4889-6p5hk"]: Connecting Horde to :"massa_proxy@massa-proxy-7fbc5b4889

In this test I used: RUN echo "-sname massa-proxy@${PROXY_POD_IP}" >> /app/massa_proxy/apps/massa_proxy/rel/vm.args.eex

sleipnir commented 3 years ago

It seems to me that the strategy is resolving the addresses correctly; however, the node names come out as massa-proxy@hostname instead of massa-proxy@ip, and this seems to be the cause of the problem.

root @ sleipnir deployments wip/action-entity-protocol 
└─ # (k8s: kind-kind) πŸš€ β–Ά k get po
NAME                           READY   STATUS        RESTARTS   AGE
massa-proxy-7b495fbd94-zhrpd   1/1     Terminating   0          7m40s
massa-proxy-fb99dd779-86wlr    1/1     Running       0          34s
massa-proxy-fb99dd779-jbxqd    1/1     Running       0          34s
root @ sleipnir deployments wip/action-entity-protocol 
└─ # (k8s: kind-kind) πŸš€ β–Ά k exec -it massa-proxy-fb99dd779-86wlr sh
/home/app $ uname -a
Linux massa-proxy-fb99dd779-86wlr 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 Linux

sleipnir commented 3 years ago

Resolved with:

Change env.sh.eex

export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=proxy@${PROXY_POD_IP}

Thanks

uhthomas commented 3 years ago

Hi,

I was also struggling a lot getting libcluster to work with the DNS SRV topology.

After a few hours of toying, I've finally managed to get it to work. I'll share my manifests and config for future readers.

For context, rasmus is the name of the project.

Firstly, we need a headless service. This service must have "clusterIP": "None" and at least one port defined, otherwise you will get NXDOMAIN.

# service.json
{
  "metadata": {
    "name": "rasmus-headless",
    "namespace": "rasmus",
    "labels": {
      "app.kubernetes.io/name": "rasmus",
      "app.kubernetes.io/instance": "rasmus",
      "app.kubernetes.io/version": "0.2.13",
      "app.kubernetes.io/component": "rasmus"
    }
  },
  "spec": {
    "ports": [
      {
        "port": 1
      }
    ],
    "clusterIP": "None",
    "selector": {
      "app.kubernetes.io/name": "rasmus",
      "app.kubernetes.io/instance": "rasmus",
      "app.kubernetes.io/component": "rasmus"
    }
  },
  "kind": "Service",
  "apiVersion": "v1"
}

Secondly, you'll need to provide an environment variable which exposes your pod's IP address. You'll also need to provide a cookie, otherwise the nodes will not be able to communicate. Example:

"env": [
    {
        "name": "POD_IP",
        "valueFrom": {
        "fieldRef": {
            "fieldPath": "status.podIP"
            }
        }
    },
    {
        "name": "RELEASE_COOKIE",
        "value": "0123456789abcdef"
    }
]

If you haven't already, run mix release.init in your project. This will create some scripts in a directory called rel. Amend rel/env.sh.eex to include

export RELEASE_DISTRIBUTION=name
export RELEASE_NODE="<%= @release.name %>@$(echo "$POD_IP" | sed 's/\./-/g').rasmus-headless.rasmus.svc.cluster.local"

Either replace the domain name rasmus-headless.rasmus.svc.cluster.local, or provide it to the script with another environment variable.
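The sed expression above just rewrites the pod IP into the dashed label Kubernetes publishes for pod DNS names under a headless service. A quick illustration, using an IP from the logs below:

```shell
# Convert a pod IP into the dashed hostname label, e.g.
# 100.65.61.95 -> 100-65-61-95 (as in 100-65-61-95.<service>.<namespace>.svc.cluster.local)
POD_IP="100.65.61.95"
echo "$POD_IP" | sed 's/\./-/g'
# prints: 100-65-61-95
```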

Lastly, ensure the topology is correctly configured. For example, config/config.exs:

import Config

config :libcluster,
  topologies: [
    _: [
      strategy: Elixir.Cluster.Strategy.Kubernetes.DNSSRV,
      config: [
        namespace: "rasmus",
        service: "rasmus-headless",
        application_name: "rasmus",
        polling_interval: 10_000
      ]
    ]
  ]

At this point, the application should be in a position where the nodes can form a cluster.

15:30:28.969 [info]  [libcluster:_] connected to :"rasmus@100-65-61-95.rasmus-headless.rasmus.svc.cluster.local"
15:30:28.969 [info]  [libcluster:_] connected to :"rasmus@100-65-61-95.rasmus-headless.rasmus.svc.cluster.local"
15:30:28.969 [info]  [libcluster:_] connected to :"rasmus@100-65-61-95.rasmus-headless.rasmus.svc.cluster.local"

For a detailed example, please refer to: