CrunchyData / crunchy-containers

Containers for Managing PostgreSQL on Kubernetes by Crunchy Data
https://www.crunchydata.com/
Apache License 2.0

Example replica fails to start. #1330

Closed 2qif49lt closed 3 years ago

2qif49lt commented 3 years ago

Which example are you working with? examples\kube\postgres-gis-ha

What is the current behavior? The master is healthy, but the replica fails to start with: FATAL: invalid value for parameter "port": "". Which environment variable sets this?

std@master:/data/samba/std/postgis-ha$ ./get.sh 
NAME                             READY   STATUS    RESTARTS   AGE   IP                NODE                 NOMINATED NODE   READINESS GATES
postgis-ha-01-844bdbf676-q6z57   1/1     Running   0          82m   100.173.192.203   node02.std.51   <none>           <none>
postgis-ha-02-5d8d9694d7-g522p   0/1     Running   0          43m   100.183.6.12      node01.std.51   <none>           <none>

NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE   SELECTOR
master    ClusterIP   100.66.78.80    <none>        5432/TCP,8009/TCP   23m   role=master
replica   ClusterIP   100.78.34.222   <none>        5432/TCP,8009/TCP   23m   role=replica

std@master:/data/samba/std/postgis-ha$ kubectl logs postgis-ha-02-5d8d9694d7-g522p -n postgis-ha 
2021-02-24 02:22:24,542 INFO: Lock owner: postgis-ha-01-844bdbf676-q6z57; I am postgis-ha-02-5d8d9694d7-g522p
2021-02-24 02:22:24,549 INFO: Reaped pid=65844, exit status=0
2021-02-24 02:22:24,552 INFO: Reaped pid=65845, exit status=0
2021-02-24 02:22:24,552 INFO: Local timeline=None lsn=None
2021-02-24 02:22:24,552 INFO: Lock owner: postgis-ha-01-844bdbf676-q6z57; I am postgis-ha-02-5d8d9694d7-g522p
2021-02-24 02:22:24,553 INFO: starting as a secondary
2021-02-24 02:22:24.681 GMT [65851] FATAL:  invalid value for parameter "port": ""
2021-02-24 02:22:24,691 INFO: postmaster pid=65851
/tmp:5432 - no response
2021-02-24 02:22:24,698 WARNING: Postgresql is not running.
2021-02-24 02:22:24,698 INFO: Lock owner: postgis-ha-01-844bdbf676-q6z57; I am postgis-ha-02-5d8d9694d7-g522p
2021-02-24 02:22:24,701 INFO: Reaped pid=65853, exit status=0
2021-02-24 02:22:24,701 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 6932636213127852171
  Database cluster state: in production
  pg_control last modified: Wed Feb 24 01:29:14 2021
  Latest checkpoint location: 0/3E37B90
  Latest checkpoint's REDO location: 0/3E37B58
  Latest checkpoint's REDO WAL file: 000000010000000000000003
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:509
  Latest checkpoint's NextOID: 24576
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 509
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Wed Feb 24 01:29:14 2021
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 6
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 727a8573026167a4bc0847d7dac08ac8063170144c20db106caca20c7bf77a15

2021-02-24 02:22:24,702 INFO: Lock owner: postgis-ha-01-844bdbf676-q6z57; I am postgis-ha-02-5d8d9694d7-g522p
2021-02-24 02:22:24,708 INFO: Reaped pid=65855, exit status=0
2021-02-24 02:22:24,711 INFO: Reaped pid=65856, exit status=0
2021-02-24 02:22:24,711 INFO: Local timeline=None lsn=None
2021-02-24 02:22:24,711 INFO: Lock owner: postgis-ha-01-844bdbf676-q6z57; I am postgis-ha-02-5d8d9694d7-g522p
2021-02-24 02:22:24,712 INFO: starting as a secondary
2021-02-24 02:22:24.839 GMT [65865] FATAL:  invalid value for parameter "port": ""

What is the expected behavior?

Other information (e.g. detailed explanation, related issues, etc)

Please tell us about your environment:

If possible, please run the following Kubernetes (kubectl) or OpenShift (oc) commands and provide the result: kubectl describe pod yourPodName

# postgresql deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgis-ha-02
  labels:
    app: postgis-ha
    name: postgis-ha-02
    cluster: std
  namespace: postgis-ha
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgis-ha
      name: postgis-ha-02
      cluster: std
  template:
    metadata:
      labels:
        app: postgis-ha
        name: postgis-ha-02
        cluster: std
    spec:
      imagePullSecrets:
      - name: images-registry
      nodeSelector:
        role: worker
      terminationGracePeriodSeconds: 30
      containers:
      - name: postgis
        image: crunchy-postgres-gis-ha:centos8-13.2-3.0-4.6.1
        imagePullPolicy: IfNotPresent
        securityContext:
          runAsUser: 1001
        resources:
          requests:
            memory: 8Gi
            cpu: 4
          limits:
            memory: 32Gi
            cpu: 16
        readinessProbe:
          exec:
            command:
              - "/bin/bash"
              - "-c"
              - "[[ -f '/tmp/pgha_initialized' ]]"
              - "&& pg_isready -h /tmp"
          initialDelaySeconds: 30
          timeoutSeconds: 8
        env:
        - name: PATRONI_KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: PATRONI_KUBERNETES_LABELS
          value: "{cluster: std}"
        - name: PGHA_PRIMARY_HOST
          value: "master"
        - name: PATRONI_KUBERNETES_SCOPE_LABEL
          value: "cluster"
        - name: PATRONI_POSTGRESQL_DATA_DIR
          value: "/pgdata/postgres-gis-ha-02"
        - name: PATRONI_SCOPE
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['cluster']
        volumeMounts:
        - mountPath: /pgdata
          name: pgdata
        - mountPath: /backrestrepo
          name: backrestrepo
        - mountPath: /pgconf/pguser
          name: pguser
        - mountPath: /pgconf/pgsuper
          name: pgsuper
        - mountPath: /pgconf/pgreplicator
          name: pgreplicator
        ports:
        - containerPort: 5432
          protocol: TCP
        - containerPort: 8009
          protocol: TCP
      volumes:
      - name: pgdata
        persistentVolumeClaim:
          claimName: postgres-gis-ha-02-pgdata
      - name: backrestrepo
        persistentVolumeClaim:
          claimName: postgres-gis-ha-backrestrepo
      - name: pguser
        secret:
          secretName: pguser
      - name: pgsuper
        secret:
          secretName: pgsuper
      - name: pgreplicator
        secret:
          secretName: pgreplicator

kubectl describe pvc
kubectl get nodes
kubectl logs yourPodName

jkatz commented 3 years ago

You should use the Postgres Operator.

2qif49lt commented 3 years ago

> You should use the Postgres Operator.

Would you mind helping me fix this problem? I expect to move to the Operator eventually, but I would like to get this method working first, because I think it is a simpler way to understand pg-ha.

jkatz commented 3 years ago

This container is designed to run with the Postgres Operator. Getting it up and running with the Postgres Operator is much simpler. Please see the Quickstart.
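
For anyone who hits the same `invalid value for parameter "port": ""` error before moving to the Operator, one possible starting point is to compare the port-related environment of the working master pod with that of the failing replica. The sketch below is only a diagnostic aid and makes no claim about which variable the image actually reads. The helper names `port_vars` and `pod_port_vars` are hypothetical, and it assumes `kubectl` access to the namespace and an `env` binary inside the container.

```shell
#!/usr/bin/env bash

# Keep only env-style lines (NAME=value) whose NAME contains "PORT",
# sorted so that two pods can be compared line by line.
port_vars() {
  grep -Ei '^[^=]*PORT[^=]*=' | sort
}

# Print the port-related environment of one pod.
# $1 = namespace, $2 = pod name (both supplied by the caller)
pod_port_vars() {
  kubectl exec -n "$1" "$2" -- env | port_vars
}
```

With the pod names from the listing above, running `pod_port_vars postgis-ha postgis-ha-01-844bdbf676-q6z57` and then `pod_port_vars postgis-ha postgis-ha-02-5d8d9694d7-g522p` gives two lists. A variable that appears for the master but is missing or empty for the replica would be a candidate cause worth checking against the replica Deployment's `env` section.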