goharbor / harbor-helm

The helm chart to deploy Harbor
Apache License 2.0

PostgreSQL database keep recovering #1685

Closed Noisia0800 closed 1 week ago

Noisia0800 commented 6 months ago

Hello, I've installed Harbor via the Helm chart in Kubernetes. Our version of Harbor is v2.7.1. Harbor seemed to run fine for a few months, but now we are facing some issues with the PostgreSQL database (goharbor/harbor-db:v2.7.1).

The PostgreSQL database keeps entering recovery mode, though sometimes it manages to run:

```
2024-01-16 14:53:15.270 UTC [1] LOG: terminating any other active server processes
2024-01-16 14:53:15.297 UTC [1] LOG: all server processes terminated; reinitializing
2024-01-16 14:53:15.624 UTC [27] LOG: database system was shut down at 2024-01-16 14:53:12 UTC
2024-01-16 14:53:15.624 UTC [28] FATAL: the database system is in recovery mode
2024-01-16 14:53:16.526 UTC [1] LOG: database system is ready to accept connections
2024-01-16 14:53:22.282 UTC [1] LOG: server process (PID 41) exited with exit code 141
2024-01-16 14:53:22.282 UTC [1] LOG: terminating any other active server processes
2024-01-16 14:53:22.282 UTC [44] WARNING: terminating connection because of crash of another server process
2024-01-16 14:53:22.282 UTC [44] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-01-16 14:53:22.282 UTC [44] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-01-16 14:53:22.283 UTC [32] WARNING: terminating connection because of crash of another server process
2024-01-16 14:53:22.283 UTC [32] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-01-16 14:53:22.283 UTC [32] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2024-01-16 14:53:22.936 UTC [1] LOG: all server processes terminated; reinitializing
2024-01-16 14:53:23.239 UTC [45] LOG: database system was interrupted; last known up at 2024-01-16 14:53:16 UTC
2024-01-16 14:53:30.791 UTC [55] FATAL: the database system is in recovery mode
2024-01-16 14:53:40.794 UTC [65] FATAL: the database system is in recovery mode
2024-01-16 14:53:50.805 UTC [75] FATAL: the database system is in recovery mode
2024-01-16 14:54:00.811 UTC [85] FATAL: the database system is in recovery
```

How can we resolve this issue?

Thanks for the help.

MinerYang commented 6 months ago

If no lib files or system settings have been modified manually, this is possibly due to the OOM killer terminating the process. Could you share the postgresql.conf as well as your database configuration from values.yaml? And kindly check the memory usage of the node that pg is running on:

free -m
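
From the Kubernetes side, a quick way to check node memory pressure and whether the database container was OOM-killed might look like the following (the pod name `harbor-database-0` and namespace `harbor` are assumptions based on the default release name; adjust to yours):

```shell
# Show current memory usage on the nodes (requires metrics-server).
kubectl top node

# If the kernel killed the container, the last terminated reason will be OOMKilled.
kubectl -n harbor get pod harbor-database-0 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```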
zyyw commented 6 months ago

Could you please also describe the database pod and share the output with us?
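
For reference, the describe output can be captured with something like this (pod name and namespace are assumptions; adjust to your release):

```shell
kubectl -n harbor describe pod harbor-database-0 > pod.txt
```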

github-actions[bot] commented 4 months ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

Noisia0800 commented 4 months ago

> If no lib files or system settings have been modified manually, this is possibly due to the OOM killer terminating the process. Could you share the postgresql.conf as well as your database configuration from values.yaml? And kindly check the memory usage of the node that pg is running on:
>
> free -m

Hello, sorry for the late reply. The conf is in the attachment; the only thing I changed was shared_buffers to 1024MB (the rest is default). And here is my configuration from values.yaml, as you mentioned: postgresql.txt

```yaml
database:
  # if external database is used, set "type" to "external"
  # and fill the connection information in "external" section
  type: internal
  internal:
    # set the service account to be used, default if left empty
    serviceAccountName: ""
    # mount the service account token
    automountServiceAccountToken: false
    image:
      repository: goharbor/harbor-db
      tag: v2.7.1
    # The initial superuser password for internal database
    password: "changeit"
    # The size limit for Shared memory, pgSQL use it for shared_buffer
    # More details see:
    # https://github.com/goharbor/harbor/issues/15034
    shmSizeLimit: 1Gi
    resources:
      requests:
        memory: 1Gi
        cpu: 1
    # The timeout used in livenessProbe; 1 to 5 seconds
    livenessProbe:
      timeoutSeconds: 2
    # The timeout used in readinessProbe; 1 to 5 seconds
    readinessProbe:
      timeoutSeconds: 2
    nodeSelector: {}
    tolerations: []
    affinity: {}
    ## The priority class to run the pod as
    priorityClassName:
    initContainer:
      migrator: {}
      # resources:
      #  requests:
      #    memory:
      #    cpu:
      permissions: {}
      # resources:
      #  requests:
      #    memory: 128Mi
      #    cpu: 100m
```
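
Since the chart's own comment ties `shmSizeLimit` to `shared_buffers` (see the issue linked above), one thing worth trying when `shared_buffers` is raised to 1024MB is to raise the shared-memory limit and the memory request along with it. A minimal sketch, with illustrative (untested) 2Gi values:

```yaml
database:
  type: internal
  internal:
    # Assumption: keep /dev/shm comfortably above shared_buffers (1024MB here);
    # see https://github.com/goharbor/harbor/issues/15034
    shmSizeLimit: 2Gi
    resources:
      requests:
        memory: 2Gi
        cpu: 1
```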
Noisia0800 commented 4 months ago

> Could you please also describe the database pod and share the output with us?

Hello, sorry for the late response. Here is the describe output for the pod (attached):

pod.txt

jackchuong commented 3 months ago

Hello @Noisia0800, did you fix the issue? I have the same issue. Harbor was installed by Helm, using the internal psql. This is a part of my values.yaml:

database:
  # if external database is used, set "type" to "external"
  # and fill the connection information in "external" section
  type: internal
  internal:
    # set the service account to be used, default if left empty
    serviceAccountName: ""
    # mount the service account token
    automountServiceAccountToken: false
    image:
      repository: goharbor/harbor-db
      tag: v2.8.3
    # The initial superuser password for internal database
    password: "changeit"
    # The size limit for Shared memory, pgSQL use it for shared_buffer
    # More details see:
    # https://github.com/goharbor/harbor/issues/15034
    shmSizeLimit: 512Mi
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1000Mi
        cpu: 1000m
    # The timeout used in livenessProbe; 1 to 5 seconds
    livenessProbe:
      timeoutSeconds: 1
    # The timeout used in readinessProbe; 1 to 5 seconds
    readinessProbe:
      timeoutSeconds: 1
    nodeSelector: {}
    tolerations: []
    affinity: {}
    ## The priority class to run the pod as
    priorityClassName:
    initContainer:
      migrator: {}
      # resources:
      #  requests:
      #    memory: 128Mi
      #    cpu: 100m
      permissions: {}
      # resources:
      #  requests:
      #    memory: 128Mi
      #    cpu: 100m
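
With a 1000Mi memory limit on this pod, it may be worth confirming whether the postmaster is being killed by the kernel rather than crashing on its own. A minimal check, again assuming the default pod name `harbor-database-0` in the `harbor` namespace:

```shell
# Recent events for the database pod (look for OOMKilling or probe failures).
kubectl -n harbor get events --field-selector involvedObject.name=harbor-database-0

# On the node running the pod, kernel OOM kills show up in dmesg.
dmesg -T | grep -i -E 'out of memory|oom|killed process'
```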
github-actions[bot] commented 1 month ago

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] commented 1 week ago

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.