Trivadis / pgbasenv

pgBasEnv - PostgreSQL Base Environment Tool
https://github.com/Trivadis/pgbasenv
Apache License 2.0

Server marked DOWN #2

Closed GMOSMAR closed 3 years ago

GMOSMAR commented 3 years ago

Hi, my Postgres server is marked DOWN even though it is running. I am using version 1.2 of pgbasenv.

pgBasEnv v1.2 by Trivadis AG

Installation homes:

ALIAS | VER | OPTIONS | HOME DIR

pgh131 | 13.1 | ssl:1G:8K | /usr/local
pgh131A | 13.1 | ssl:1G:8K | /usr/pgpure/postgres/13

Cluster data directories:

ALIAS | VER | STAT | PORT | PID | SIZE | PGDATA | LAST START | LAST START HOME

PGE01 | 13 | DOWN | 5432 | | 32M | /pgdata/13/PGE01 | 2021-02-04 11:20 | /usr/pgpure/postgres/13

---[PGE01]:

     Installation home: /usr/pgpure/postgres/13
Cluster data directory: /pgdata/13/PGE01
          Cluster port: 5432
        Cluster status: DOWN
       Cluster version: 13

Cluster last start time: 2021-02-04 11:20

---[05.02.2021 17:26]

tdvm0030:/home/postgres [PGE01]$ ps -fu postgres
UID PID PPID C STIME TTY TIME CMD
postgres 9133 1 0 Feb04 ? 00:00:02 /usr/pgpure/postgres/13/bin/postgres -D /etc/pgpure/postgres/13/PGE01
postgres 9135 9133 0 Feb04 ? 00:00:00 postgres: PGE01: checkpointer
postgres 9136 9133 0 Feb04 ? 00:00:00 postgres: PGE01: background writer
postgres 9137 9133 0 Feb04 ? 00:00:00 postgres: PGE01: walwriter
postgres 9138 9133 0 Feb04 ? 00:00:01 postgres: PGE01: autovacuum launcher
postgres 9139 9133 0 Feb04 ? 00:00:02 postgres: PGE01: stats collector
postgres 9140 9133 0 Feb04 ? 00:00:00 postgres: PGE01: logical replication launcher
postgres 18741 1 0 Feb04 ? 00:00:00 /usr/lib/systemd/systemd --user
postgres 18742 18741 0 Feb04 ? 00:00:00 (sd-pam)
postgres 26011 26008 0 17:25 ? 00:00:00 sshd: postgres@pts/0
postgres 26012 26011 0 17:25 pts/0 00:00:00 -bash
postgres 26668 26012 0 17:26 pts/0 00:00:00 ps -fu postgres
tdvm0030:/home/postgres [PGE01]$

aychin-tvd commented 3 years ago

Hi,

Can you please set the environment and then execute this command:

PGE01
$TVD_PGHOME/bin/psql -U $PGBASENV_CHECK_USER -d $PGBASENV_CHECK_DATABASE -c ";" -t
echo $?

Just copy/paste it and post the output here.
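The command above is a connectivity probe: `psql -c ";"` runs an empty statement, and `echo $?` prints 0 when the connection succeeded. A minimal sketch of how such an exit status could be mapped to a status label follows. The helper name `cluster_status` is hypothetical, and this is not pgBasEnv's actual code.

```shell
# Hypothetical helper: run the given probe command and map its
# exit status to UP (status 0) or DOWN (any other status).
cluster_status() {
  if "$@" >/dev/null 2>&1; then
    echo "UP"
  else
    echo "DOWN"
  fi
}

# Stand-in probes so the sketch runs without a database:
cluster_status true     # a probe that succeeds
cluster_status false    # a probe that fails

# With a real cluster, the probe from this thread would be passed instead:
# cluster_status "$TVD_PGHOME/bin/psql" -U "$PGBASENV_CHECK_USER" \
#   -d "$PGBASENV_CHECK_DATABASE" -c ";" -t
```

Passing the probe as arguments keeps the helper independent of any particular client binary.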

GMOSMAR commented 3 years ago

Hi

tdvm0030:/home/postgres [PGE01]$ PGE01

---[PGE01]:

     Installation home: /usr/pgpure/postgres/13
Cluster data directory: /pgdata/13/PGE01
          Cluster port: 5432
        Cluster status: DOWN
       Cluster version: 13

Cluster last start time: 2021-02-04 11:20

---[08.02.2021 08:07]

tdvm0030:/home/postgres [PGE01]$ $TVD_PGHOME/bin/psql -U $PGBASENV_CHECK_USER -d $PGBASENV_CHECK_DATABASE -c ";" -t; echo $?
0
tdvm0030:/home/postgres [PGE01]$ echo $TVD_PGHOME $PGBASENV_CHECK_USER $PGBASENV_CHECK_DATABASE
/usr/pgpure/postgres/13 postgres template1
tdvm0030:/home/postgres [PGE01]$

aychin-tvd commented 3 years ago

Maybe there is a problem accessing the /proc filesystem.

Can you try this? Here pid is the process ID of the postmaster process.

readlink -f /proc/<pid>/exe

GMOSMAR commented 3 years ago

Is it a problem that the postmaster process isn't called postmaster?

tdvm0030:/proc/9133 [PGE01]$ ps -fu postgres
UID PID PPID C STIME TTY TIME CMD
postgres 9133 1 0 Feb04 ? 00:00:06 /usr/pgpure/postgres/13/bin/postgres -D /etc/pgpure/postgres/13/PGE01
postgres 9135 9133 0 Feb04 ? 00:00:00 postgres: PGE01: checkpointer
postgres 9136 9133 0 Feb04 ? 00:00:01 postgres: PGE01: background writer
postgres 9137 9133 0 Feb04 ? 00:00:02 postgres: PGE01: walwriter
postgres 9138 9133 0 Feb04 ? 00:00:04 postgres: PGE01: autovacuum launcher
postgres 9139 9133 0 Feb04 ? 00:00:07 postgres: PGE01: stats collector
postgres 9140 9133 0 Feb04 ? 00:00:00 postgres: PGE01: logical replication launcher
postgres 15753 15750 0 10:46 ? 00:00:00 sshd: postgres@pts/0
postgres 15754 15753 0 10:46 pts/0 00:00:00 -bash
postgres 16428 15754 0 10:52 pts/0 00:00:00 ps -fu postgres
postgres 18741 1 0 Feb04 ? 00:00:00 /usr/lib/systemd/systemd --user
postgres 18742 18741 0 Feb04 ? 00:00:00 (sd-pam)
tdvm0030:/proc/9133 [PGE01]$ readlink -f /proc/9133/exe
/usr/pgpure/postgres/13/bin/postgres
tdvm0030:/proc/9133 [PGE01]$

postmaster is just a softlink to postgres:

tdvm0030:/home/postgres [PGE01]$ ls -l /usr/pgpure/postgres/13/bin/postmaster
lrwxrwxrwx 1 root root 8 Dec 14 19:09 /usr/pgpure/postgres/13/bin/postmaster -> postgres

aychin-tvd commented 3 years ago

No, it's not a problem. It's just historically called postmaster; I just meant the main postgres process.

Is there a pg_ctl inside the bin directory?

ls /usr/pgpure/postgres/13/bin/pg_ctl
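
The name does not matter for this check because `readlink -f` follows symbolic links to the real file. A small sketch in a temporary directory follows; the directory and the empty placeholder file are stand-ins, not a real installation.

```shell
# Temporary stand-in for a PostgreSQL bin directory (hypothetical path).
bindir=$(mktemp -d)
: > "$bindir/postgres"                 # empty placeholder for the real binary
ln -s postgres "$bindir/postmaster"    # relative symlink, postmaster -> postgres

# readlink -f resolves the link, so the postmaster name leads to postgres.
resolved=$(readlink -f "$bindir/postmaster")
echo "${resolved##*/}"

rm -rf "$bindir"
```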

GMOSMAR commented 3 years ago

It's there:

tdvm0030:/home/postgres [PGE01]$ which pg_ctl
/usr/pgpure/postgres/13/bin/pg_ctl
tdvm0030:/home/postgres [PGE01]$ ls -l /usr/pgpure/postgres/13/bin/pg_ctl
-rwxr-xr-x 1 root root 98880 Dec 14 19:09 /usr/pgpure/postgres/13/bin/pg_ctl
tdvm0030:/home/postgres [PGE01]$ pg_ctl status -D /pgdata/13/PGE01
pg_ctl: server is running (PID: 9133)
/usr/pgpure/postgres/13/bin/postgres "-D" "/etc/pgpure/postgres/13/PGE01"
tdvm0030:/home/postgres [PGE01]$

aychin-tvd commented 3 years ago

Please execute this code in your shell; it should return the PID of your running instance:

for i in $(ps -o ppid= -C postgres -C postmaster -C edb-postgres | sort | uniq -c | awk '{ if ($1 > 1 && $2 > 1) print $2}'); do
  dir=$(readlink -f /proc/$i/exe)
  if [[ ! -z $dir ]]; then
    dir=$(dirname $dir)
    [[ -f $dir/pg_ctl ]] && echo "$i;$(dirname $dir);"
  fi
done
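
The first stage of that loop counts how often each parent PID occurs and keeps those that occur more than once and are greater than 1. It can be tried on canned input without a running server. In the sketch below, the function names and the sample data are illustrative; the PIDs mirror the `ps` listing earlier in this thread.

```shell
# Canned stand-in for `ps -o ppid= -C postgres ...`: one parent PID per
# matching process. The main process has parent 1, its six children
# have parent 9133.
sample_ppids() {
  printf '%s\n' 1 9133 9133 9133 9133 9133 9133
}

# Keep parent PIDs that occur more than once and are greater than 1.
find_postmaster_pids() {
  sort | uniq -c | awk '{ if ($1 > 1 && $2 > 1) print $2 }'
}

sample_ppids | find_postmaster_pids
```

With this input the pipeline prints 9133, the same PID the loop is expected to report.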

GMOSMAR commented 3 years ago

It does:

tdvm0030:/home/postgres [PGE01]$ for i in $(ps -o ppid= -C postgres -C postmaster -C edb-postgres | sort | uniq -c | awk '{ if ($1 > 1 && $2 > 1) print $2}'); do
  dir=$(readlink -f /proc/$i/exe)
  if [[ ! -z $dir ]]; then
    dir=$(dirname $dir)
    [[ -f $dir/pg_ctl ]] && echo "$i;$(dirname $dir);"
  fi
done
9133;/usr/pgpure/postgres/13;
tdvm0030:/home/postgres [PGE01]$

aychin-tvd commented 3 years ago

OK, then the problem may be in identifying the data directory. We use lsof to get the list of all directories opened by the postgres process.

Can you execute this code?

for d in $(lsof -p 9133 2> /dev/null | grep DIR | awk '{print $9}'); do
  [[ -f $d/global/pg_control ]] && echo $d
done
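
The test inside that loop treats a directory as a cluster data directory when it contains `global/pg_control`. A sketch of the same test against temporary directories follows; the helper name `is_pgdata` and the directory names are made up for illustration.

```shell
# Hypothetical helper: succeeds if the directory contains global/pg_control,
# the same marker file the loop above looks for.
is_pgdata() {
  [ -f "$1/global/pg_control" ]
}

tmp=$(mktemp -d)
mkdir -p "$tmp/data/global" "$tmp/other"
: > "$tmp/data/global/pg_control"    # empty placeholder, not a real control file

for d in "$tmp/data" "$tmp/other"; do
  if is_pgdata "$d"; then
    echo "data directory candidate: ${d##*/}"
  fi
done

rm -rf "$tmp"
```

Only the directory with the marker file is reported; the other one is skipped silently, as in the original loop.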
GMOSMAR commented 3 years ago

tdvm0030:/home/postgres [PGE01]$ for d in $(lsof -p 9133 2> /dev/null | grep DIR | awk '{print $9}'); do
  [[ -f $d/global/pg_control ]] && echo $d
done
/pgdata/13/PGE01
tdvm0030:/home/postgres [PGE01]$

Should I try to install version 1.3 of pgbasenv?

aychin-tvd commented 3 years ago

It will definitely be worth installing 1.3, but we need to identify the problem first; we may need to patch the current version.

Please execute this command. It will generate a lot of output in a file, which you can then send:

cd $PGBASENV_BASE/bin
bash -x ./pgup.sh > pgup.debug 2>&1

I will need the pgup.debug file.
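
For reference, `-x` makes the shell print each command before running it, and `2>&1` placed after the file redirection sends that trace into the same file as the normal output. A small sketch with a stand-in script follows; `demo.sh` and the function name `trace_demo` are hypothetical, and pgup.sh itself is not reproduced here.

```shell
# Hypothetical demo: trace a two-line stand-in script and show the capture.
trace_demo() {
  tmp=$(mktemp -d)
  cat > "$tmp/demo.sh" <<'EOF'
status=DOWN
echo "Cluster status: $status"
EOF

  # Fall back to sh if bash is not installed; both accept -x.
  shell=$(command -v bash || command -v sh)

  # Trace lines go to stderr with a "+ " prefix by default;
  # 2>&1 merges them into the file that already receives stdout.
  "$shell" -x "$tmp/demo.sh" > "$tmp/demo.debug" 2>&1

  cat "$tmp/demo.debug"
  rm -rf "$tmp"
}

trace_demo
```

The captured file interleaves the traced commands with the script's own output, which is what makes it useful for finding the step that goes wrong.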

GMOSMAR commented 3 years ago

Version 1.3 also has the status DOWN.

aychin-tvd commented 3 years ago

That was expected. Please send me the pgup.debug file.

GMOSMAR commented 3 years ago

Didn't you receive the attachment? Do I have to copy the output into the mail body?

aychin-tvd commented 3 years ago

Send it directly to aychin.gasimov@trivadis.com please.

aychin-tvd commented 3 years ago

Can you provide the location of lsof?

which lsof

GMOSMAR commented 3 years ago

tdvm0030:/home/postgres [PGE01]$ which lsof
/usr/bin/lsof
tdvm0030:/home/postgres [PGE01]$ ls -l /usr/bin/lsof
-rwxr-xr-x 1 root root 168408 May 25 2018 /usr/bin/lsof
tdvm0030:/home/postgres [PGE01]$ cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
tdvm0030:/home/postgres [PGE01]$

aychin-tvd commented 3 years ago

Good. And what is the output of netstat -ltnp?

aychin-tvd commented 3 years ago

Actually, it is because of SUSE Linux Enterprise Server 15. We need to adapt the tool to this OS.

We will do it soon and inform you.

GMOSMAR commented 3 years ago

Probably that's the problem.

In SLES15 netstat isn't installed by default. It is replaced by "ss" (I'm struggling with this change).

tdvm0030:/home/postgres [PGE01]$ ss -tulpn
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
udp UNCONN 0 0 0.0.0.0:111 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:161 0.0.0.0:*
udp UNCONN 0 0 127.0.0.1:323 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:35321 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:952 0.0.0.0:*
udp UNCONN 0 0 127.0.0.1:958 0.0.0.0:*
udp UNCONN 0 0 0.0.0.0:37852 0.0.0.0:*
udp UNCONN 0 0 [::]:55130 [::]:*
udp UNCONN 0 0 [::]:111 [::]:*
udp UNCONN 0 0 [::1]:323 [::]:*
udp UNCONN 0 0 [::]:952 [::]:*
udp UNCONN 0 0 [::]:52510 [::]:*
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:5432 0.0.0.0:* users:(("postgres",pid=9133,fd=5))
tcp LISTEN 0 100 127.0.0.1:25 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:51135 0.0.0.0:*
tcp LISTEN 0 64 0.0.0.0:41891 0.0.0.0:*
tcp LISTEN 0 128 127.0.0.1:199 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:111 0.0.0.0:*
tcp LISTEN 0 128 [::]:22 [::]:*
tcp LISTEN 0 128 [::]:5432 [::]:* users:(("postgres",pid=9133,fd=6))
tcp LISTEN 0 128 [::]:56121 [::]:*
tcp LISTEN 0 128 [::]:111 [::]:*
tcp LISTEN 0 64 [::]:42769 [::]:*
tdvm0030:/home/postgres [PGE01]$

aychin-tvd commented 3 years ago

Yes, that is the case. As I wrote, we will adapt the tool to SUSE 15. In a few days I will upload a new version that you can use.
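
One way a script could cope with a missing netstat is to fall back to ss and read the port from its output. The sketch below is only an illustration of that idea. The thread does not show the change actually made in the tool, the function names are hypothetical, and the field position assumes the `ss -tulpn` layout posted above, where the local address is the fifth column.

```shell
# Hypothetical chooser: prefer netstat, fall back to ss.
pick_socket_tool() {
  if command -v netstat >/dev/null 2>&1; then
    echo netstat
  elif command -v ss >/dev/null 2>&1; then
    echo ss
  else
    echo none
  fi
}

# Canned lines in the style of the ss -tulpn output in this thread.
sample_ss() {
  cat <<'EOF'
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:5432 0.0.0.0:* users:(("postgres",pid=9133,fd=5))
tcp LISTEN 0 128 [::]:5432 [::]:* users:(("postgres",pid=9133,fd=6))
EOF
}

# Hypothetical helper: print the distinct listening ports owned by PID $1,
# reading ss-style lines from stdin.
ports_for_pid() {
  awk -v pid="$1" '
    index($0, "pid=" pid ",") {
      n = split($5, parts, ":")   # the port is the last ":"-separated piece
      print parts[n]
    }' | sort -un
}

sample_ss | ports_for_pid 9133
```

For the canned lines this prints 5432 once, since the IPv4 and IPv6 listeners share the port.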

aychin-tvd commented 3 years ago

Hi,

You can now install version 1.4 and test it.

Please let us know whether the issue is fixed, so that we can close this thread.

Regards, Aychin

GMOSMAR commented 3 years ago

Hi

Version 1.4 now shows my postgres database as up in pgp.sh and pgstatus.sh. Thank you

Regards Franco