ClusterLabs / PAF

PostgreSQL Automatic Failover: High Availability for Postgres, based on Pacemaker and Corosync.
http://clusterlabs.github.io/PAF/

psql: could not connect to server: No such file or directory #24

Status: Closed (nnn-dev closed 8 years ago)

nnn-dev commented 8 years ago

Hello

We don't use the default parameters for our database, so we have set system_user, pgdata and pghost on our resource.

All pg_xxx commands work, but the psql call in the _confirm_role function doesn't. It seems the parameters aren't passed to it (see below).

# pcs resource debug-start pgsqlms

Operation start for pgsqlms:0 (ocf:heartbeat:pgsqlms) returned 0
 >  stdout: /DBTEST/tmp:32100 - no response
 >  stdout: pg_ctl: no server running
 >  stdout: waiting for server to start....2016-05-25 15:16:00 CEST [28545]: [1-1] user=,db= LOG:  redirecting log output to logging collector process
 >  stdout: 2016-05-25 15:16:00 CEST [28545]: [2-1] user=,db= HINT:  Future log output will appear in directory "/DBTEST/log/tech".
 >  stdout:  done
 >  stdout: server started
 >  stdout: /DBTEST/tmp:32100 - accepting connections
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _runas: launching as "dbtest" command "/usr/pgsql-9.4/bin/pg_isready -h /DBTEST/tmp -p 32100"
 >  stderr: Use of uninitialized value $( in scalar assignment at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 92.
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: pgsql_monitor: instance "pgsqlms:0" is not listening
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _runas: launching as "dbtest" command "/usr/pgsql-9.4/bin/pg_ctl -D /DBTEST/base/system status"
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: Use of uninitialized value $( in scalar assignment at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 92.
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _confirm_stopped: no postmaster process found for instance "pgsqlms:0"
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _controldata: instance "pgsqlms:0" state is "shut down in recovery"
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _confirm_stopped: instance "pgsqlms:0" controldata indicates that the instance was propertly shut down
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: pgsql_start: instance "pgsqlms:0" is not running, starting it as a secondary
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _create_recovery_conf: get replication configuration from the template file "/DBTEST/recovery.conf"
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _create_recovery_conf: write the replication configuration to "/DBTEST/base/system/recovery.conf" file
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:00  DEBUG: _runas: launching as "dbtest" command "/usr/pgsql-9.4/bin/pg_ctl -D /DBTEST/base/system -w start"
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: Use of uninitialized value $( in scalar assignment at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 92.
 >  stderr: Use of uninitialized value $postgres_gid in concatenation (.) or string at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 88.
 >  stderr: pgsqlms(pgsqlms:0)[28530]: 2016/05/25_15:16:01  DEBUG: _runas: launching as "dbtest" command "/usr/pgsql-9.4/bin/pg_isready -h /DBTEST/tmp -p 32100"
 >  stderr: Use of uninitialized value $( in scalar assignment at /usr/lib/ocf/resource.d/heartbeat/pgsqlms line 92.
 >  stderr: pgsqlms(pgsqlms:0)[4827]: 2016/05/25_15:16:05  DEBUG: pgsql_monitor: instance "pgsqlms:0" is listening

 >  stderr: psql: could not connect to server: No such file or directory
 >  stderr:     Is the server running locally and accepting
 >  stderr:     connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 >  stderr: pgsqlms(pgsqlms:0)[4827]: 2016/05/25_15:16:05  DEBUG: _query: psql return code: 2
 >  stderr: pgsqlms(pgsqlms:0)[4827]: 2016/05/25_15:16:05  DEBUG: _query: @res:$VAR1 = [];
 >  stderr:
 >  stderr: 2016/05/25_15:16:05  ERROR: _confirm_role: psql could not connect to instance "pgsqlms:0"
 >  stderr: 2016/05/25_15:16:05  ERROR: pgsql_start: unexpected state for instance "pgsqlms:0" (returned 1)

Best regards,

nnn-dev commented 8 years ago

As a temporary measure, I have put a workaround in place.

In the _query function, I hard-coded the system_user in the $connstr variable and added the -h and -p parameters to the psql call:

sub _query {
    my $query        = shift;
    my $res          = shift;
    my $connstr      = "dbname=postgres user=dbtest";
    my $RS           = chr(30); # ASCII RS  (record separator)
    my $FS           = chr(3);  # ASCII ETX (end of text)
    my $postgres_uid = getpwnam( $system_user );
    my $oldeuid      = $>;
    my $tmpfile;
    my @res;
    my $ans;
    my $pid;
    my $rc;
    if ( $pid == 0 ) { # child
        exec $PGPSQL, '--set', 'ON_ERROR_STOP=1', '-qXAtf', "$tmpfile",
            '-R', "$RS", '-F', "$FS","-h", $pghost, "-p", $pgport, "$connstr";
    }

ioguix commented 8 years ago

Hi,

What is your exact version of PAF, so I can compare the error messages with the code? I am quite worried about the Perl warnings. I need to check whether this issue still exists in the current dev version.

About the system_user parameter: it is not meant to be used as a connection role, but as the system user that launches commands related to the PostgreSQL instance (start, stop, querying). Don't you have the postgres role available in your instance? Did you create your instance without this role, or did you drop it? I guess we could either use system_user as the default connection role and/or add a pguser parameter if really needed.

You are right about the host and the port, we forgot to use them in the _query subroutine. I will fix this quickly.
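
For illustration, the change discussed here (passing the host and port to psql and choosing a connection role) could be sketched roughly as follows. This is not PAF's actual code: `build_psql_args` and the `pguser` key are hypothetical names, and falling back to system_user is only one of the options mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PAF: build the psql argument list from the
# resource parameters instead of relying on the libpq defaults.
sub build_psql_args {
    my (%p) = @_;

    # Assumption: an optional "pguser" parameter that falls back to system_user.
    my $role    = defined $p{pguser} ? $p{pguser} : $p{system_user};
    my $connstr = "dbname=postgres user=$role";

    return (
        '--set', 'ON_ERROR_STOP=1', '-qXAt',
        '-h', $p{pghost},    # socket directory or host name
        '-p', $p{pgport},
        $connstr,
    );
}

# Values taken from the report above.
my @args = build_psql_args(
    system_user => 'dbtest',
    pghost      => '/DBTEST/tmp',
    pgport      => 32100,
);
print join( ' ', 'psql', @args ), "\n";
```

With the parameters from the report, the resulting call targets the socket directory /DBTEST/tmp on port 32100 rather than the default socket named in the error message.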

nnn-dev commented 8 years ago

Thanks

We tested with the latest master version available yesterday (commit 76298292eff05d52595edd44efb59eec4e9e7416).

We don't have a postgres user because we run initdb as our system_user.

Regards

ioguix commented 8 years ago

Ok,

Line 88 in this commit doesn't seem to match your Perl warnings: https://github.com/dalibo/PAF/blob/76298292eff05d52595edd44efb59eec4e9e7416/script/pgsqlms#L88

I guess it is related to the _runas subroutine here: https://github.com/dalibo/PAF/blob/76298292eff05d52595edd44efb59eec4e9e7416/script/pgsqlms#L173

Could you tell me whether the system_user you are using belongs to a group? Does it have secondary groups as well?

About the connection role: OK, I guess we should use system_user as the default connection role instead of postgres. We'll see later whether we really need to add a pguser parameter. I'll fix this tomorrow.

Thank you for your tests and feedback!

nnn-dev commented 8 years ago

Sorry, it seems my colleague reinstalled version 1.0.2 in the meantime. But I suppose the problem is still present.

It seems the problem is that the group does not have the same name as the user.

My system_user is dbtest and group is gdbtest.

>id dbtest
uid=6479(dbtest) gid=6479(gdbtest) groups=6479(gdbtest),26(postgres)

I tested getgrnam directly in Perl:

> perl -e 'print getgrnam("dbtest")'
# nothing, the result is empty

> perl -e 'print getgrnam("gdbtest")'
gdbtestx6479postgres

ioguix commented 8 years ago

I just pushed a fix to all branches (v1.0, v1.1 and master). The v1.x branches are dedicated to old Pacemaker stacks (e.g. EL 6). This was decided this week, when it was reported to us that the current devel version broke compatibility with old stacks. We still need to document that.

If you are using a recent Pacemaker stack (Pacemaker 1.1.13+ and Corosync 2), you can use the current master branch.

Anyway, back to the group issue. The fact that getgrnam("dbtest") returns an empty result explains the errors.

Could you explain to me how you created your dbtest/gdbtest user and group, so I can reproduce this on my side?

What is the output of the following commands?

grep dbtest /etc/passwd
grep dbtest /etc/group

Thank you

nnn-dev commented 8 years ago

I don't know how the user and group were created (that is done by another team).

Below is the output of the commands:

>grep dbtest /etc/passwd
dbtest:x:6479:6479::/DBTEST/home:/bin/bash

>grep dbtest /etc/group
postgres:x:26:dbtest
gdbtest:x:6479:postgres

From what I have read, getgrnam needs a group name as its parameter. So it is normal that getgrnam("dbtest") returns nothing (our group name is gdbtest).

Perhaps we can call getpwnam in list context to retrieve the gid (see here).

Something like:

    my @postgres_info =  getpwnam( $system_user );
    my $postgres_uid = $postgres_info[2];
    my $postgres_gid = $postgres_info[3];
    #my $postgres_uid = getpwnam( $system_user );
    #my $postgres_gid = getgrnam( $system_user );
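
The suggestion above can be checked in isolation with a small script. This is only a sketch: `user_ids` is a hypothetical helper, not a PAF function, and `root` stands in for `dbtest` simply because it exists on most systems.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve the uid and primary gid of a system user from
# its passwd entry, so the group name never has to match the user name.
sub user_ids {
    my ($user) = @_;

    # In list context, getpwnam returns
    # (name, passwd, uid, gid, quota, comment, gcos, dir, shell, ...).
    my @pw = getpwnam($user);
    die "unknown system user \"$user\"\n" unless @pw;

    return ( $pw[2], $pw[3] );
}

# Assumption: "root" is present in the passwd database of the test machine.
my ( $uid, $gid ) = user_ids('root');

# In scalar context, getgrgid returns the group name for a numeric gid.
my $group = getgrgid($gid);
printf "uid=%d gid=%d group=%s\n", $uid, $gid, defined $group ? $group : '?';
```

Because the gid comes from the passwd entry itself, the lookup works for a user such as dbtest whose primary group is named gdbtest.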

ioguix commented 8 years ago

Hi,

I believe all the bugs you reported here are now fixed in the various branches. Could you give them a test?

Thank you for your tests and bug reports!

nnn-dev commented 8 years ago

Thanks.

No more warnings. Tested with the current 1.0_dev, 1.1_dev and 2.0_dev (devel) branches.

Looks good, I am closing the issue.

ioguix commented 8 years ago

Thank you for your feedback!