Multiple instances can be run if the pid file is specified as a relative path

agroal / pgagroal

High-performance connection pool for PostgreSQL

https://agroal.github.io/pgagroal/

BSD 3-Clause "New" or "Revised" License


Multiple instances can be run if the pid file is specified as a relative path #426

Closed. fluca1978 closed this issue 3 months ago

fluca1978 commented 3 months ago

At commit 8a1d6416f02033a7a307439983ee6982513f5bc5, setting pid_file = relative_file.pid lets pgagroal start multiple times when launched from different directories.

Example: running the first instance:

Running the second instance from a different directory:

[luca@rachel]~% pgagroal
DEBUG network.c:648 server: bind: 127.0.0.1:54322 (Address already in use)
DEBUG network.c:648 server: bind: 10.0.2.15:54322 (Address already in use)
DEBUG network.c:648 server: bind: 192.168.222.50:54322 (Address already in use)
DEBUG network.c:648 server: bind: ::1:54322 (Address already in use)
DEBUG network.c:648 server: bind: fe80::a00:27ff:fee4:438d:54322 (Invalid argument)

I think we should either force an absolute pid_file, aborting execution if pid_file is not absolute, or abort the execution if the bind fails. In any case, the fact that a bind failure allows execution to continue is suspicious.
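To illustrate why a relative pid_file defeats the duplicate-instance check, here is a minimal Python sketch (not pgagroal code). The helper names `resolve_pid_file` and `require_absolute` and the directory names are hypothetical; the sketch only assumes that a relative path is resolved against the directory the process was started from.

```python
import os


def resolve_pid_file(pid_file: str, start_dir: str) -> str:
    """Return the path a pid_file setting maps to when started from start_dir."""
    # os.path.join keeps pid_file unchanged if it is already absolute.
    return os.path.normpath(os.path.join(start_dir, pid_file))


def require_absolute(pid_file: str) -> str:
    """Hypothetical strict check: reject a relative pid_file at startup."""
    if not os.path.isabs(pid_file):
        raise ValueError(f"pid_file must be an absolute path, got {pid_file!r}")
    return pid_file


# Two launches of the same configuration from different (hypothetical) directories
first = resolve_pid_file("relative_file.pid", "/home/luca/a")
second = resolve_pid_file("relative_file.pid", "/home/luca/b")
print(first != second)  # the two instances never see each other's pid file
```

With an absolute pid_file both launches resolve to the same path, so the second one can notice the existing file.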

jesperpedersen commented 3 months ago

We should def error out if the configuration is the same.

However, it should be possible to run multiple instances on the same server, like one for the primary instance and another for the standby.

fluca1978 commented 3 months ago

> We should def error out if the configuration is the same.

The only "quick" way to find out whether the configuration is (almost) the same is a bind failure, or the management socket (and possibly the metrics one) already being in use. If any of these is already in use, we should abort.
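The "abort if already in use" idea can be sketched as follows. This is an illustration in Python, not pgagroal's C code; `bind_or_abort` is a hypothetical helper, and it only assumes the standard behaviour that binding a TCP address that another socket is already listening on fails with EADDRINUSE.

```python
import errno
import socket


def bind_or_abort(host: str, port: int) -> socket.socket:
    """Bind and listen on host:port, exiting if the address is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            # Another instance (possibly with the same configuration) owns it.
            raise SystemExit(f"Could not bind to {host}:{port}: address already in use")
        raise
    sock.listen()
    return sock
```

The same check would apply to the management and metrics sockets: a failure on any of them is treated as fatal instead of being logged and ignored.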

> However, it should be possible to run multiple instances on the same server, like one for the primary instance and another for the standby.

Good point. With an absolute pid_file, a "misrun" by a user who starts multiple instances from the same configuration is detected immediately; with a relative pid file it becomes harder to detect (until we fix the socket problems above).

decarv commented 3 months ago

@fluca1978 I couldn't reproduce this, but I may have misunderstood the bug.

What I am trying to do is set the unix_socket_dir to a relative path in config and then run two pgagroal instances from different directories. What I get is a bind error.

If you could give me more details I could work something out.

fluca1978 commented 3 months ago

> @fluca1978 I couldn't reproduce this, but I may have misunderstood the bug.
>
> What I am trying to do is set the unix_socket_dir to a relative path in config and then run two pgagroal instances from different directories. What I get is a bind error.
>
> If you could give me more details I could work something out.

When I launch the second instance, I got a bind error too, but the instance continues to run. Is your second instance aborting? That could be due to the presence or absence of other network cards?

decarv commented 3 months ago

> When I launch the second instance, I got a bind error too, but the instance continues to run. Is your second instance aborting?

Yes, mine aborts right after returning from the pgagroal_bind function.

$ ./pgagroal -c pgagroal.conf
2024-04-03 10:54:45 DEBUG configuration.c:2656 PID file automatically set to: [./pgagroal.2345.pid]
2024-04-03 10:54:45 DEBUG network.c:648 server: bind: localhost:2345 (Address already in use)
2024-04-03 10:54:45 FATAL main.c:924 pgagroal: Could not bind to localhost:2345

> That could be due to the presence or absence of other network cards?

I have looked into this and it's possible that it is a matter of how the OS deals with SO_REUSEADDR, but I need to research more. Do you have details on the address-port pairs that were bound in each process?

fluca1978 commented 3 months ago

Apparently the problem is with the `host` configuration: if set to localhost, the second instance aborts as expected:

% pgagroal
-> DEBUG network.c:648 server: bind: localhost:54322 (Address already in use)
-> DEBUG network.c:648 server: bind: localhost:54322 (Address already in use)
-> FATAL main.c:924 pgagroal: Could not bind to localhost:54322

but when set to * the second instance runs.
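One way to make the `*` case behave like the localhost case is a strict policy: treat the startup as failed if any of the expanded addresses reports "Address already in use". The Python sketch below is only an illustration of that policy, not a description of what pgagroal does; `should_abort` is a hypothetical helper, and the address list is copied from the log of the second instance earlier in this thread.

```python
import errno


def should_abort(bind_results):
    """bind_results: list of (address, errno or None for success).

    Hypothetical strict policy: abort if any address is already in use.
    """
    return any(err == errno.EADDRINUSE for _, err in bind_results)


# Per-address outcomes as reported in the log of the second instance
results = [
    ("127.0.0.1:54322", errno.EADDRINUSE),
    ("10.0.2.15:54322", errno.EADDRINUSE),
    ("192.168.222.50:54322", errno.EADDRINUSE),
    ("::1:54322", errno.EADDRINUSE),
    ("fe80::a00:27ff:fee4:438d:54322", errno.EINVAL),
]
print(should_abort(results))
```

Under this policy the second instance would stop at startup regardless of how many interfaces `*` expands to.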