The `postgres.replication.master` property needs to be a scalar, not a list. Try changing this:
```yaml
properties:
  postgres:
    replication:
      master:
      - 172.21.42.167
```
to this:
```yaml
properties:
  postgres:
    replication:
      master: 172.21.42.167
```
The startup script is stringifying your array, which is where those weird square brackets in `could not translate host name "[172.21.42.167]"` come from.
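For illustration, here's a minimal sketch of how that kind of stringification can happen, assuming the ctl template interpolates the property straight into a shell variable (the actual template in this release may differ):

```bash
# Hypothetical excerpt from a ctl.erb template, not the release's actual code.
# If postgres.replication.master is an array, Ruby renders it with the
# square brackets included, and that bracketed string becomes the host name:
MASTER_HOST="<%= p('postgres.replication.master') %>"

# Postgres then tries (and fails) to resolve the literal bracketed string:
#   could not translate host name "[172.21.42.167]"
```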
Okay, different error now:
```
[repl:slave] DATADIR (/var/vcap/store/postgres/db) not found; seeding from preferred master (172.21.42.167)
pg_basebackup: could not connect to server: could not connect to server: Connection refused
        Is the server running on host "172.21.42.167" and accepting
        TCP/IP connections on port 6432?
```
which is a bit of an inception problem, because Postgres needs to be up and running before it can connect to itself.
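For context, the seeding step amounts to something like the following sketch (assuming the release shells out to `pg_basebackup`; the actual invocation may differ), which can only succeed once the master's Postgres is already listening:

```bash
# Simplified sketch of slave seeding, not the release's literal code.
MASTER=172.21.42.167                  # preferred master from the manifest
DATADIR=/var/vcap/store/postgres/db

if [ ! -d "$DATADIR" ]; then
  # Clone the master's data directory over the replication port.
  # Fails with "Connection refused" if nothing is listening on 6432 yet.
  pg_basebackup -h "$MASTER" -p 6432 -D "$DATADIR"
fi
```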
I also see the following in the monit/postgres log:
```
ls: cannot access /var/vcap/packages/*/*/*.jar: No such file or directory
$PATH /var/vcap/packages/postgres/bin:/var/vcap/packages/pgpool2/bin:/bin:/usr/bin:/sbin:/usr/sbin
```
Can you pastebin `/var/vcap/jobs/postgres/bin/ctl`?
Is your deployment multi-homed?
Multi-homed? I don't get what you mean.
You've got two networks on your postgres nodes:
```yaml
networks:
- name: default
  default: [dns, gateway]
- name: floating
  static_ips:
  - 172.21.42.167
  - 172.21.42.170
  - 172.21.42.171
```
On postgres/0 (the master), the interface attached to the `default` network is getting the IP 172.21.28.224, which doesn't match the configured master IP. That causes the `bin/ctl` script to think the node is actually a slave of 172.21.42.167, which creates the chicken-and-egg problem with seeding.
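The role detection boils down to something like this simplified sketch (not the literal `bin/ctl` code):

```bash
# Simplified sketch of the address introspection, not the actual ctl script.
MASTER_IP=172.21.42.167

# A node treats itself as master only if the configured master IP is
# bound to one of its local interfaces.
if ip addr show | grep -q "inet ${MASTER_IP}/"; then
  ROLE=master
else
  ROLE=slave   # a floating IP routed outside the VM is never bound locally,
               # so every node falls through to here
fi
```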
(Full disclosure, I haven't tested this on AWS yet, just vSphere and Warden CPIs)
Can you flip the order of the network definitions to this:
```yaml
jobs:
- name: postgres
  # ...
  networks:
  - name: floating
    static_ips:
    - 172.21.42.167
    - 172.21.42.170
    - 172.21.42.171
  - name: default
    default: [dns, gateway]
```
I have just one network. The static IPs are OpenStack floating IPs, which are not known to the VM itself; they are just routed to the addresses the VMs get dynamically from the default network. For example, if a VM gets IP 10.0.0.1, then 172.21.42.167 is routed to 10.0.0.1, so the floating IP is never configured on the VM itself.
Odd. What is 172.21.28.224?
That's the dynamically assigned IP address.
Currently, this release relies on address introspection to determine who the master node is. If the floating statics are managed/routed external to the VM itself, the release (as written) will not work.
Can you pastebin `ip addr show` from the postgres/0 node?
```
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:54:aa:5c brd ff:ff:ff:ff:ff:ff
    inet 172.21.28.224/24 brd 172.21.28.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe54:aa5c/64 scope link
       valid_lft forever preferred_lft forever
```
Drat.
I will work up a patch to force the master to be the 0th node, since we only support one master. That will take care of this issue.
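Roughly, the intent is something like this sketch, assuming BOSH's `spec.index` is available in the template (the actual patch may differ):

```bash
# Hypothetical ctl.erb excerpt: derive the role from the BOSH job index
# instead of matching local IP addresses, so node 0 is always the master.
<% if spec.index == 0 %>
ROLE=master
<% else %>
ROLE=slave
<% end %>
```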
I think this is a pretty specific issue affecting a small set of users, but if you can apply a patch, that would be awesome :+1:
Can you try a dev-release off of origin/master? ce8057c should fix this for you. If so, I'll go ahead and cut a new final version.
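In case it helps, a typical dev-release workflow with the classic BOSH CLI looks like this (the manifest name is a placeholder for your own):

```bash
git fetch origin && git checkout origin/master   # includes ce8057c
bosh create release --force                      # build an unversioned dev release
bosh upload release
bosh deployment your-manifest.yml                # point at your deployment manifest
bosh deploy
```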
postgres/0 now works, but postgres/1 fails:
```
[repl:slave] DATADIR (/var/vcap/store/postgres/db) not found; seeding from preferred master (172.21.42.167)
pg_basebackup: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
pg_ctl: directory "/var/vcap/store/postgres/db" does not exist
```
The only thing in my postgresql.conf is:
```
# postgres main configuration
port = 6432
listen_addresses = '*'
hot_standby = 'on'
```
Oops. I missed a "does my IP == master" check in the postgresql.conf template. Try now? (Commit db6a5cb has the fix for `max_wal_senders`.)
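Presumably the rendered config ends up looking roughly like this sketch (the actual postgresql.conf template change in db6a5cb may differ):

```
# Hypothetical postgresql.conf.erb excerpt, not the release's actual template.
# The WAL-sender settings must be enabled on the master; with the role check
# broken, the master rendered max_wal_senders = 0 and slaves could not
# pg_basebackup from it.
port = 6432
listen_addresses = '*'
hot_standby = 'on'
<% if i_am_master %>            # hypothetical flag set by the role detection
wal_level = hot_standby
max_wal_senders = 5
<% end %>
```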
Deployment is now successful :+1:
Awesome. I'm going to hold off on cutting a new version, since I'm still working through some smoke-test issues and we still aren't at v1 :smiley:
Just tried your release, but I get this strange error. The template I use is http://pastebin.com/vv0n9Zxw; it fails on the first node (the master).