Closed: sergei-maertens closed this issue 1 year ago.
Hi @sergei-maertens, which Barman version are you upgrading from?
Could you also run barman check srv1-regex-it-nl-pg14 and barman diagnose and post the output here, after removing any sensitive information from the output?
Are there any errors in the barman.log file?
Hi @mikewallace1979, thanks for getting back so quickly!
> which Barman version are you upgrading from?
Checked my apt logs, and this was an upgrade from 3.6 to 3.9:
barman:amd64 (3.6.0-1.pgdg20.04+1, 3.9.0-1.pgdg20.04+1)
> Could you also run barman check srv1-regex-it-nl-pg14 and barman diagnose and post the output here, after removing any sensitive information from the output?
I had already run the check before, and it has the same hanging problem:
root@backups:~# barman check srv1-regex-it-nl-pg14
Server srv1-regex-it-nl-pg14:
(Hitting CTRL+C doesn't have an immediate effect with any of these commands either; I have to open a new SSH connection.)
barman diagnose also hangs and does not produce any output.
> Are there any errors in the barman.log file?
I'm seeing some new suspicious records now actually:
2023-10-16 12:58:02,664 [120211] barman.wal_archiver INFO: No xlog segments found from streaming for pluksla-regex-it-nl-pg15.
2023-10-16 12:58:38,814 [85504] barman.command_wrappers INFO: geralt-modelbrouwers-nl-pg13: pg_receivewal: finished segment at 1B/B9000000 (timeline 1)
2023-10-16 12:59:01,866 [120232] barman.wal_archiver INFO: No xlog segments found from streaming for pluksla-regex-it-nl-pg15.
2023-10-16 12:59:02,002 [120231] barman.wal_archiver INFO: Found 1 xlog segments from streaming for geralt-modelbrouwers-nl-pg13. Archive all segments in one run.
2023-10-16 12:59:02,002 [120231] barman.wal_archiver INFO: Archiving segment 1 of 1 from streaming: geralt-modelbrouwers-nl-pg13/000000010000001B000000B8
2023-10-16 12:59:02,023 [120233] barman.wal_archiver INFO: No xlog segments found from streaming for srv1-regex-it-nl-pg14.
...
2023-10-16 13:00:08,422 [120204] barman.cli ERROR: Process interrupted by user (KeyboardInterrupt)
2023-10-16 13:01:02,475 [120396] barman.wal_archiver INFO: No xlog segments found from streaming for pluksla-regex-it-nl-pg15.
...
2023-10-16 13:05:02,382 [120484] barman.wal_archiver INFO: No xlog segments found from streaming for srv1-regex-it-nl-pg14.
2023-10-16 13:05:40,190 [120453] barman.server INFO: Check command timed out executing 'PostgreSQL' check
2023-10-16 13:05:40,190 [120453] barman.server ERROR: Check 'check timeout' failed for server 'srv1-regex-it-nl-pg14'
2023-10-16 13:05:40,192 [120453] barman.server ERROR: Impossible to start the backup. Check the log for more details, or run 'barman check srv1-regex-it-nl-pg14'
...
2023-10-16 14:13:36,994 [122211] Command WARNING: No LSB modules are available.
2023-10-16 14:13:37,028 [122211] Command WARNING: Python 2.7.18
2023-10-16 14:13:37,056 [122211] Command WARNING: OpenSSH_8.2p1 Ubuntu-4ubuntu0.9, OpenSSL 1.1.1f 31 Mar 2020
The ellipses stand for truncated "No xlog segments found..." records, which I believe are normal behaviour: there's not a lot of activity on these databases.
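To pull the actionable records out of a barman.log dominated by routine INFO lines, a small filter can help. This is a minimal sketch that assumes the line layout shown in the excerpt above (timestamp, [pid], logger, LEVEL:, message); the function name and the choice to also keep INFO lines mentioning "timed out" are assumptions, not anything Barman provides.

```python
import re
from typing import Iterable, List, Tuple

# Assumed record layout, based on the excerpt above, e.g.:
# "2023-10-16 13:05:40,190 [120453] barman.server ERROR: Check 'check timeout' failed ..."
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<pid>\d+)\] (?P<logger>\S+) (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def interesting_records(
    lines: Iterable[str],
    levels: Tuple[str, ...] = ("WARNING", "ERROR", "CRITICAL"),
) -> List[str]:
    """Keep records at the given levels, plus INFO records that mention a timeout.

    Routine INFO records such as "No xlog segments found from streaming ..."
    are dropped because they are neither at a kept level nor about a timeout.
    """
    kept = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a standard record (e.g. "..." or a continuation line)
        if match["level"] in levels or "timed out" in match["msg"]:
            kept.append(line)
    return kept
```

Usage would be to open the log file (its location depends on the log_file setting; /var/log/barman/barman.log is only an assumed default here) and pass the file object to interesting_records.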
Edit: the timeout made me check whether I can open a telnet connection, and I see it's trying to connect over IPv6. Over the weekend I set up DNS for IPv6, so that is probably affecting things, and the remote PG server's firewall only allows IPv4. So the problem is most likely on my end 😬
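A quick way to see which address families a hostname now resolves to, and in which order the system resolver returns them, is to ask getaddrinfo directly. This is a minimal diagnostic sketch; the function name is hypothetical, and it only reports resolver output without attempting any connection.

```python
import socket
from typing import List, Tuple


def resolved_addresses(host: str, port: int = 5432) -> List[Tuple[str, str]]:
    """Return unique (family, address) pairs in the order getaddrinfo yields them.

    A client that tries addresses one by one in this order will attempt the
    first entry first; if that is an IPv6 address whose packets a firewall
    silently drops, the attempt can stall until it times out.
    """
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results: List[Tuple[str, str]] = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        entry = (labels.get(family, str(family)), sockaddr[0])
        if entry not in results:
            results.append(entry)
    return results
```

Calling resolved_addresses("srv1.regex-it.nl") on the Barman host would show whether an IPv6 entry is now listed before the IPv4 one.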
Edit 2: pg_isready works fine, though, and doesn't appear to try connecting over IPv6:
root@backups:~# pg_isready -p 5432 -h srv1.regex-it.nl
srv1.regex-it.nl:5432 - accepting connections
The timeout while connecting to PostgreSQL does seem to be the most likely reason for the failure.
Barman uses the psycopg2 library to connect to PostgreSQL, which pg_isready does not use. There is a report of psycopg2 waiting 127 seconds when attempting an IPv6 connection before falling back to IPv4. I can't find any reference to this in the psycopg2 GitHub repo; however, if this is how psycopg2 behaves, it would explain why the default Barman check timeout of 30 seconds is exceeded in your setup, where IPv6 is firewalled and IPv4 isn't.
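The hypothesis can be illustrated with a generic sketch of sequential address fallback: try each resolved address in order, each with its own timeout, and keep the first one that connects. This is not psycopg2 or libpq code; the function name and the per-attempt timeout are assumptions made for illustration.

```python
import socket
import time
from typing import List, Tuple


def connect_with_fallback(host: str, port: int, per_attempt_timeout: float = 5.0):
    """Try every address for host:port in resolver order.

    Returns (connected_socket, sockaddr, failures), where failures lists
    (sockaddr, error, seconds_spent) for each address that failed first.
    """
    failures: List[Tuple[tuple, OSError, float]] = []
    for family, type_, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, type_, proto)
        sock.settimeout(per_attempt_timeout)
        started = time.monotonic()
        try:
            sock.connect(sockaddr)
        except OSError as exc:
            # A silently dropped SYN only fails here after per_attempt_timeout;
            # an actively rejected one (RST or ICMP unreachable) fails almost at once.
            sock.close()
            failures.append((sockaddr, exc, time.monotonic() - started))
            continue
        sock.settimeout(None)  # back to blocking mode for normal use
        return sock, sockaddr, failures
    raise ConnectionError(f"could not connect to {host}:{port}: {failures}")
```

The arithmetic is what matters here: if the first (IPv6) attempt is only abandoned after roughly 127 seconds, the IPv4 address is tried long after a 30-second check timeout has already fired.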
I haven't been able to verify this report, but it might be worth taking a closer look at the IPv6/IPv4 hypothesis. As a test, you could try replacing the hostname with the IPv4 address in the conninfo string in the Barman config.
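For that test, the IPv4 address can be looked up and substituted into the conninfo string rather than copied by hand. This is a minimal sketch: the helper name is hypothetical, and it assumes a simple space-separated key=value conninfo with no quoted values.

```python
import socket


def ipv4_conninfo(conninfo: str) -> str:
    """Replace host=NAME in a key=value conninfo with NAME's first IPv4 address.

    Assumes space-separated key=value pairs with no quoted values; all other
    pairs are passed through unchanged.
    """
    parts = []
    for pair in conninfo.split():
        key, sep, value = pair.partition("=")
        if key == "host" and sep:
            infos = socket.getaddrinfo(
                value, None, family=socket.AF_INET, type=socket.SOCK_STREAM
            )
            value = infos[0][4][0]  # address part of the first IPv4 sockaddr
        parts.append(f"{key}{sep}{value}")
    return " ".join(parts)
```

For example, ipv4_conninfo("host=srv1.regex-it.nl user=barman dbname=postgres") would return the same string with the host replaced by its IPv4 address (user=barman and dbname=postgres are placeholder values), which could then be used as the server's conninfo value.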
Yes, that was indeed my plan. I have to get back to $dayjob now; I'll report back later. Luckily I'm familiar with psycopg2, so I can also dig around to see whether there's a way to force IPv4 if the cause is confirmed.
Confirmed: it works as expected with the IPv4 address used directly as the host. I can't find a (documented) way to force IPv4 from a host name, so the resolution will be to either configure the server with the IP address or open up the firewall to accept IPv6 connections.
From what I've gathered, psycopg2 uses libpq under the hood, and libpq doesn't seem to document any such option. Either way, it's out of scope for Barman, I'd say.
Hi, my Barman installation no longer appears to work properly after I updated packages on my OS.
I received the following cron output, which prompted me to investigate:
My environment:
I updated my system (apt-get dist-upgrade) over the weekend, and that pulled in barman 3.9.0, released two weeks ago. When I run barman status all, the command hangs too.
I had a working setup before, with 3 clusters being managed by barman.
Do you have any pointers of what could be wrong or what else I can investigate?