EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

"unexpected failure invoking barman-cloud-wal-archive: exit status 4" #925

Closed AllardKrings closed 5 months ago

AllardKrings commented 5 months ago

Hi,

I am running CNPG in combination with MinIO on an ARM SBC with Ubuntu 23.10 and MicroK8s 1.29.

My cluster is defined as:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres13
  namespace: postgres
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:13.14-3
  bootstrap:
    initdb:
      postInitSQL:

MinIO runs in the same namespace:

NAME                     READY   STATUS    RESTARTS        AGE
minio-547b5c995b-6dwm5   1/1     Running   2 (5h28m ago)   25h
postgres13-1             1/1     Running   0               3h20m
postgres13-2             1/1     Running   0               3h19m
postgres13-3             1/1     Running   0               3h19m

I created a bucket "backups" in MinIO.

When I run a backup:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgres13-backup
  namespace: postgres
spec:
  cluster:
    name: postgres13

It gives an error:

Name:         postgres13-backup
Namespace:    postgres
Labels:
Annotations:
API Version:  postgresql.cnpg.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2024-04-25T07:29:24Z
  Generation:          1
  Resource Version:    3755479
  UID:                 ea5ef0c7-098b-4c23-870a-d63b8c65a63b
Spec:
  Cluster:
    Name:  postgres13
  Method:  barmanObjectStore
Status:
  Backup Name:       backup-20240425072926
  Destination Path:  s3://backups/
  Endpoint URL:      http://minio.postgres:9000
  Instance ID:
    Container ID:  containerd://86707c09825439d602b3200030f313be4e42d02a461bbfc10bea501900058573
    Pod Name:      postgres13-2
  Method:            barmanObjectStore
  Phase:             walArchivingFailing
  s3Credentials:
    Access Key Id:
      Key:   MINIO_ACCESS_KEY
      Name:  minio-creds
    Secret Access Key:
      Key:   MINIO_SECRET_KEY
      Name:  minio-creds
  Server Name:  postgres13
Events:

The log of postgres13-1 pod says:

{"level":"error","ts":"2024-04-25T10:50:09Z","logger":"wal-archive","msg":"failed to run wal-archive command","logging_pod":"postgres13-1","error":"unexpected failure invoking barman-cloud-wal-archive: exit status 4","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/walarchive.NewCmd.func1\n\tinternal/cmd/manager/walarchive/cmd.go:95\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\tcmd/manager/main.go:64\nruntime.main\n\t/opt/hostedtoolcache/go/1.21.5/x64/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-04-25T10:50:09Z","logger":"postgres","msg":"record","logging_pod":"postgres13-1","record":{"log_time":"2024-04-25 10:50:09.867 UTC","process_id":"29","session_id":"662a056c.1d","session_line_num":"759","session_start_time":"2024-04-25 07:25:32 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"archive command failed with exit code 1","detail":"The failed archive command was: /controller/manager wal-archive --log-destination /controller/log/postgres.json pg_wal/000000010000000000000001","backend_type":"archiver"}}
{"level":"info","ts":"2024-04-25T10:50:09Z","logger":"postgres","msg":"record","logging_pod":"postgres13-1","record":{"log_time":"2024-04-25 10:50:09.867 UTC","process_id":"29","session_id":"662a056c.1d","session_line_num":"760","session_start_time":"2024-04-25 07:25:32 UTC","transaction_id":"0","error_severity":"WARNING","sql_state_code":"01000","message":"archiving write-ahead log file \"000000010000000000000001\" failed too many times, will try again later","backend_type":"archiver"}}
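
Since the pod emits one JSON record per line, the archiving failures can be picked out of a long log mechanically. The following is only a rough sketch: the function name `archive_failures` and the shortened `SAMPLE` records are made up for illustration, and the field names (`logger`, `level`, `error`, `record.backend_type`, `record.message`) are simply the ones visible in the log above, not a documented schema.

```python
import json


def archive_failures(lines):
    """Yield (timestamp, source, message) for WAL-archiving entries in JSON log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON record
        if not isinstance(entry, dict):
            continue
        if entry.get("logger") == "wal-archive" and entry.get("level") == "error":
            # Error reported by the wal-archive command itself
            yield entry.get("ts"), "wal-archive", entry.get("error", entry.get("msg"))
        elif entry.get("logger") == "postgres":
            record = entry.get("record", {})
            if record.get("backend_type") == "archiver":
                # Message from the PostgreSQL archiver process
                yield entry.get("ts"), "archiver", record.get("message")


# Shortened, illustrative copies of the records above (most fields omitted).
SAMPLE = [
    '{"level":"error","ts":"2024-04-25T10:50:09Z","logger":"wal-archive",'
    '"msg":"failed to run wal-archive command",'
    '"error":"unexpected failure invoking barman-cloud-wal-archive: exit status 4"}',
    '{"level":"info","ts":"2024-04-25T10:50:09Z","logger":"postgres","msg":"record",'
    '"record":{"message":"archive command failed with exit code 1","backend_type":"archiver"}}',
    "this line is not JSON",
]

for ts, source, message in archive_failures(SAMPLE):
    print(ts, source, message)
```

In practice the lines would come from the pod log instead of `SAMPLE`, for example by saving the output of `kubectl logs postgres13-1 -n postgres` to a file and iterating over it.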

What am I doing wrong?

Help appreciated!

srekkas commented 4 months ago

What was the cause of the problem?

AllardKrings commented 4 months ago

I am afraid I cannot tell you. The problem disappeared spontaneously.

ch9hn commented 2 months ago

We are running into the exact same issue. At the very beginning, Barman writes WAL files to the S3 object bucket and then creates a plain file with the folder name inside the bucket. After that, nothing is written to the bucket again.

Before deleting the file: [image] [image]

After deleting it: [image] [image]

As you can see, a new WAL file is written, and after that the writes stop again.
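
To check a bucket for the situation described here (a plain object whose key is also used as a "folder" by other objects), something along these lines might help. It is a sketch under assumptions: the helper name `find_blocking_objects` and the example keys are hypothetical, the key listing itself is assumed to come from whatever S3 client is at hand, and it only detects the symptom; it says nothing about why the file gets created.

```python
def find_blocking_objects(keys):
    """Return keys that exist as plain objects but are also a path prefix of other keys."""
    key_set = set(keys)
    blocking = set()
    for key in key_set:
        parts = key.split("/")
        # Walk every proper ancestor path, e.g. "a" and "a/b" for "a/b/c"
        for i in range(1, len(parts)):
            ancestor = "/".join(parts[:i])
            if ancestor in key_set:
                blocking.add(ancestor)
    return sorted(blocking)


# Hypothetical listing: the second key is a plain object named like the folder
# that holds the WAL file in the first key.
KEYS = [
    "postgres13/wals/0000000100000000/000000010000000000000001",
    "postgres13/wals/0000000100000000",
    "postgres13/base/backup-20240425072926/data.tar",
]

print(find_blocking_objects(KEYS))
```

Any key it returns is a candidate for the file that had to be deleted before the next WAL segment could be written.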