EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

Barman stops WAL archiving after creating an empty file #962

Open · ch9hn opened this issue 3 months ago

ch9hn commented 3 months ago

We are running into an issue with WAL archiving when using cloudnative-pg. At the very beginning, Barman writes WAL files to the S3 object bucket and then creates a plain file with the folder name inside the bucket. After that, nothing is written to the bucket anymore.

We are getting the following errors:

{"level":"info","ts":"2024-07-16T08:51:49Z","logger":"wal-archive","msg":"Failed archiving WAL: PostgreSQL will retry","logging_pod":"airflow-pg-1","walName":"pg_wal/00000006.history","startTime":"2024-07-16T08:51:48Z","endTime":"2024-07-16T08:51:49Z","elapsedWalTime":0.304129655,"error":"unexpected failure invoking barman-cloud-wal-archive: exit status 4"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"2024-07-16 08:51:50,201 [80291] WARNING: Failed to parse headers (url=https://xxxxxx.s3.fr-par.scw.cloud:443/scw-cnpg-dev-stg-testing/airflow-pg/wals/0000000300000000/000000030000000000000010.partial.gz): [MissingHeaderBodySeparatorDefect()], unparsed data: 'HTTP/1.1 400 Bad request\\r\\nContent-length: 90\\r\\nCache-Control: no-cache\\r\\nConnection: close\\r\\nContent-Type: text/html\\r\\n\\r\\n'","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"Traceback (most recent call last):","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"  File \"/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py\", line 487, in _make_request","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"    assert_header_parsing(httplib_response.msg)","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"  File \"/usr/local/lib/python3.9/dist-packages/urllib3/util/response.py\", line 91, in assert_header_parsing","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"    raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: 'HTTP/1.1 400 Bad request\\r\\nContent-length: 90\\r\\nCache-Control: no-cache\\r\\nConnection: close\\r\\nContent-Type: text/html\\r\\n\\r\\n'","pipe":"stderr","logging_pod":"airflow-pg-1"}
{"level":"info","ts":"2024-07-16T08:51:50Z","logger":"barman-cloud-wal-archive","msg":"2024-07-16 08:51:50,204 [80291] ERROR: Barman cloud WAL archiver exception: An error occurred () when calling the PutObject operation: ","pipe":"stderr","logging_pod":"airflow-pg-1"}

Before deleting the file: [screenshots]

After deleting it: [screenshots]

As you can see, a new WAL file is written and after that the writes stop again.
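In case it helps to reproduce the check, the bucket can also be inspected outside of Barman with a plain Boto3 listing. This is only a rough sketch; the endpoint, bucket, and prefix below are placeholders, not the real values from our setup:

```python
# Rough sketch: list everything under the server prefix with plain Boto3
# to spot the stray zero-byte object. Endpoint, bucket and prefix are
# placeholders, not the real values from our setup.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.fr-par.scw.cloud",  # placeholder endpoint
)

resp = s3.list_objects_v2(Bucket="my-cnpg-bucket", Prefix="airflow-pg")
for obj in resp.get("Contents", []):
    # A zero-byte key equal to the folder/server name ("airflow-pg")
    # would be the empty file described above.
    print(obj["Key"], obj["Size"])
```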

gcalacoci commented 3 months ago

@ch9hn Just to be sure: this is not happening on AWS but on Scaleway, right?

ch9hn commented 3 months ago

Hello, we tested it with another service, Cubbit (https://www.cubbit.io/), and it works with that one.

gcalacoci commented 3 months ago

@ch9hn It seems to me that this is a compatibility issue on the Scaleway side. Barman uses the Boto3 library to connect to Amazon S3 or S3-compatible services. As you noticed, some of them behave exactly like S3 on their side, like the one you mentioned (cubbit.io), but also others such as Wasabi, Tebi.io, etc.

Others, like Scaleway, behave differently for certain calls, so they are not fully compatible.
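To illustrate the point (this is not an excerpt from Barman's code, just the generic Boto3 pattern with placeholder names): targeting an S3-compatible service only means overriding the endpoint URL, and the upload itself is a standard S3 API call:

```python
# Illustration only (not Barman's actual code): with Boto3, talking to an
# S3-compatible service is just a matter of overriding endpoint_url;
# the upload itself goes through the standard S3 API.
import gzip
import io

import boto3

client = boto3.client(
    "s3",
    endpoint_url="https://s3.some-provider.example",  # placeholder endpoint
)

# Stream a gzip-compressed WAL segment to a nested key, following the
# <server>/wals/<prefix>/<segment>.gz layout visible in the log above.
wal_bytes = gzip.compress(b"...")  # placeholder payload
client.upload_fileobj(
    io.BytesIO(wal_bytes),
    "example-bucket",
    "my-server/wals/0000000300000000/000000030000000000000010.gz",
)
```

A fully compatible provider is expected to accept that nested key exactly as AWS S3 does, with no side effects on other keys.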

I went through the wal-archive code and Barman doesn't do anything specific or strange:

Somehow, during this sequence of actions, an empty file with the same name as the server name is created on the Scaleway side (at least that is what I can see in your images), disrupting the upload to the correct directory.

here is the call chain:

It seems to me that Scaleway handles the upload path differently from Amazon S3.
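One way to test that hypothesis without Barman in the picture is to recreate the situation with two plain Boto3 calls: first put a zero-byte object named like the server, then try to upload a WAL-style key "underneath" it. This is only a sketch with placeholder names; on AWS S3 both calls succeed, because keys are flat and an object `foo` does not conflict with `foo/bar`:

```python
# Sketch: reproduce the suspected conflict with plain Boto3 calls.
# All names below are placeholders; run it against the S3-compatible
# endpoint under test.
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.provider.example")
bucket = "test-bucket"

# Step 1: an empty object whose key equals the server name,
# i.e. the "plain file with the folder name" seen in the bucket.
s3.put_object(Bucket=bucket, Key="airflow-pg", Body=b"")

# Step 2: a WAL-style key nested "inside" that name. On AWS S3 this
# succeeds because keys are flat; a provider that treats "airflow-pg"
# as a real directory entry may reject it, as in the 400 error above.
s3.put_object(
    Bucket=bucket,
    Key="airflow-pg/wals/0000000300000000/000000030000000000000010.gz",
    Body=b"dummy wal payload",
)
```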

Please note that Barman officially supports Amazon S3 only, and it is not possible for us to test Barman against all S3-compatible services (which are, unfortunately, a lot).

ch9hn commented 3 months ago

Thank you very much for that extensive explanation, we really appreciate it. We will check with our provider.

Update: We opened a ticket with Scaleway and they are investigating it. As it's a really common library, we will escalate this further so that it works.

cc: @ivanwel