Closed davissp14 closed 4 days ago
can we change the env vars to match the default fly tigris create
ones?
I'm seeing an error listing backups:
2024-07-03T06:07:43Z app[6833e41a77d908] ewr [info]panic: failed to restore base backup: failed to resolve backup target by time: failed to parse backup end time: parsing time "" as "Mon Jan 2 15:04:05 2006": cannot parse "" as "Mon"
2024-07-03T06:07:43Z app[6833e41a77d908] ewr [info]goroutine 1 [running]:
2024-07-03T06:07:43Z app[6833e41a77d908] ewr [info]main.panicHandler({0x9ba280?, 0xc000106340})
2024-07-03T06:07:43Z app[6833e41a77d908] ewr [info] /go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:198 +0x55
2024-07-03T06:07:43Z app[6833e41a77d908] ewr [info]main.main()
2024-07-03T06:07:43Z app[6833e41a77d908] ewr [info] /go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:75 +0xf50
here's the json:
postgres@6833e41a77d908:~$ barman-cloud-backup-list --cloud-provider aws-s3 --endpoint-url https://fly.storage.tigris.dev --profile restore --format json s3://dbdbdbbucket dbdbdb | python3 -m json.tool
{
"backups_list": [
{
"backup_label": null,
"begin_offset": 216,
"begin_time": "Tue Jul 2 21:05:44 2024",
"begin_wal": "00000005000000000000002A",
"begin_xlog": "0/2A0000D8",
"compression": null,
"config_file": "/data/postgresql/postgresql.conf",
"copy_stats": null,
"deduplicated_size": null,
"end_offset": null,
"end_time": null,
"end_wal": null,
"end_xlog": null,
"error": "failure uploading data (connection already closed)",
"hba_file": "/data/postgresql/pg_hba.conf",
"ident_file": "/data/postgresql/pg_ident.conf",
"included_files": [
"/data/postgresql/postgresql.auto.conf",
"/data/postgresql/postgresql.internal.conf"
],
"mode": null,
"pgdata": "/data/postgresql",
"server_name": "cloud",
"size": null,
"status": "FAILED",
"systemid": "7332222271544570189",
"tablespaces": null,
"timeline": 5,
"version": 150006,
"xlog_segment_size": 16777216,
"backup_id": "20240702T210544"
},
{
"backup_label": "'START WAL LOCATION: 0/2E000028 (file 00000005000000000000002E)\\nCHECKPOINT LOCATION: 0/2E000060\\nBACKUP METHOD: streamed\\nBACKUP FROM: primary\\nSTART TIME: 2024-07-02 21:07:03 UTC\\nLABEL: Barman backup cloud 20240702T210703\\nSTART TIMELINE: 5\\n'",
"begin_offset": 40,
"begin_time": "Tue Jul 2 21:07:03 2024",
"begin_wal": "00000005000000000000002E",
"begin_xlog": "0/2E000028",
"compression": null,
"config_file": "/data/postgresql/postgresql.conf",
"copy_stats": {
"total_time": 5.192484,
"number_of_workers": 2,
"analysis_time": 0,
"analysis_time_per_item": {
"data": 0
},
"copy_time_per_item": {
"data": 4.423057
},
"serialized_copy_time_per_item": {
"data": 2.484593
},
"copy_time": 4.423057,
"serialized_copy_time": 2.484593
},
"deduplicated_size": null,
"end_offset": 80,
"end_time": "Tue Jul 2 21:07:05 2024",
"end_wal": "00000005000000000000002F",
"end_xlog": "0/2F000050",
"error": null,
"hba_file": "/data/postgresql/pg_hba.conf",
"ident_file": "/data/postgresql/pg_ident.conf",
"included_files": [
"/data/postgresql/postgresql.auto.conf",
"/data/postgresql/postgresql.internal.conf"
],
"mode": null,
"pgdata": "/data/postgresql",
"server_name": "cloud",
"size": null,
"status": "DONE",
"systemid": "7332222271544570189",
"tablespaces": null,
"timeline": 5,
"version": 150006,
"xlog_segment_size": 16777216,
"backup_id": "20240702T210703"
}
]
}
What do the query params look like for the S3_ARCHIVE_REMOTE_RESTORE_CONFIG
secret?
EDIT
Oh, I see. The backup failed, so it doesn't have an end_time
. Good find.
UPDATE Issue should be fixed.
This PR introduces a WAL Archiving solution that integrates with any S3 compatible object storage. It also provides various remote recovery options for restoring into a new app. While in-place recovery options are possible with the provided functionality, the commands must be issued manually. The current automated recovery process will fail if there's an existing
postgresql
directory.Docs for in-place restores will be coming soon.
Enabling WAL Archiving
To enable, set the following secret:
This most straight forward way to get started would be to create a new Tigris bucket via:
Then set the secret with
fly secrets set S3_ARCHIVE_CONFIG=... --app <pg-app-name>
Viewing/Updating WAL Archiving Configuration
Definitions
recovery_window (default: 7d) Used as a retention policy. Backups older than the specified value will be pruned. The shortest allowed recovery window is
1d
.Units available:
minimum_redundancy (default 3) The minimum number of backups that should always be available. Must be
>= 0
full_backup_frequency (default: 24h) The frequency in which full backups are taken. Must be
>= 1h
archive_timeout (default: 60s) This is a limit on how old unarchived data can be. Archiving typically happens once the WAL segment is full, however, this can cause issues for less active databases. So if the WAL segment doesn't fill by the specified timeout, a wal-switch will occur and force an archive event.
WARNING - Having too short of a archive timeout can have performance implications and bloat object storage.
View your current configuration
First, you can view the existing settings by running the following curl command:
Update your configuration
To update a configuration option, you can run:
Then when we re-run the view command we can see the recovery window has been updated:
Note - This is a manual process, but the idea is that it make it straight forward to surface within a client and/or UI. These settings currently require a cluster restart, but in the future it should be possible to configure at runtime.
Remote Restore
Provision a new Postgres with the following secret:
S3_ARCHIVE_REMOTE_RESTORE_CONFIG=https://<access_key_id>:<access_secret>@<s3-endpoint>/<source-bucket-name>/<source-bucket-directory>
Note: If you don't specify a target it will default to restoring to the most recent point-in-time.
Target options
target Specifies that recovery should end as soon as a consistent state is reached. The only allowed value here is "immediate".
targetTime This parameter specifies the time stamp up to which recovery will proceed. The timestamp should follow RFC3339 formatting.
targetName This parameter specifies the named restore point.
targetLSN (Not implemented) This parameter specifies the LSN of the write-ahead log location up to which recovery will proceed.
TargetXID (Not implemented) This parameter specifies the transaction ID up to which recovery will proceed.
Additional target options
targetInclusive (default: true) Specifies whether to stop just after the specified recovery target (true), or just before the recovery target (false).
targetAction (default: promote) Specifies what action the server should take once the recovery target is reached
targetTimeline (default: current-timeline) Specifies recovering into a particular timeline. The default is to recover along the same timeline that was current when the base backup was taken.
Example usage
S3_ARCHIVE_REMOTE_RESTORE_CONFIG=https://<access_key_id>:<access_secret>@<s3-endpoint>/<source-bucket-name>/<source-bucket-directory>?targetTime=2024-07-03T20:23:46Z
For more information on these settings, see: https://www.postgresql.org/docs/current/runtime-config-wal.html#RUNTIME-CONFIG-WAL-RECOVERY-TARGET