Lucretius / vault_raft_snapshot_agent

⛔️ DEPRECATED ⛔️ An agent which provides periodic snapshotting capabilities of Vault's Raft backend
MIT License

Majority of aws snaps fail to upload, none delete. #8

Open mooneye14 opened 3 years ago

mooneye14 commented 3 years ago

I'm running the service as described here, using an on-prem S3 service (not MinIO) as the AWS endpoint. I was seeing tons of failed snapshots, and the failure appears to be in the Upload portion of the AWS s3manager SDK. I was initially snapshotting every 30s, thought that might be too frequent, and backed it off to every 30m, but it still fails to upload 9 times out of 10. I also see no logs related to failed deletion, even though my retain is set to 1440 and there are currently 33000+ snapshots in there after a month.

Logs collected with 'sudo journalctl -u vault-snapshot.service':

2021/04/12 08:49:52 Failed to generate aws snapshot to : MultipartUpload: upload multipart failed
Apr 12 08:49:52 host.local vault_raft_snapshot_agent[114297]: upload id: 22478197652
Apr 12 08:49:52 host.local vault_raft_snapshot_agent[114297]: caused by: InternalError:
Apr 12 08:49:52 host.local vault_raft_snapshot_agent[114297]: status code: 500, request id: , host id:

Lucretius commented 3 years ago

Hi there,

Interesting that it does upload some of the time. Does it work just once and then fail on all subsequent attempts? Does restarting the service cause it to temporarily start working again?

I have not experimented with an on-prem S3 service, but I would imagine it wouldn't be any different aside from some unique networking.