bacongobbler closed this 8 years ago
So I found the core issue:
STRUCTURED: time=2016-03-16T16:14:27.012096-00 pid=377
wal_e.retries WARNING MSG: retrying after encountering exception
DETAIL: Exception information dump:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/wal_e/retries.py", line 62, in shim
return f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/wal_e/worker/s3/s3_deleter.py", line 17, in _delete_batch
bucket_name = page[0].bucket.name
AttributeError: 's3.ObjectSummary' object has no attribute 'bucket'
HINT: A better error message should be written to handle this exception. Please report this output and, if possible, the situation under which it arises.
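The traceback shows code written against the boto 2 `Key` API (`page[0].bucket.name`) receiving boto3 `s3.ObjectSummary` objects. In boto3, an `ObjectSummary` exposes its bucket as the plain string attribute `bucket_name`, not as a `bucket` object. The following is a minimal sketch of an accessor that tolerates both shapes. `bucket_name_of` is a hypothetical helper and is not the actual change made in deis/wal-e#3. `my-bucket` is a placeholder.

```python
from types import SimpleNamespace


def bucket_name_of(key):
    """Return the bucket name for a boto 2 Key or a boto3 s3.ObjectSummary."""
    # boto3's s3.ObjectSummary carries the bucket name as a string identifier.
    name = getattr(key, "bucket_name", None)
    if name is not None:
        return name
    # boto 2's Key carries a Bucket object, which has a .name attribute.
    return key.bucket.name


# Stand-in objects that mimic the two shapes (placeholders, not real S3 objects).
boto2_key = SimpleNamespace(bucket=SimpleNamespace(name="my-bucket"))
boto3_summary = SimpleNamespace(bucket_name="my-bucket")

print(bucket_name_of(boto2_key), bucket_name_of(boto3_summary))
```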
It turns out that the image on CI does not have the latest changes from deis/wal-e#3. Busting the cache by adding --no-cache to docker build fixes this, but we should probably pin to a commit instead of a branch, so that Docker's cache is invalidated when we implement these changes upstream.
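The reason pinning helps: Docker decides whether to reuse a cached layer for a `RUN` instruction by comparing the instruction text (on top of the same parent layer). It does not re-check what a branch name currently points to. The following is a toy model of that behaviour and is not Docker's real implementation. It assumes the Dockerfile installs wal-e with a `RUN pip install git+https://github.com/deis/wal-e.git@<ref>` line. The actual Dockerfile isn't shown in this thread, and `COMMIT_A`/`COMMIT_B` are placeholder SHAs.

```python
import hashlib

REPO = "git+https://github.com/deis/wal-e.git"  # assumed install source


def layer_key(parent_key, instruction):
    """Toy cache key: depends only on the parent layer and the instruction text."""
    return hashlib.sha256((parent_key + "\n" + instruction).encode()).hexdigest()


def build(instructions, cache):
    """'Build' the instructions and return how many were served from the cache."""
    parent, hits = "base-image", 0
    for instruction in instructions:
        key = layer_key(parent, instruction)
        if key in cache:
            hits += 1  # reused: the command is NOT re-run, so nothing is re-fetched
        else:
            cache.add(key)  # pretend the command ran and its layer was stored
        parent = key
    return hits


cache = set()

# Branch reference: upstream master can move, but the text never changes.
branch = ["RUN pip install " + REPO + "@master"]
build(branch, cache)
stale_hits = build(branch, cache)  # cache hit, so the image keeps the old code

# Commit reference: bumping the pin changes the text, so the cache misses.
build(["RUN pip install " + REPO + "@COMMIT_A"], cache)
fresh_hits = build(["RUN pip install " + REPO + "@COMMIT_B"], cache)

print(stale_hits, fresh_hits)
```

With a branch reference, only `--no-cache` forces a fresh install. With a pinned commit, updating the pin is enough.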
That was related, but this is still occurring on master.
This is a bug, but not a core issue. One of the older releases is being stubborn about getting removed from minio, so I'm going to take this off the showstopper list. It's not a significant issue that affects the platform, and we could potentially ship the beta with this bug.
This popped up again in CI: https://travis-ci.org/deis/postgres/builds/122642210
Another sighting: https://travis-ci.org/deis/postgres/builds/123393663
Unfortunately, as soon as we restart the job to get it green, the old build goes away. Anyhoo, this is a problem, but it doesn't seem like a core issue; just a CI/delay issue.
Here is a failure in Travis CI with logging from #102:
pg_ctl: server is running (PID: 1)
/usr/lib/postgresql/9.4/bin/postgres
-----> checking if minio has 5 backups
!!! did not find 5 base backups, which is the default (found 6)
!!! base_00000001000000000000000D_00000040_backup_stop_sentinel.json
base_00000001000000000000000E_00000040_backup_stop_sentinel.json
base_00000001000000000000000F_00000040_backup_stop_sentinel.json
base_000000010000000000000010_00000040_backup_stop_sentinel.json
base_000000010000000000000011_00000040_backup_stop_sentinel.json
base_000000010000000000000012_00000040_backup_stop_sentinel.json
make: *** [test-functional] Error 1
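The check in this log counts base backups by their `base_..._backup_stop_sentinel.json` objects. The following is a minimal sketch of that count, applied to the six names from the failing run. `count_base_backups` is a hypothetical helper and is not the test script's actual code. It also assumes that listed names may carry a directory prefix, so it looks only at the last path segment.

```python
SENTINEL_SUFFIX = "_backup_stop_sentinel.json"


def count_base_backups(names):
    """Count base backups by their stop-sentinel objects, ignoring any path prefix."""
    count = 0
    for name in names:
        basename = name.rsplit("/", 1)[-1]
        if basename.startswith("base_") and basename.endswith(SENTINEL_SUFFIX):
            count += 1
    return count


# The six sentinel names reported by the failing run above.
listing = [
    "base_00000001000000000000000D_00000040_backup_stop_sentinel.json",
    "base_00000001000000000000000E_00000040_backup_stop_sentinel.json",
    "base_00000001000000000000000F_00000040_backup_stop_sentinel.json",
    "base_000000010000000000000010_00000040_backup_stop_sentinel.json",
    "base_000000010000000000000011_00000040_backup_stop_sentinel.json",
    "base_000000010000000000000012_00000040_backup_stop_sentinel.json",
]

print(count_base_backups(listing))  # one more than the default of 5
```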
The very next test run against the same master commit passed.
It appears the proposed fix in #102 unfortunately didn't completely address the issue, so I'm moving this to the v2.0 milestone.
No, #102 was just meant to expose what the issue is, which is exactly what I assumed: a new base backup was pushed, but the old one wasn't deleted yet due to sync issues. This is a low-priority fix because it's not a massive issue, nor does it cause any damage to the database. It's just a little lag, combined with checking right in the middle of a backup operation. Perhaps a fix would be to stop the database and then check the number of retained backups, since the database should shut down gracefully with only 5 backups once the backup has been pushed to minio.
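The following is a rough sketch of the proposed check: stop the database first, then give the deletion a bounded amount of time to catch up before comparing counts. The polling with a timeout is an added assumption on top of the comment's proposal, meant to absorb the "little lag". `stop_database` and `count_backups` are hypothetical callables standing in for whatever the test script actually runs. The defaults of 30 s and 1 s are arbitrary.

```python
import time


def wait_for_backup_count(count_backups, expected, timeout=30.0, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll count_backups() until it returns `expected` or `timeout` seconds pass.

    Returns the last observed count so the caller can report it on failure.
    """
    deadline = clock() + timeout
    while True:
        count = count_backups()
        if count == expected or clock() >= deadline:
            return count
        sleep(interval)


def check_retention_after_shutdown(stop_database, count_backups, expected=5, **poll_kwargs):
    """Stop the database first (as proposed), then wait for the retained count to settle."""
    stop_database()  # hypothetical hook, e.g. whatever the test uses to stop postgres
    return wait_for_backup_count(count_backups, expected, **poll_kwargs)


# Demo: the stale backup disappears on the third listing.
observed = iter([6, 6, 5])
stopped = []
result = check_retention_after_shutdown(
    stop_database=lambda: stopped.append(True),
    count_backups=lambda: next(observed),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Injecting `clock` and `sleep` keeps the helper testable without real delays. A test would then fail only if the count is still wrong after the timeout.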
see https://github.com/deis/postgres/pull/65#issuecomment-196917516