Closed dbutenhof closed 7 months ago
FYI: I faked broken metadata by using `psql` to delete some `server` and `metalog` rows:
|| Missing MD5 /srv/pbench/archive/fs-version-001/ansible-host/45f0e2af41977b89e07bae4303dc9972/pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18.tar.xz.md5
|| Isolator directory /srv/pbench/archive/fs-version-001/dhcp31-45.perf.lab.eng.bos.redhat.com/08516cc7448035be2cc502f0517783fa contains multiple tarballs: ['/srv/pbench/archive/fs-version-001/dhcp31-45.perf.lab.eng.bos.redhat.com/08516cc7448035be2cc502f0517783fa/fio_rw_2018.02.01T22.40.57.tar.xz', '/srv/pbench/archive/fs-version-001/dhcp31-45.perf.lab.eng.bos.redhat.com/08516cc7448035be2cc502f0517783fa/fio_mock_2020.02.27T22.16.14.tar.xz']
(16:01:28) Found ['/srv/pbench/archive/fs-version-001/dhcp31-45.perf.lab.eng.bos.redhat.com/08516cc7448035be2cc502f0517783fa/fio_rw_2018.02.01T22.40.57.tar.xz', '/srv/pbench/archive/fs-version-001/dhcp31-45.perf.lab.eng.bos.redhat.com/08516cc7448035be2cc502f0517783fa/fio_mock_2020.02.27T22.16.14.tar.xz'] for ID 08516cc7448035be2cc502f0517783fa
|| fio_rw_2018.02.01T22.40.57 has no server.tarball-path: setting /srv/pbench/archive/fs-version-001/dhcp31-45.perf.lab.eng.bos.redhat.com/08516cc7448035be2cc502f0517783fa/fio_rw_2018.02.01T22.40.57.tar.xz
|| fio_rw_2018.02.01T22.40.57 has no metalog: setting from metadata.log
|| fio_rw_2018.02.01T22.40.57 server.deletion set (730 days) to 2026-03-12T15:20:34.380181+00:00
|| fio_rw_2018.02.01T22.40.57 has no server.benchmark: setting 'fio'
(16:01:29) Found /srv/pbench/archive/fs-version-001/dhcp31-44.perf.lab.eng.bos.redhat.com/22a4bc5748b920c6ce271eb68f08d91c/fio_rw_2018.02.01T22.40.57.tar.xz for ID 22a4bc5748b920c6ce271eb68f08d91c
|| fio_rw_2018.02.01T22.40.57 has no server.tarball-path: setting /srv/pbench/archive/fs-version-001/dhcp31-44.perf.lab.eng.bos.redhat.com/22a4bc5748b920c6ce271eb68f08d91c/fio_rw_2018.02.01T22.40.57.tar.xz
|| fio_rw_2018.02.01T22.40.57 has no metalog: setting from metadata.log
|| fio_rw_2018.02.01T22.40.57 server.deletion set (730 days) to 2026-03-12T15:20:33.301420+00:00
|| fio_rw_2018.02.01T22.40.57 has no server.benchmark: setting 'fio'
|| Missing MD5 /srv/pbench/archive/fs-version-001/ansible-host/45f0e2af41977b89e07bae4303dc9972/pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18.tar.xz.md5
|| Isolated tarball /srv/pbench/archive/fs-version-001/ansible-host/45f0e2af41977b89e07bae4303dc9972/pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18.tar.xz MD5 doesn't match isolator 45f0e2af41977b89e07bae4303dc9972
|| pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18 doesn't seem to have a tarball
|| pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18 has no metalog: setting from default
|| pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18 server.deletion set (730 days) to 2026-03-12T15:20:33.441340+00:00
|| pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18 has no server.benchmark: setting 'unknown'
|| Missing MD5 /srv/pbench/archive/fs-version-001/ansible-host/45f0e2af41977b89e07bae4303dc9972/pbench-user-benchmark_example-vmstat_2018.10.24T14.38.18.tar.xz.md5
(16:01:29) Found /srv/pbench/archive/fs-version-001/rhel8-1/4b8da5832aa9c7c6a21dc74123b8968b/uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57.tar.xz for ID 4b8da5832aa9c7c6a21dc74123b8968b
|| uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57 has no server.tarball-path: setting /srv/pbench/archive/fs-version-001/rhel8-1/4b8da5832aa9c7c6a21dc74123b8968b/uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57.tar.xz
|| uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57 has no metalog: setting from metadata.log
|| uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57 server.deletion set (730 days) to 2026-03-12T15:20:33.609509+00:00
|| uperf_rhel8.1_4.18.0-107.el8_snap4_25gb_virt_2019.06.21T01.28.57 has no server.benchmark: setting 'uperf'
4 server.tarball-path repairs, 1 failures
4 server.deletion repairs, 0 failures
4 dataset.metalog repairs, 0 failures
4 server.benchmark repairs
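For reference, the `server.deletion` values in the log above are consistent with adding 730 days to a per-dataset base timestamp. Below is a minimal sketch of that arithmetic. It assumes the base is the dataset's upload time, which the log doesn't state, and `deletion_time` is a hypothetical helper name, not the actual `pbench-repair` code:

```python
from datetime import datetime, timedelta, timezone

# Retention period shown in the log output ("730 days").
RETENTION_DAYS = 730


def deletion_time(base: datetime, days: int = RETENTION_DAYS) -> datetime:
    """Return the timestamp `days` after `base`, normalized to UTC.

    Hypothetical helper: using an upload time as `base` is an assumption
    and is not confirmed by the log.
    """
    if base.tzinfo is None:
        raise ValueError("base must be timezone-aware")
    return base.astimezone(timezone.utc) + timedelta(days=days)


# Assumed base time, chosen so the result lines up with the first log entry.
uploaded = datetime.fromisoformat("2024-03-12T15:20:34.380181+00:00")
print(deletion_time(uploaded).isoformat())
```

Requiring a timezone-aware base keeps the result unambiguous, which matches the explicit `+00:00` offsets in the logged values.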
This fixes several issues observed during ops review:

- The `/api/v1/endpoints` API fails if the server is shut down.
- `tar` unpack errors can result in enormous `stderr` output, which is captured in the `Audit` log; truncate it to 5Kb.
- Change the `pbench-audit` utility to use `dateutil.parser` instead of `click.DateTime()` so we can include fractional seconds and timezone.

During the time when we broke PostgreSQL, we failed to create metadata for a number of datasets that were allowed to upload. (Whether we should allow this vs. failing the upload is a separate issue.) We also want to repair the excessively large `Audit` attributes records. So I took a stab at some wondrous and magical SQL queries and hackery to begin a new `pbench-repair` utility. Right now, it repairs long audit attributes "intelligently" by trimming individual JSON key values, and it adds metadata to datasets which lack critical values. Currently, this includes `server.tarball-path` (which we need to enable TOC and visualization), `dataset.metalog` (capturing the tarball `metadata.log` file), and `server.benchmark` for visualization.

There are other `server` namespace values (including expiration time) that could be repaired: I decided not to worry about that as we're not doing expiration anyway. (Though I might add it over the weekend, since it shouldn't be hard.) And there are probably other things we might want to repair in the future using this framework.

I tested this in a `runlocal` container, using `psql` to "break" datasets and then repairing them. I hacked the local `repair.py` with a low "max error" limit to force truncation of audit attributes: