Open grifferz opened 2 years ago
On advice of dmsimard in IRC I tried:
ara task list --playbook 253 -f value -c id | xargs -L1 ara task delete
This completed without complaint, so I assume all tasks from playbook id 253 have now been deleted but I still cannot run prune without hitting the foreign key constraint error with playbook id 253.
Is there an easy way, from the CLI client, to make it report the exact SQL queries that are being issued so I can at least see which query violates the constraint?
o/ thanks for the update @grifferz
This completed without complaint, so I assume all tasks from playbook id 253 have now been deleted but I still cannot run prune without hitting the foreign key constraint error with playbook id 253.
Tasks aren't the only resources for a playbook, though. Could you also run these and find out if any return an error:
ara play list --playbook 253 -f value -c id | xargs -L1 ara play delete
ara host list --playbook 253 -f value -c id | xargs -L1 ara host delete
The reason I am interested in running these commands is to identify exactly which resource from the playbook is "stuck". In addition to those there are also files but files do not have a CLI implementation. If we still can't delete the playbook after deleting all other resources, it will at least hint us in the right direction.
Is there an easy way, from the CLI client, to make it report the exact SQL queries that are being issued so I can at least see which query violates the constraint?
There is nothing built-in right now, no.
Could you also run these and find out if any return an error:
* `ara play list --playbook 253 -f value -c id | xargs -L1 ara play delete` * `ara host list --playbook 253 -f value -c id | xargs -L1 ara host delete`
These both ran without error, but still hitting the constraint violation for playbook id 253 afterwards.
In addition to those there are also files but files do not have a CLI implementation. If we still can't delete the playbook after deleting all other resources, it will at least hint us in the right direction.
So perhaps it is files then. I'm not particularly interested in having ARA record anything about files. Is there a way to have it skip doing that?
Actually there has been a development.
Although the deletion of plays ran without error:
ara play list --playbook 253 -f value -c id | xargs -L1 ara play delete
…the first run of "prune" after that actually resulted in a different error which I did not notice at first:
$ ara playbook prune --days 1 --confirm
2021-09-26 23:09:03,200 INFO ara.cli.playbook: Found 125 playbooks matching query
2021-09-26 23:09:03,200 INFO ara.cli.playbook: Deleting playbook 253 (/srv/ansible/playbooks/infra/site.yml), start date: 2021-09-20T21:32:04.794418Z
2021-09-26 23:09:03,277 ERROR django.request: Internal Server Error: /api/v1/playbooks/253
Traceback (most recent call last):
File "/home/andy/venv/ansible-py3/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/home/andy/venv/ansible-py3/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 383, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.DatabaseError: database disk image is malformed
The subsequent deletion of hosts completed without error, and ansible-playbook was still recording playbook runs in there, but prune now only got this "database disk image is malformed" error.
On investigation:
sqlite> pragma integrity_check;
wrong # of entries in index tasks_file_id_fc41d624
sqlite> .schema tasks
CREATE TABLE IF NOT EXISTS "tasks" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "created" datetime NOT NULL, "updated" datetime NOT NULL, "started" datetime NOT NULL, "ended" datetime NULL, "name" text NULL, "action" text NOT NULL, "lineno" integer NOT NULL, "tags" BLOB NOT NULL, "handler" bool NOT NULL, "file_id" bigint NOT NULL REFERENCES "files" ("id") DEFERRABLE INITIALLY DEFERRED, "play_id" bigint NOT NULL REFERENCES "plays" ("id") DEFERRABLE INITIALLY DEFERRED, "playbook_id" bigint NOT NULL REFERENCES "playbooks" ("id") DEFERRABLE INITIALLY DEFERRED, "duration" bigint NULL, "status" varchar(25) NOT NULL);
CREATE INDEX "tasks_file_id_fc41d624" ON "tasks" ("file_id");
CREATE INDEX "tasks_play_id_8b47a0f3" ON "tasks" ("play_id");
CREATE INDEX "tasks_playbook_id_0f283c3b" ON "tasks" ("playbook_id");
sqlite> drop index "tasks_file_id_fc41d624";
sqlite> CREATE INDEX "tasks_file_id_fc41d624" ON "tasks" ("file_id");
sqlite> pragma integrity_check;
ok
…and now "prune" runs without complaint - it's managed to prune playbook id 253 as well as all the others that it should.
I still have a copy of the database from before I ran any of the delete commands if you want me to investigate anything else.
I've never seen errors like these so it would be interesting to understand what caused them but thanks for providing insight into a potential fix though we should ideally never have to tinker with the database manually :)
I wonder: if you do a prama integrity_check;
on a copy of your backup, does it pick up any issue ?
Does anything else about playbook 253 feel different or special ?
I have the following in a daily cron job:
After a few days it has hit a playbook that it can't prune because it gets a foreign key constraint error, so now the cron job can never get past pruning that playbook:
ara version 1.5.7.
Aside from avoiding hitting that problem, could it also be an idea to carry on and prune other playbooks even if there is an error pruning one of them?