Open dlebauer opened 9 years ago
I agree that there are a lot of test runs that can be deleted, which is why we explicitly added a delete button in the history table. Moving forward, folks need to do a better job of clearing out unsuccessful runs. Looking backward, it would be good to have a script that can clear out unsuccessful runs on one's own server that are older than some specified date (one might want to keep recent unsuccessful runs for debugging). Clearing out older successful but unneeded runs is harder to automate. I'd recommend Rob's delete system over database hacking because it knows about dependencies and also removes the underlying files, which would otherwise be orphaned.
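The date-based purge of unsuccessful runs could be sketched roughly as below. This is only an illustration using sqlite3 with a made-up miniature `runs` table and an arbitrary 7-day cutoff; BETY is Postgres, and Rob's delete system is still preferable because it handles dependencies and the underlying files.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    started_at TEXT,
    finished_at TEXT,
    created_at TEXT)""")

now = datetime(2024, 1, 1)
rows = [
    (1, None, None, (now - timedelta(days=30)).isoformat()),  # old, never started
    (2, None, None, (now - timedelta(days=2)).isoformat()),   # recent failure: keep for debugging
    (3, now.isoformat(), now.isoformat(), now.isoformat()),   # successful run: keep
]
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)

# Delete unsuccessful runs older than the cutoff, keeping recent failures
# around for debugging.
cutoff = (now - timedelta(days=7)).isoformat()
conn.execute(
    "DELETE FROM runs WHERE (started_at IS NULL OR finished_at IS NULL) "
    "AND created_at < ?", (cutoff,))
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT id FROM runs ORDER BY id")]
print(remaining)  # run 1 purged; runs 2 and 3 kept
```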
I suspect it wouldn't be hard to add an optional flag to `load.db` to skip reading the provenance tables, but this shouldn't be the default, and we should continue to always write those tables.
This issue is stale because it has been open 365 days with no activity.
@infotroph mentioned in slack that we should look at runs that are never started/finished and are more than a few days old. This should remove a bunch of them.
Just FYI, about 13% of the runs were never started:

```
bety=# select count(*) from runs where started_at is null or finished_at is null;
 count
--------
 389218
(1 row)

bety=# select count(*) from runs;
  count
---------
 2921943
(1 row)
```
There are also a few that run in negative time:

```
bety=# select count(id) from runs where started_at > finished_at;
 count
-------
   942
(1 row)
```
I'm guessing, but can't prove, that these are mostly timezone errors plus a few cases of swapped order when passing `started_at` and `finished_at` as unnamed arguments.
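For the swapped-argument cases, the repair would be a straightforward swap once a row is confirmed to be a swap rather than a timezone error. A minimal sketch, again using sqlite3 and a toy table for illustration (the `UPDATE ... SET a = b, b = a` idiom works because the right-hand sides read the old row values, in both SQLite and Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (id INTEGER PRIMARY KEY, started_at TEXT, finished_at TEXT)")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", [
    (1, "2020-01-01T12:00:00", "2020-01-01T10:00:00"),  # reversed: negative runtime
    (2, "2020-01-01T10:00:00", "2020-01-01T12:00:00"),  # normal
])

# Swap started_at and finished_at wherever a run appears to finish
# before it starts.
conn.execute("""
    UPDATE runs
    SET started_at = finished_at, finished_at = started_at
    WHERE started_at > finished_at
""")
conn.commit()
```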
This would be nice to have as part of a pecan admin interface:
@robkooper first as API endpoint?
R got hung on `read.settings`. This was resolved by running `delete from runs where id < 10000`, deleting the first 10k rows of runs. I went through and deleted a few hundred thousand more rows (e.g. `delete from likelihoods where id > 1000000; delete from runs where id > 1000000`, etc.). Not only did the database get hung, but having so many extra rows makes it take longer to sync the database.
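As an aside, bulk deletes like the ones just described tend to behave better when done in small batches with a commit between each, so no single transaction holds locks for the whole operation. A sketch of the pattern (sqlite3 stand-in; the batch size and id threshold are arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO runs VALUES (?)", [(i,) for i in range(1, 501)])
conn.commit()

# Delete ids above a threshold in batches, committing between batches,
# instead of one huge DELETE.
BATCH = 100
while True:
    cur = conn.execute(
        "DELETE FROM runs WHERE id IN "
        "(SELECT id FROM runs WHERE id > 200 LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break

print(conn.execute("SELECT count(*) FROM runs").fetchone()[0])  # 200 rows remain
```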
I'd propose a few solutions: