better handling of 'test' runs

PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

www.pecanproject.org

better handling of 'test' runs #620

Open dlebauer opened 9 years ago

dlebauer commented 9 years ago

R got hung on read.settings. This was resolved by running delete from runs where id < 10000, deleting the first 10k rows of runs.

I went through and deleted a few hundred thousand more rows (e.g. delete from likelihoods where id > 1000000; delete from runs where id > 1000000; etc.).

Not only did the database hang, but having so many extra rows also makes syncing the database take longer.

I'd propose a few solutions:

1. A function to clean 'test' runs (requires tricky queries when foreign key constraints are violated).
2. Users work in 'test' databases except when doing 'analyses'. (I agree that in practice it is difficult to distinguish 'test' from 'real' runs, but see (1) to make cleanup easier.)
3. Not syncing the provenance tables (I generally don't need to know what is happening with PEcAn on other servers).
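
A minimal sketch of the first proposal above, assuming a simplified two-table schema in which likelihoods.run_id references runs.id. The real database has more dependent tables, and clean_test_runs and demo_connection are hypothetical names. It uses Python's built-in sqlite3 only to stay self-contained (the bety=# prompt later in this thread points to PostgreSQL in production). The idea is to delete dependent rows before the runs they reference, all in one transaction.

```python
import sqlite3


def clean_test_runs(conn, max_id):
    """Delete runs with id < max_id, removing dependent likelihoods first."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM likelihoods WHERE run_id IN "
            "(SELECT id FROM runs WHERE id < ?)",
            (max_id,),
        )
        cur = conn.execute("DELETE FROM runs WHERE id < ?", (max_id,))
    return cur.rowcount  # number of runs removed


def demo_connection():
    """Build a tiny in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE likelihoods ("
        "id INTEGER PRIMARY KEY, "
        "run_id INTEGER NOT NULL REFERENCES runs(id))"
    )
    conn.executemany("INSERT INTO runs (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO likelihoods (id, run_id) VALUES (?, ?)", [(10, 1), (11, 3)]
    )
    conn.commit()
    return conn
```

With this setup, a bare delete of run 1 fails on the foreign key constraint, while clean_test_runs(demo_connection(), 3) removes runs 1 and 2 along with the likelihood row that pointed at run 1. Note that this only touches the database; it does not remove any files on disk.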

mdietze commented 9 years ago

I agree that there are a lot of test runs that can be deleted, which is why we explicitly added a delete button in the history table. Moving forward, folks need to do a better job of clearing out unsuccessful runs. Looking backward, it would be good to have a script that can clear out unsuccessful runs on one's own server that are older than some specified date (one might want to keep recent unsuccessful runs for debugging). Clearing out older successful but unneeded runs is harder to automate. I'd recommend Rob's delete system over database hacking because it knows about dependencies and also removes the underlying files, which would otherwise be orphaned.
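
As a rough illustration of the "older than some specified date" script, here is a dry-run sketch that only lists candidate run ids and leaves the actual deletion to the existing delete system. It assumes the runs table has created_at and finished_at columns and treats a missing finished_at as "unsuccessful"; both are assumptions, and unfinished_runs_before is a hypothetical name. The in-memory SQLite table and its dates are made up for the demo.

```python
import sqlite3


def unfinished_runs_before(conn, cutoff):
    """Return ids of runs created before `cutoff` that never finished.

    `cutoff` is an ISO 8601 date or timestamp string, compared as text.
    """
    rows = conn.execute(
        "SELECT id FROM runs "
        "WHERE finished_at IS NULL AND created_at < ? "
        "ORDER BY id",
        (cutoff,),
    )
    return [row[0] for row in rows]


# Demo data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs ("
    "id INTEGER PRIMARY KEY, created_at TEXT, started_at TEXT, finished_at TEXT)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [
        (1, "2015-01-10 12:00:00", "2015-01-10 12:01:00", "2015-01-10 13:00:00"),
        (2, "2015-02-01 09:00:00", "2015-02-01 09:05:00", None),  # old, unfinished
        (3, "2015-07-15 08:00:00", None, None),  # recent, kept for debugging
    ],
)
candidates = unfinished_runs_before(conn, "2015-06-01")
```

Here candidates contains only run 2: run 1 finished, and run 3 is newer than the cutoff.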

I suspect it wouldn't be hard to add an optional flag to load.db to not read the provenance tables, but this shouldn't be the default, and we should continue to always write those.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.

robkooper commented 4 years ago

@infotroph mentioned in Slack that we should look at runs that were never started/finished and are more than a few days old. This should remove a bunch of them.

robkooper commented 4 years ago

Just FYI, about 13% of the runs were never started or never finished.

bety=# select count(*) from runs where started_at is null or finished_at is null;
 count
--------
 389218
(1 row)

bety=# select count(*) from runs;
  count
---------
 2921943
(1 row)
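
For reference, the 13% figure follows directly from the two counts above; a quick check of the arithmetic (variable names are just for illustration):

```python
# Counts copied from the two queries above.
unstarted_or_unfinished = 389218  # started_at or finished_at is null
total_runs = 2921943

share = unstarted_or_unfinished / total_runs
print(f"{share:.1%}")  # prints 13.3%
```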

infotroph commented 4 years ago

There are also a few that run in negative time:

select count(id) from runs where started_at>finished_at;
 count 
-------
   942
(1 row)

I'm guessing, but can't prove, that these are mostly timezone errors plus a few cases of swapped order when passing started_at and finished_at as unnamed arguments.
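
One rough way to probe that guess, sketched under stated assumptions: UTC offsets span UTC-12 to UTC+14, so a timezone mix-up alone cannot push started_at more than 26 hours past finished_at. A larger gap needs some other explanation, while a smaller gap is consistent with either cause. classify_negative_run is a hypothetical helper, and this is a heuristic, not a proof.

```python
from datetime import datetime, timedelta

# Largest possible difference between two UTC offsets (UTC-12 vs UTC+14).
MAX_TZ_SPREAD = timedelta(hours=26)


def classify_negative_run(started_at, finished_at):
    """Label a run by how far started_at lies after finished_at."""
    gap = started_at - finished_at
    if gap <= timedelta(0):
        return "ok"  # run does not go backwards in time
    if gap <= MAX_TZ_SPREAD:
        return "within timezone range"  # timezone error or a short swapped run
    return "beyond timezone range"  # cannot be a timezone error alone
```

Counting how many of the 942 rows fall into each bucket would at least bound how many could be pure timezone errors.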

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 365 days with no activity.

robkooper commented 3 years ago

This would be nice to have as part of a pecan admin interface:

A button to delete runs on the server that were created more than 7 days ago but have not started/finished. We could even couple this with a check in RabbitMQ to see if the run is still there.
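
A minimal sketch of the selection logic behind such a button, assuming each run is available as a dict with id, created_at, started_at and finished_at. The queued_ids argument is a hypothetical stand-in for whatever the RabbitMQ check would return; no real RabbitMQ call is shown, and deletable_run_ids is an illustrative name.

```python
from datetime import datetime, timedelta


def deletable_run_ids(runs, queued_ids, now, max_age=timedelta(days=7)):
    """Return ids of stale runs that are safe to offer for deletion.

    A run qualifies if it was created more than `max_age` ago, never
    started or never finished, and is not still waiting in the queue.
    """
    cutoff = now - max_age
    return [
        run["id"]
        for run in runs
        if run["created_at"] < cutoff
        and (run["started_at"] is None or run["finished_at"] is None)
        and run["id"] not in queued_ids
    ]
```

Keeping the queue check as a plain set of ids makes the rule easy to test without a message broker, and the same function could sit behind either a button or an API endpoint.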

dlebauer commented 3 years ago

> This would be nice to have as part of a pecan admin interface:

@robkooper first as an API endpoint?