NDCMS / lobster

A userspace workflow management tool for harnessing non-dedicated resources for high-throughput workloads.
MIT License

Deduplicate units stuck #593

Closed annawoodard closed 7 years ago

annawoodard commented 7 years ago

Under some conditions a unit can count as both failed and skipped. In those cases we double-count how many units are stuck (I introduced this error in c98a5bb). This commit reverts to calculating the union of stuck and failed rather than the sum. It also changes the definition of 'stuck' given in the workflow summary to match what we use under the hood (so for users, stuck goes from "parent stuck" to "parent stuck or skipped or failed"). I think the former definition is more straightforward for users, but maintaining a separate definition is too confusing. Fixes #592.
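To illustrate the double-counting, here is a minimal sketch using plain Python sets. The unit IDs and variable names are made up for the example and are not taken from the lobster code or database; the point is only that a sum counts an overlapping unit twice while a union counts it once.

```python
# Hypothetical unit IDs, chosen only to illustrate the overlap.
failed = {1, 2, 3}
skipped = {3, 4}  # unit 3 is both failed and skipped

# Summing the two counts includes unit 3 twice.
stuck_sum = len(failed) + len(skipped)

# Taking the union counts every unit exactly once.
stuck_union = len(failed | skipped)

# The two results differ by the number of units in both sets.
overlap = len(failed & skipped)

print(stuck_sum, stuck_union, overlap)
```

The two counts agree only when no unit is both failed and skipped, which is why the discrepancy shows up only under some conditions.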

To continue reporting the units failed and units stuck every time we run update_workflow_stats, we'd need to compute the units failed, the units skipped, and the union of those lists. To keep the computation lean, I went back to calculating only the union (and removed the units_failed and units_skipped columns from the workflow table). Now units_failed and units_skipped are only calculated when we call workflow_status (during plotting and when running lobster status). This isn't a complete reversion of c98a5bb, though: there are still some minor improvements in readability.

The new method is slightly slower than before (26.9 ms vs. 23 ms per loop in the timings below), but should be equivalent to the speed pre-c98a5bb:

In [1]: from lobster.core import unit
In [2]: import imp
In [3]: cfg = imp.load_source('cfg', 'config.py').config
In [4]: store = unit.UnitStore(cfg)

NEW:

In [5]: %timeit store.update_workflow_stats('ttW_extn')
10 loops, best of 3: 26.9 ms per loop

OLD:

In [6]: %autoreload
In [7]: %timeit store.update_workflow_stats('ttW_extn')
10 loops, best of 3: 23 ms per loop

NEW: [screenshot, 2017-05-27 1:22:26 pm]

OLD: [screenshot, 2017-05-27 12:01:40 am]