Track active tasks' data separately from "archived" ones'

We currently keep track of all tasks known to a coordinator in the TaskMap_t data structure owned by the Coordinator. This contains tasks in new, runnable, running, completed, failed and various other states. We use it for the web UI, scheduling and the management of task-specific data structures.

However, the flow graph (and, consequently, the cost models) sometimes needs to iterate over all tasks that are currently of interest to the scheduler (i.e., those still eligible for scheduling: runnable, running and failed ones). These iterations can be tripped up by "archived" tasks that remain in the task map.
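To illustrate the cost, here is a minimal C++ sketch of the current pattern. It uses a simplified, hypothetical `Task` struct and `TaskState` enum in place of Firmament's actual `TaskDescriptor` and `TaskMap_t`. With a single map, every pass over schedulable tasks walks all tasks ever seen and filters out the archived ones, so its cost grows with the total number of tasks rather than the number of active ones.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for Firmament's task types.
enum class TaskState { kCreated, kRunnable, kRunning, kCompleted, kFailed, kAborted };

struct Task {
  std::uint64_t id;
  TaskState state;
};

using TaskMap = std::unordered_map<std::uint64_t, Task>;

// True for the states the scheduler still cares about
// (runnable, running and failed, as described above).
bool IsOfInterestToScheduler(TaskState s) {
  return s == TaskState::kRunnable || s == TaskState::kRunning ||
         s == TaskState::kFailed;
}

// With one map for everything, each pass visits every task ever seen,
// including archived ones, and has to filter them out.
std::vector<std::uint64_t> TasksOfInterest(const TaskMap& all_tasks) {
  std::vector<std::uint64_t> out;
  for (const auto& [id, task] : all_tasks) {
    if (IsOfInterestToScheduler(task.state)) out.push_back(id);
  }
  return out;
}
```

For example, a map holding one runnable, one completed and one failed task yields two tasks of interest; the completed task is still visited and discarded on every pass.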

To make such iterations more efficient and to clarify the semantics, we should separate the two purposes of the task map. There are several options for this:

1. Establish a separate data structure in the flow scheduler that keeps track of all tasks of interest to it.
   - Pros: easy, not a breaking change, compatible with factoring the flow scheduler into a standalone module
   - Cons: duplicated bookkeeping, another data structure to manage, memory overhead
2. Redefine the task map to contain only active tasks, and move tasks that are no longer active into a separate archival map.
   - Pros: no memory overhead, clear separation of concerns
   - Cons: major architectural change, still two data structures to manage, potential for inconsistency between them
3. Garbage-collect finished tasks' state some time after they finish (as in Mesos), and retire any information we want to retain to the knowledge base.
   - Pros: clean solution, also addresses state accumulation issues, clear separation of concerns
   - Cons: invasive change that breaks code assuming finished tasks' state stays available, needs state migration logic
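Option 2 might look roughly like the following sketch. `TaskStore`, `SetState` and the simplified task types are hypothetical names for illustration, not existing Firmament code. Following the description above, the sketch assumes that failed tasks remain of interest to the scheduler, so only completed and aborted tasks are treated as terminal and moved to the archive.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified task types (not Firmament's TaskDescriptor).
enum class TaskState { kCreated, kRunnable, kRunning, kCompleted, kFailed, kAborted };

struct Task {
  std::uint64_t id;
  TaskState state;
};

using TaskMap = std::unordered_map<std::uint64_t, Task>;

// Hypothetical store for option 2: the active map holds only tasks the
// scheduler cares about; tasks move to the archive once they finish.
class TaskStore {
 public:
  void Add(const Task& task) { active_[task.id] = task; }

  // Records a state change for an active task. Tasks entering a terminal
  // state are moved to the archive. Returns false if the task is unknown
  // or already archived.
  bool SetState(std::uint64_t id, TaskState new_state) {
    auto it = active_.find(id);
    if (it == active_.end()) return false;
    it->second.state = new_state;
    if (IsTerminal(new_state)) {
      archived_.emplace(id, std::move(it->second));
      active_.erase(it);
    }
    return true;
  }

  // Scheduler passes (flow graph, cost models) iterate only over these.
  const TaskMap& active() const { return active_; }
  // Finished tasks remain available for lookups, e.g. from the web UI.
  const TaskMap& archived() const { return archived_; }

 private:
  // Failed tasks are deliberately not terminal: they may be rescheduled.
  static bool IsTerminal(TaskState s) {
    return s == TaskState::kCompleted || s == TaskState::kAborted;
  }

  TaskMap active_;
  TaskMap archived_;
};
```

Scheduler passes then iterate over `active()` only, at a cost proportional to the number of active tasks. The sketch also shows where the potential for inconsistency comes from: nothing stops a caller from calling `Add` with a task ID that already sits in the archive, so a real implementation would need to guard every transition between the two maps.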

I would be interested in views on the best way forward.
