This pull request contains the changes from @AdamGleave's Hapi project for review. We should aim to upstream these changes in a reasonably timely manner in order not to hold up some of the upcoming changes to code structure.
Apart from a bunch of fixes and improvements, this pull request contains three major components:
- A simulated Quincy cost model that allows us to simulate locality in a distributed file system with the Google trace simulator.
- Changes to support the Hapi min-cost flow solvers in Firmament (adding approximate and incremental scheduling support).
- Changes to the Google trace simulator to run experiments and measure a variety of metrics when running approximate and incremental flow solvers.
I will aim to go through the commits and tick them off individually, adding comments and deltas as required; I don't think there's an easy way to make this into multiple pull requests, unfortunately.
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> What would the map be keyed by? The machine ID or the resource ID? If we use the machine ID, there is no benefit over using a vector, since the vector can be subscripted into at O(1) cost. If we use the resource ID, the placement code in GetMachines() will become more complex.
I think I agree that we should change it, but let's quickly discuss the design here.
<img border=0 src='https://avatars.githubusercontent.com/u/433340?v=3' height=16 width=16> You're right: if we use an unordered_map, then the GetMachines code will get quite tricky. We would have to keep two maps: one mapping machine_index to resource_id, and a reverse one. Overall, this may end up being slower than just using the vector. GetMachines will definitely be called more often than RemoveMachine.
Let's keep it as it is.
<img border=0 src='https://avatars.githubusercontent.com/u/433340?v=3' height=16 width=16> I am a bit surprised that these methods are not implemented. I think we should use them in order to adjust the costs of the preference arcs. We should gather the number of tasks that are running on a machine: we don't want to place N tasks on the same machine if they are all hitting the disk hard.
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> Adam didn't have time to add this functionality, I think. The original Quincy cost model (which we're implementing here) also has no such notion -- if a machine has a capacity for K tasks, it gets K tasks. (So this is faithful to the original model, though not necessarily a good idea.) Leave to later PR?
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> It turns out that the simulated DFS class also uses this type. For now, we have two typedefs in different places for this, but that's clearly not ideal. We'll need to think about a shared header in the future. However, the places that use this are in different modules ("sim" and "cost_models"), so there is no trivial solution.
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> We discussed the reasoning behind this (a quick hack to allow direct mapping of tasks to racks, rather than via TECs). I'm not sure we arrived at a conclusion about what a better way of doing this would be. Needs more discussion.
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> I'm a little confused: why do we do this? There shouldn't usually be any others open, but there could be (e.g. if a perf file is written or logging is done concurrently). Is this in order to get rid of stale open FDs?
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> In an ideal implementation, the simulated Quincy model would be a wrapper around the real Quincy model that feeds it with simulated data locality information. We should think about re-engineering it that way.
<img border=0 src='https://avatars.githubusercontent.com/u/433340?v=3' height=16 width=16> Hmm... I think from this method we should just return the equiv_classes to which the resource node is going to be connected. Similarly, from GetOutgoingEquivClassPrefArcs we should return the ResIDs to which the equiv_class will be connected (i.e., just the forward-going arcs).
<img border=0 src='https://avatars.githubusercontent.com/u/433340?v=3' height=16 width=16> Another way to do this is to have the fields (e.g. new_node, new_arc, ...) in dimacs_change.cc. Then we can increment them from any subclass. Moreover, we won't have to do any instanceOf checks.
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> The code uses Spooky::Hash32, which returns a uint32_t, so this is fine. However, looking at the Spooky source code, the uint32_t is obtained by casting a uint64_t, so we may as well move to Hash64 without a drop in performance.
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> Agree, though maybe we should make an issue for this and leave it as-is in this PR. Alternatively, re-engineer the code to make stats_file a FILE* everywhere. (The Google style guide says that streams should only be used for logging.)
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> I've added a comment, but it's not as you assume: on the contrary, each task has its own, single-entity equivalence class. I think Adam did this as a short-cut hack in order to have information in the KB even though the SimulatedQuincyCostModel does not actually have a proper notion of TECs (it would record the stats on a per-rack basis!).
We almost certainly need to do some work here to improve statistics collection. Okay to merge anyway with my changes?
<img border=0 src='https://avatars.githubusercontent.com/u/192315?v=3' height=16 width=16> This is a little suboptimal, as the ext directory isn't part of our standard include path (and isn't in the source tree either). Should we just add it to the include path in Makefile.config?
misc/utils.cc
..., since this is more general functionality that will be useful outside the simulator.
...ReplayTrace and refactored it into smaller methods. It's now ~130 lines itself.