camsas / firmament

The Firmament cluster scheduling platform
Apache License 2.0
415 stars 79 forks source link

Improve hash functions used to generate IDs. #37

Closed ICGog closed 8 years ago

ICGog commented 8 years ago

Currently, we use hash_combine() from boost::hash to generate unique IDs. However, boost::hash::hash_combine is a rather weak hash that in practice can lead to collisions. One such collision can be obtained by running a simulation with 1000 tasks / second submission rate using the following command:

./build/sim/simulator --scheduler="flow" --simulation="synthetic"  --solver="flowlessly" --online_factor=1 --logtostderr --run_incremental_scheduler=true --generate_trace --generated_trace_path=/home/icg27/firmament/generated_trace/ --flow_scheduling_cost_model=6 --synthetic_num_machines=1500 --synthetic_jobs_per_second=20 --synthetic_tasks_per_job=50 --trace_path="/" --runtime=1000000000
ms705 commented 8 years ago

We should fix this, since we've already hit the issue several times before.

In the past, @AdamGleave moved (parts of) the simulator to use SpookyHash, and we already have SpookyHash as an external dependency in the code base. This suggests that we should use SpookyHash as a replacement for boost::hash::hash_combine().

Am I right in assuming that these are changes that affect both simulation and real execution, since common components (e.g., Task IDs) are affected?

ICGog commented 8 years ago

Yes, you're right. The changes affect both simulation and real execution.

We're now using SpookyHash wherever we don't need to merge two sources of information. The change is here: https://github.com/ms705/firmament/commit/af072c050939062176bb3735aae6a6cc7fb9ce52