dos-group / iterations-experiments

experiments with recurring parallel dataflows
Apache License 2.0

Experiments: evaluate using Tachyon to speed up driver-side iterations #3

Open lauritzthamsen opened 9 years ago

lauritzthamsen commented 9 years ago

Evaluate running iterative Flink jobs on Tachyon (http://tachyon-project.org/, http://tachyon-project.org/Running-Flink-on-Tachyon.html), e.g. k-means, and especially a version that doesn't use Flink's native iterations (https://github.com/citlab/adaptive-iterations/issues/2).
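To illustrate the difference being evaluated, here is a minimal, Flink-free sketch in Python. It assumes that a driver-side iteration runs each k-means step as a separate job, which has to read the previous centroids from shared storage and write the new ones back. That per-iteration round trip is presumably where Tachyon's in-memory storage could help. A local temporary directory stands in for a Tachyon or HDFS path, and all function names (`kmeans_step`, `driver_side_kmeans`, `native_kmeans`) are hypothetical, not part of Flink's API.

```python
import json
import os
import tempfile


def kmeans_step(points, centroids):
    """One k-means iteration on 1-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # A centroid with no assigned points keeps its previous position.
    return [sum(c) / len(c) if c else centroids[i] for i, c in clusters.items()]


def driver_side_kmeans(points, initial_centroids, iterations, storage_dir):
    """Hypothetical driver-side loop: every iteration acts like a separate job
    that reads its input centroids from storage and materializes its output."""
    path = os.path.join(storage_dir, "centroids-0.json")
    with open(path, "w") as f:
        json.dump(initial_centroids, f)
    for i in range(iterations):
        with open(path) as f:  # job i reads the previous result from storage
            centroids = json.load(f)
        new_centroids = kmeans_step(points, centroids)
        path = os.path.join(storage_dir, f"centroids-{i + 1}.json")
        with open(path, "w") as f:  # ...and writes its own result back
            json.dump(new_centroids, f)
    with open(path) as f:
        return json.load(f)


def native_kmeans(points, initial_centroids, iterations):
    """Stand-in for a native iteration: state stays in memory between steps."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids
```

Both variants compute the same centroids (for the points `[1.0, 1.5, 2.0, 10.0, 10.5, 11.0]` and initial centroids `[0.0, 5.0]`, both converge to `[1.5, 10.5]`). The only difference is the storage round trip per iteration, which is the cost the experiments are meant to quantify.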

Steps

  1. Is this working locally? How much more time does it take to run driver-based iterations using Tachyon compared to native iterations locally?
  2. Is it working on wally? How much more time does it take when distributed across a few wally nodes?
  3. Configuration files for wally: Tachyon, HDFS, and Flink

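For the "how much more time" questions in steps 1 and 2, a small timing harness could report the slowdown of the driver-side variant relative to native iterations. The sketch below is a hypothetical helper, not part of Flink or Peel. It assumes each variant can be wrapped in a zero-argument callable (for example, one that launches the respective job and blocks until it finishes) and uses the median of several runs to dampen outliers.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Run fn() `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown(baseline_fn, candidate_fn, repeats=5):
    """Ratio of the candidate's median runtime to the baseline's median runtime.
    For example, 1.3 means the candidate (e.g. driver-side iterations on
    Tachyon) took 30% longer than the baseline (native iterations)."""
    return median_runtime(candidate_fn, repeats) / median_runtime(baseline_fn, repeats)
```

The same helper would apply unchanged to the local runs (step 1) and to the runs on a few wally nodes (step 2). Only the callables change.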
lauritzthamsen commented 9 years ago

Tachyon should probably be backed by HDFS as described in http://tachyon-project.org/Fault-Tolerant-Tachyon-Cluster.html.

lauritzthamsen commented 9 years ago

It might also make sense to evaluate whether Tachyon is actually faster than just HDFS. So, basically a comparison between

aalexandrov commented 9 years ago

I would be willing to guide a student who wants to create a Peel bundle for that experiment.

lauritzthamsen commented 9 years ago

Yes, I want to start using Peel for my experiments anyway (#5) and think it makes sense to run the experiments described above with Peel as well.

However, let's first see whether Flink and Tachyon actually work well together and whether it's as easy as described in http://tachyon-project.org/Running-Flink-on-Tachyon.html.

lauritzthamsen commented 9 years ago

@Rubenito Peel is a framework for creating bundles of experiments together with their data and configuration. Peel can then execute these experiments automatically.

See peel-framework.org and https://github.com/stratosphere/peel. @aalexandrov developed and is still developing Peel.