Meeting on Wednesday, Dec 14, at 9:30 AM.

Analyze Yahoo streaming benchmarks (from their blog probably)

Goal: Measure the latency of the frameworks under low-throughput scenarios. Findings: Flink and Storm are good (low latency and good throughput). Spark has high latency (bad) and acceptable throughput. Flink did not use checkpointing to guarantee processing.

Read the data Artisans blog post about the Yahoo benchmarks (http://data-artisans.com/extending-the-yahoo-streaming-benchmark/)

Goal: Measure the maximum throughput of each system while maintaining the best possible fault tolerance. Goal 2: Try some optimizations and variants, such as not using the Redis key-value store.

What are the differences?

With the new approach

Flink is more efficient because it can sustain higher throughput

Fault tolerance and consistency in Flink and Storm

What are the bottlenecks in the Yahoo streaming benchmarks?

Redis

Storing window state in the key-value store: because the windows are updated very quickly, it crashes at 280,000 events/sec

Based on the bottlenecks, state a clear definition of the problem.

Remove the key-value store, because the window state becomes part of the fault-tolerant local state (with the checkpoints??). With this approach, throughput goes from 280,000 events/sec to 15,000,000 events/sec
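
To make the difference between the two approaches concrete, here is a toy sketch in plain Python. It is not Flink or Redis code: `ExternalStore`, `window_key`, the campaign names, the window size, and the checkpoint interval are all hypothetical stand-ins chosen for illustration. It only counts how many round trips to an external store each approach needs, and computes the ratio between the two throughput figures noted above.

```python
from collections import defaultdict

WINDOW_MS = 10_000  # hypothetical window size, not taken from the benchmark


class ExternalStore:
    """Stand-in for a remote key-value store; it only counts round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def increment(self, key):
        # One round trip per window update.
        self.round_trips += 1
        self.data[key] = self.data.get(key, 0) + 1

    def put_all(self, snapshot):
        # One round trip per checkpoint, regardless of snapshot size.
        self.round_trips += 1
        self.data.update(snapshot)


def window_key(campaign, ts_ms):
    # Assign the event to the window that contains its timestamp.
    return (campaign, ts_ms - ts_ms % WINDOW_MS)


def count_with_store(events, store):
    # Approach 1: every window update goes straight to the external store.
    for campaign, ts in events:
        store.increment(window_key(campaign, ts))


def count_with_local_state(events, store, checkpoint_every):
    # Approach 2: keep counts in local state, snapshot them periodically.
    state = defaultdict(int)
    for i, (campaign, ts) in enumerate(events, start=1):
        state[window_key(campaign, ts)] += 1
        if i % checkpoint_every == 0:
            store.put_all(dict(state))
    store.put_all(dict(state))  # final checkpoint
    return state


# Synthetic events: (campaign, timestamp in ms); purely illustrative data.
events = [("campaign-%d" % (i % 3), i * 7) for i in range(10_000)]

per_event = ExternalStore()
count_with_store(events, per_event)

checkpointed = ExternalStore()
local = count_with_local_state(events, checkpointed, checkpoint_every=1_000)

# Ratio of the two throughput figures from the notes.
speedup = 15_000_000 / 280_000

print(per_event.round_trips, checkpointed.round_trips, round(speedup, 1))
```

Both approaches end with the same window counts, but the first one touches the store once per event while the second touches it once per checkpoint. The two throughput figures in the notes differ by a factor of roughly 54.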
