addthis / hydra

Apache License 2.0

Comparison with popular streaming engines such as Flink #284

Closed: yingfeng closed this issue 5 years ago

yingfeng commented 6 years ago

Hi,

I've noticed that the Hydra project has been around for several years and is still under active development. Similar tasks, such as unified batch and stream processing, can now also be handled by popular streaming engines such as Flink. For example, Uber uses AthenaX to let developers run SQL on Flink to process massive amounts of both batch and streaming data. How does Hydra compare with such platforms? Thanks~

kxingit commented 6 years ago

Hi, there is Hydra documentation that may help you decide, in case you haven't seen it already: http://oss-docs.addthiscode.net/hydra/latest/user-guide/index.html

My personal opinion: comparing Hydra to Uber's AthenaX+Flink, the basic functionality of both is data aggregation, and both let users manage jobs without writing application code.

Pros for AthenaX+Flink:

  1. Richer SQL-like queries, converted by a query planner. Hydra's query grammar can be found in the doc.
  2. Rich task-level alerts (Hydra only alerts on task status, not on task results).
  3. More options for input/output formats: it supports Kafka/Cassandra/MemSQL/MySQL/Elasticsearch, while Hydra mainly supports Kafka, the output of another job, files, and S3 (WIP).
  4. A real MapReduce model, whereas Hydra's reduce step runs on a single host.
  5. More real-time? Hydra jobs can be set to run at intervals, but not in a "real" real-time fashion. That said, I am not 100% sure how Uber's stack works in detail.
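To make point 4 concrete, here is a minimal, hypothetical Python sketch of the difference between funnelling every mapped record into one reducer and hash-partitioning records by key across several reducers, as in a MapReduce model. It is a simplified model only: it does not use Hydra's or Flink's actual APIs, and the function names and the key/count record shape are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (key, value) pairs emitted by a map phase.
Record = Tuple[str, int]


def single_host_reduce(mapped: Iterable[Record]) -> Dict[str, int]:
    """Simplified single-reducer model: every record flows to one place."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in mapped:
        totals[key] += value
    return dict(totals)


def partitioned_reduce(mapped: Iterable[Record], num_reducers: int) -> Dict[str, int]:
    """Simplified MapReduce model: records are hash-partitioned by key,
    each partition is reduced independently, and the results are merged."""
    partitions: List[List[Record]] = [[] for _ in range(num_reducers)]
    for key, value in mapped:
        # hash() of a str is consistent within one process, so every
        # occurrence of a key lands in the same partition.
        partitions[hash(key) % num_reducers].append((key, value))
    merged: Dict[str, int] = {}
    for part in partitions:
        # In a real cluster each partition would be reduced on its own host;
        # partitions hold disjoint keys, so a plain update() merges them safely.
        merged.update(single_host_reduce(part))
    return merged


if __name__ == "__main__":
    events = ["kafka", "s3", "kafka", "file", "kafka", "s3"]
    mapped = [(e, 1) for e in events]  # map phase: emit (key, 1) per event
    print(single_host_reduce(mapped))
    print(partitioned_reduce(mapped, num_reducers=3))
```

Both functions compute the same totals. The difference is where the work happens: with partitioning, the reduce load (and memory for the aggregated state) can be spread across machines, whereas the single-reducer model is bounded by one host.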

Pros for Hydra:

  1. No external dependencies (except RabbitMQ and ZooKeeper), meaning you don't have to manage Calcite, Flink, YARN, LevelDB... before you can actually get things done.
  2. Sort of the same as 1, but it is worth emphasizing that Hydra is an all-in-one system, covering data storage, replication, processing, querying, job management, a web UI...