TheFeloDevTeam / FeloFamilySite

https://thefelodevteam.github.io/FeloFamilySite/

What is stream processing? #59

Open christianfelicite opened 4 years ago

christianfelicite commented 4 years ago

https://www.upsolver.com/blog/batch-stream-a-cheat-sheet

Originally posted by @christianfelicite in https://github.com/TheFeloDevTeam/FeloFamilySite/issues/57#issuecomment-655962286

What is stream processing?

In stream processing, data is processed as soon as it arrives in the storage layer, which is often, though not always, very close to the time it was generated. Processing typically completes in sub-second timeframes, so to the end user it appears to happen in real time. These operations are typically stateless, or keep only a small amount of state, so they usually involve a relatively simple transformation or calculation.
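
To make the "small state" idea concrete, here is a minimal sketch in plain Python, not tied to any particular streaming framework. The event shape (a key and a numeric value) and the names `RunningMean` and `process_stream` are assumptions chosen for illustration. Each event is handled the moment it is read, and the only state kept per key is a count and a running total.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class RunningMean:
    """Small per-key state: just a count and a running total."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count


def process_stream(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Emit an updated mean for each event as soon as it arrives."""
    state: Dict[str, RunningMean] = {}
    for key, value in events:
        mean = state.setdefault(key, RunningMean()).update(value)
        yield key, mean


# Hypothetical input: (sensor id, reading) pairs arriving one at a time.
results = list(process_stream([("a", 10.0), ("b", 4.0), ("a", 20.0)]))
```

Because `process_stream` is a generator, a result is available after every single event rather than after the whole input has been collected.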

Screen-Shot-2020-05-25-at-17.05.22.png

christianfelicite commented 4 years ago

60

christianfelicite commented 4 years ago

When to use stream processing

Stream processing and real-time processing are not necessarily synonymous, but stream processing is the approach to use when we need to analyze or serve data as close as possible to the moment we receive it.

Scenarios where data freshness is critical include real-time advertising, online inference in machine learning, and fraud detection. In these cases a data-driven system has to make a split-second decision: which ad to serve? Do we approve this transaction? Stream processing lets us access the data quickly, perform our calculations and reach a result.
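
As a rough illustration of the "do we approve this transaction?" case, the sketch below applies a simple velocity rule to each transaction as it arrives. The rule, the thresholds `WINDOW_SECONDS` and `MAX_TXNS_IN_WINDOW`, and the class name `VelocityCheck` are all hypothetical; a real fraud system would rely on far richer signals.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

WINDOW_SECONDS = 60.0    # assumed look-back window
MAX_TXNS_IN_WINDOW = 3   # assumed limit of approved transactions per window


class VelocityCheck:
    """Approve or reject each transaction immediately, keeping only recent timestamps."""

    def __init__(self) -> None:
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def approve(self, card_id: str, timestamp: float) -> bool:
        window = self._recent[card_id]
        # Forget approved transactions that have fallen out of the look-back window.
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_TXNS_IN_WINDOW:
            return False  # too many recent transactions: reject
        window.append(timestamp)
        return True


check = VelocityCheck()
decisions = [check.approve("card-1", t) for t in (0.0, 10.0, 20.0, 30.0, 100.0)]
```

The decision for each transaction depends only on a handful of recent timestamps, which is what allows it to be made at arrival time instead of in a later batch job.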

Indications that stream processing is the right approach:

christianfelicite commented 4 years ago

Stream processing tools and frameworks

The terms stream processing and micro-batch processing are often used interchangeably, and frameworks such as Spark Streaming actually process data in micro-batches.
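
The difference between per-event and micro-batch processing can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Spark Streaming is implemented; the function name `micro_batches`, the `(timestamp, value)` event shape and the one-second interval are assumptions.

```python
from typing import Iterable, Iterator, List, Tuple


def micro_batches(events: Iterable[Tuple[float, int]], interval: float) -> Iterator[List[int]]:
    """Group (timestamp, value) events, assumed sorted by timestamp, into fixed-interval batches."""
    batch: List[int] = []
    batch_end = None
    for ts, value in events:
        if batch_end is None:
            batch_end = ts + interval
        if ts >= batch_end:
            yield batch
            batch = []
            # Advance the boundary to the interval that contains this event.
            while ts >= batch_end:
                batch_end += interval
        batch.append(value)
    if batch:
        yield batch


events = [(0.0, 1), (0.4, 2), (1.1, 3), (1.9, 4), (3.2, 5)]

# Micro-batch style: one result per interval that contains events.
batch_sums = [sum(b) for b in micro_batches(events, interval=1.0)]

# Per-event style: one result per event.
event_results = [value * 2 for _, value in events]
```

With micro-batches, a result only becomes available once its interval has closed, so latency is bounded below by the batch interval; the per-event version produces an output for every input as it arrives.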

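However, there are also pure-play stream processing tools such as Confluent’s KSQL, which processes data directly in a Kafka stream, as well as Apache Flink. Apache Flume is often listed alongside these, although it is primarily a log collection and ingestion service rather than a processing engine.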