Open christianfelicite opened 4 years ago
While stream processing and real-time processing are not necessarily synonymous, we would use stream processing when we need to analyze or serve data as close as possible to when we get hold of it.
Scenarios where data freshness is critical include real-time advertising, online inference in machine learning, and fraud detection. In these cases we have data-driven systems that need to make a split-second decision: Which ad should we serve? Do we approve this transaction? We would use stream processing to quickly access the data, perform our calculations and reach a result.
Indications that stream processing is the right approach:
Stream processing and micro-batch processing are often used synonymously, and frameworks such as Spark Streaming would actually process data in micro-batches.
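However, there are some pure-play stream processing tools such as Confluent’s KSQL (since renamed ksqlDB), which processes data directly in a Kafka stream, as well as Apache Flink. Apache Flume is often listed alongside them, although it is primarily a log collection and ingestion service.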
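To make the difference between per-event and micro-batch processing concrete, here is a minimal sketch in plain Python. It is not how Spark Streaming, Flink or KSQL are implemented: the function names `per_event` and `micro_batches` are made up for illustration, the "processing" is a trivial doubling, and the batches are cut by event count, whereas Spark Streaming cuts them by a time interval.

```python
from typing import Iterable, Iterator, List


def per_event(events: Iterable[int]) -> Iterator[int]:
    """Emit one result per event, as soon as that event is read."""
    for event in events:
        yield event * 2  # placeholder for the real calculation


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Collect events into small batches; results appear once per batch."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield [e * 2 for e in batch]
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield [e * 2 for e in batch]


if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5]
    print(list(per_event(stream)))
    print(list(micro_batches(stream, batch_size=2)))
```

In the per-event version each result is available immediately, while in the micro-batch version the first event has to wait until its batch fills up. That waiting time is the extra latency that micro-batching trades for throughput.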
https://www.upsolver.com/blog/batch-stream-a-cheat-sheet
Originally posted by @christianfelicite in https://github.com/TheFeloDevTeam/FeloFamilySite/issues/57#issuecomment-655962286
What is stream processing?
In stream processing, we process data as soon as it arrives in the storage layer, which is often also very close to the time it was generated (although not always). Processing typically completes in sub-second timeframes, so that for the end user it happens in real time. These operations are typically not stateful, or can only keep a ‘small’ state, so they usually involve a relatively simple transformation or calculation.
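As a rough illustration of what a "small" state can look like, the sketch below flags a card that makes too many transactions within a short window, in the spirit of the fraud detection example. Everything here is an assumption made for the example: the `(timestamp, card_id, amount)` event shape, the `flag_bursts` name, the thresholds, and the requirement that events arrive in timestamp order. It is plain Python, not the API of any streaming framework.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, card id, amount).
Event = Tuple[float, str, float]


def flag_bursts(
    events: Iterable[Event],
    window_seconds: float = 60.0,
    max_events: int = 3,
) -> Iterator[Tuple[Event, bool]]:
    """Yield each event with a flag, deciding as soon as the event arrives.

    The only state kept is, per card, the timestamps that fall inside the
    current window. Assumes events arrive in timestamp order.
    """
    recent: Dict[str, Deque[float]] = {}
    for event in events:
        ts, card_id, _amount = event
        times = recent.setdefault(card_id, deque())
        # Drop timestamps that have fallen out of the window.
        while times and ts - times[0] > window_seconds:
            times.popleft()
        times.append(ts)
        yield event, len(times) > max_events


if __name__ == "__main__":
    sample = [(0.0, "a", 5.0), (10.0, "a", 5.0), (20.0, "a", 5.0), (30.0, "a", 5.0)]
    for event, suspicious in flag_bursts(sample):
        print(event, suspicious)
```

Each decision depends only on the incoming event and a handful of recent timestamps, so it can be made in a single pass without reading historical data. Anything that needs a long history, such as a per-customer spending profile, would no longer fit this pattern.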