Sunt-ing / database-system-readings

:yum: A curated reading list about database systems

Stream Processing with Apache Flink #112

Open Sunt-ing opened 2 years ago

Sunt-ing commented 2 years ago

Chapter 1

Event-Driven Applications

The Table API is in LINQ style.
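A minimal sketch of that style, assuming Flink's Java Table API and a table named `Orders` (with `user` and `amount` columns) that has already been registered via a connector:

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class TableApiSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Assumes a table named "Orders" with columns `user` and `amount`
        // has been registered beforehand (e.g., via a connector DDL).
        Table totals = tEnv.from("Orders")
            .groupBy($("user"))
            .select($("user"), $("amount").sum().as("total"));

        // The relational, method-chaining style is what the note compares to LINQ.
        totals.execute().print();
    }
}
```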

Chapter 2

Modern stream processors, like Apache Flink, can offer latencies as low as a few milliseconds.

Processing-time windows introduce the lowest latency possible. Since you do not take into consideration late events and out-of-order events, a window simply needs to buffer up events and immediately trigger computation once the specified time length is reached. Thus, for applications where speed is more important than accuracy, processing time comes in handy. Another case is when you need to periodically report results in real time, independently of their accuracy. An example application would be a real-time monitoring dashboard that displays event aggregates as they are received. Finally, processing-time windows offer a faithful representation of the streams themselves, which might be a desirable property for some use cases. For instance, you might be interested in observing the stream and counting the number of events per second to detect outages. To recap, processing time offers low latency but results depend on the speed of processing and are not deterministic.
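As a sketch of the monitoring-dashboard case, a processing-time tumbling window in the DataStream API; the sensor tuples and names are illustrative only:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ProcessingTimeWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: (sensorId, 1) elements used only to count events.
        env.fromElements(Tuple2.of("sensor_1", 1), Tuple2.of("sensor_2", 1), Tuple2.of("sensor_1", 1))
            .keyBy(t -> t.f0)
            // Processing-time tumbling window: fires purely on the wall clock,
            // so late or out-of-order events are not waited for.
            .window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
            .sum(1)
            .print();

        env.execute("events-per-second");
    }
}
```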

It is important to note that sometimes you can get stronger semantics with weaker guarantees. A common case is when a task performs idempotent operations, like maximum or minimum. In this case, you can achieve exactly-once semantics with at-least-once guarantees.
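A toy illustration in plain Java (not Flink-specific) of why a maximum is safe to replay:

```java
// Re-applying the same record to a running maximum leaves the result
// unchanged, so delivering it more than once (at-least-once) still
// yields an exactly-once result.
public class IdempotentMaxSketch {
    public static void main(String[] args) {
        int runningMax = Integer.MIN_VALUE;
        int[] deliveries = {3, 7, 7, 5, 7}; // the value 7 is replayed after a failure
        for (int v : deliveries) {
            runningMax = Math.max(runningMax, v);
        }
        System.out.println(runningMax); // 7, same as if each record had arrived exactly once
    }
}
```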

Chapter 3. The Architecture of Apache Flink

Flink does not provide durable, distributed storage. Instead, it takes advantage of distributed filesystems like HDFS or object stores such as S3. For leader election in highly available setups, Flink depends on Apache ZooKeeper.

Flink applications can be deployed in two different styles: framework style, where the application JAR is submitted to a running Flink cluster, or library style, where the application is bundled together with Flink into a container image.

(figure illustrating the two deployment styles omitted)

Data Transfer in Flink

The TaskManagers take care of shipping data from sending tasks to receiving tasks. The network component of a TaskManager collects records in buffers before they are shipped, i.e., records are not shipped one by one but batched into buffers. This technique is fundamental to effectively using the networking resource and achieving high throughput. The mechanism is similar to the buffering techniques used in networking or disk I/O protocols. Note that shipping records in buffers does not imply that Flink's processing model is based on microbatches. Each TaskManager has a pool of network buffers (by default 32 KB in size) to send and receive data. If the sender and receiver tasks run in separate TaskManager processes, they communicate via the network stack of the operating system. Streaming applications need to exchange data in a pipelined fashion: each pair of TaskManagers maintains a permanent TCP connection to exchange data. With a shuffle connection pattern, each sender task needs to be able to send data to each receiving task. A TaskManager needs one dedicated network buffer for each receiving task that any of its tasks need to send data to.

Data transfer between TaskManagers:

(figure omitted)

With a shuffle or broadcast connection, each sending task needs a buffer for each receiving task; the number of required buffers is quadratic in the number of tasks of the involved operators. Flink's default configuration for network buffers is sufficient for small- to medium-sized setups; larger setups require tuning the configuration.
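A back-of-envelope sketch with made-up numbers (not from the book) of how the buffer count grows:

```java
// Hypothetical shuffle between two operators, both with parallelism 8,
// running on 2 TaskManagers with 4 slots each.
public class BufferCountSketch {
    public static void main(String[] args) {
        int senderTasksPerTaskManager = 4; // 8 sender tasks spread over 2 TaskManagers
        int receiverTasks = 8;             // every sender must reach every receiver

        int sendBuffersPerTaskManager = senderTasksPerTaskManager * receiverTasks;
        System.out.println(sendBuffersPerTaskManager); // 32 outgoing buffers per TaskManager

        int totalSendBuffers = 8 * 8; // grows quadratically with the operators' parallelism
        System.out.println(totalSendBuffers); // 64
    }
}
```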

When a sender task and a receiver task run in the same TaskManager process, the sender task serializes the outgoing records into a byte buffer and puts the buffer into a queue once it is filled. The receiving task takes the buffer from the queue and deserializes the incoming records. Hence, data transfer between tasks that run on the same TaskManager does not cause network communication.

Credit-Based Flow Control

Task Chaining: The functions of the operators are fused into a single task that is executed by a single thread. Records that are produced by a function are directly handed over to the next function with a simple method call. Hence, there are basically no serialization and communication costs for passing records between functions.
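A small DataStream sketch: the `map` and `filter` below would be chained into one task by default, and chaining can be opted out of per operator or globally. The pipeline and names are illustrative only:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "bb", "ccc")
            .map(String::toUpperCase)    // map and filter run with the same parallelism,
            .filter(s -> s.length() > 1) // so Flink fuses them into one chained task by default
            .print();

        // If chaining is not desired (e.g., to isolate an expensive function),
        // it can be switched off per operator with disableChaining() or startNewChain(),
        // or globally with env.disableOperatorChaining().
        env.execute("chaining-sketch");
    }
}
```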

Watermarks:

If a source function (temporarily) does not emit any more watermarks, it can declare itself idle. Flink will then exclude stream partitions produced by idle source functions from the watermark computation of subsequent operators. This idle mechanism of sources can be used to address the problem of watermarks not advancing, as discussed earlier.
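The note above refers to source functions declaring themselves idle; as an illustration, the newer `WatermarkStrategy` API (Flink 1.11+) exposes a related idleness mechanism via `withIdleness()`. A sketch with a hypothetical `SensorReading` type:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class IdleSourceSketch {

    // Hypothetical event type; only the timestamp field matters here.
    public static class SensorReading {
        public String id;
        public long timestamp; // epoch millis
    }

    public static void main(String[] args) {
        // Bounded-out-of-orderness watermarks; withIdleness() marks a partition
        // idle after one minute without events so it no longer holds back the
        // watermarks of downstream operators.
        WatermarkStrategy<SensorReading> strategy =
            WatermarkStrategy
                .<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withIdleness(Duration.ofMinutes(1))
                .withTimestampAssigner((reading, ts) -> reading.timestamp);

        // The strategy would normally be passed to env.fromSource(...) or
        // stream.assignTimestampsAndWatermarks(strategy).
    }
}
```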