This epic covers a wide variety of related topics around data APIs to make it easier to ingest data into the DH system.
At the lowest level, DH provides StreamPublisher / StreamConsumer constructions. They are both chunk-based APIs. The two most relevant stream consumers are StreamToBlinkTableAdapter (to turn a stream into a Table) and a DHE construction (to ingest a stream into the DIS).
The stream publishers encapsulate the logic specific to the data source, which often includes details about the stream and the underlying data deserialization. For example, (Kafka) with (Avro|Protobuf|JSON). Right now, Kafka is tightly coupled with the deserialization logic, and part of the goal of this epic is to improve that situation.
ObjectProcessor is one of these data ingestion APIs to help improve the situation (#4346). ObjectProcessor is a 1-to-1 API (1 record in, 1 row out) that decouples stream-specific logic from deserialization-logic.
Similar APIs are being built right now that generalize ObjectProcessor further:
1-to-(0|1) (filtering: 1 record in, 0 or 1 rows out)
1-to-n (multi-row: 1 record in, N rows out)
type discrimination: record goes to specific stream consumer based on type
routing (generalization of type discrimination): record goes to multiple stream consumers (based on configuration)
This epic covers a wide variety of related topics around data APIs to make it easier to ingest data into the DH system.
At the lowest level, DH provides
StreamPublisher
/StreamConsumer
constructions. They are both chunk-based APIs. The two most relevant stream consumers areStreamToBlinkTableAdapter
(to turn a stream into aTable
) and a DHE construction (to ingest a stream into the DIS).The stream publishers encapsulate the logic specific to the data source, which often includes details about the stream and the underlying data deserialization. For example, (Kafka) with (Avro|Protobuf|JSON). Right now, Kafka is tightly coupled with the deserialization logic, and part of the goal of this epic is to improve that situation.
ObjectProcessor
is one of these data ingestion APIs to help improve the situation (#4346).ObjectProcessor
is a 1-to-1 API (1 record in, 1 row out) that decouples stream-specific logic from deserialization-logic.Similar APIs are being built right now that generalize
ObjectProcessor
further: