Agent (File collection) needs the ability to filter and collect valid data content.
Agent (Pulsar collection) requires PB protocol parsing and data extraction capabilities.
Solution
Agent integrates Transform as an SDK. Manager also integrates Transform so that it can validate transformation SQL in advance, at the time users configure it.
Before processing any data, the Agent must register the transformation configuration it pulled from Manager with Transform. Configurations are keyed by StreamSourceId, so when a transformation configuration changes, the Agent re-registers it with Transform under the same StreamSourceId.
Agent-Sink passes StreamSourceId and RawData into Transform, and Transform returns zero or more FormalData. Agent-Sink sends the final FormalData to DataProxy.
Transform holds one set of registered configurations per StreamSourceId, and each StreamSourceId belongs to exactly one InLong data stream, identified by its GroupId and StreamId.
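The register-then-transform flow described above can be sketched roughly as follows. This is only an illustration, not the proposed SDK interface: the class name TransformRegistrySketch, the methods register, transform and nonBlankLines, and the use of String for FormalData are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one transformation per StreamSourceId.
final class TransformRegistrySketch {
    // Key: StreamSourceId. Value: a stand-in for the registered transformation configuration.
    private final Map<String, Function<byte[], List<String>>> configs = new ConcurrentHashMap<>();

    // Registering again under the same StreamSourceId replaces the previous configuration.
    void register(String streamSourceId, Function<byte[], List<String>> transformer) {
        configs.put(streamSourceId, transformer);
    }

    // Takes StreamSourceId and RawData, returns zero or more FormalData records.
    List<String> transform(String streamSourceId, byte[] rawData) {
        Function<byte[], List<String>> transformer = configs.get(streamSourceId);
        if (transformer == null) {
            throw new IllegalStateException("No configuration registered for " + streamSourceId);
        }
        return transformer.apply(rawData);
    }

    // Example transformer (an assumption): split RawData into lines and drop blank ones.
    static List<String> nonBlankLines(byte[] rawData) {
        List<String> out = new ArrayList<>();
        for (String line : new String(rawData, StandardCharsets.UTF_8).split("\n")) {
            if (!line.isBlank()) {
                out.add(line);
            }
        }
        return out;
    }
}
```

In this sketch a configuration change is just another call to register with the same key, which matches the re-registration by StreamSourceId described above. An unknown StreamSourceId is treated as an error here, which is a design choice of the example rather than something the proposal specifies.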
Transform's transformation configuration includes three parts: transformation Source, transformation SQL, and transformation Sink.
Initially, transformation SQL provides only basic field filtering and field cropping. Date and time conversion functions and string conversion functions will be added later, modeled on Flink's built-in functions.
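To make the three-part configuration and the initial SQL scope more concrete, here is a minimal sketch. Everything in it is hypothetical: the class TransformConfigSketch, its field names, the format strings and the sample SQL are illustrative only, and the SQL text is not parsed. The filterAndCrop helper just mimics in plain Java what filtering (a WHERE condition) and cropping (a reduced field list) would mean for a row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of a transformation configuration with its three parts.
final class TransformConfigSketch {
    final String sourceFormat;  // transformation Source, e.g. "csv" or "pb" (assumed values)
    final String transformSql;  // transformation SQL, kept as plain text in this sketch
    final String sinkFormat;    // transformation Sink, e.g. "kv" (assumed value)

    TransformConfigSketch(String sourceFormat, String transformSql, String sinkFormat) {
        this.sourceFormat = sourceFormat;
        this.transformSql = transformSql;
        this.sinkFormat = sinkFormat;
    }

    // Field filtering: keep only rows matching the predicate.
    // Field cropping: keep only the selected fields of each remaining row.
    static List<Map<String, String>> filterAndCrop(
            List<Map<String, String>> rows,
            Predicate<Map<String, String>> where,
            List<String> selectedFields) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (!where.test(row)) {
                continue; // row filtered out, contributes zero output records
            }
            Map<String, String> cropped = new LinkedHashMap<>();
            for (String field : selectedFields) {
                if (row.containsKey(field)) {
                    cropped.put(field, row.get(field));
                }
            }
            result.add(cropped);
        }
        return result;
    }
}
```

For example, a configuration such as new TransformConfigSketch("csv", "SELECT name, age FROM source WHERE age >= 18", "kv") would correspond to calling filterAndCrop with a predicate on age and the field list name, age.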
Motivation
Solution
Configuration Model
Interface API of Transform SDK
Task list
Use case
No response
Are you willing to submit PR?
Code of Conduct