apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][Transform] Source lifecycle acquisition #8004

Open Xuzhengz opened 2 weeks ago

Xuzhengz commented 2 weeks ago

Description

How can a Transform know whether the source has finished executing? I want to use the newly added FlatMap multi-row output capability of Transform to achieve an aggregation effect: the Transform aggregates rows as they arrive, and once the source has completed, the aggregated results are output row by row to the destination.

Xuzhengz commented 2 weeks ago

help!!!

Xuzhengz commented 2 weeks ago

come here

liunaijie commented 2 weeks ago

I don't understand your question. The lifecycle is source -> transform -> sink.

Only the source can generate data; a transform only receives data from the source.

How does Transform know if the source has been executed?

When a transform receives data, it means the source has been executed.

liunaijie commented 2 weeks ago

Do you want to ask how to tell when the source has fully completed, i.e. has read all of its data?

CosmosNi commented 2 weeks ago

It seems this can't be solved with flatMap; it looks like multi-row data processing. cc @corgy-w

Xuzhengz commented 2 weeks ago

My requirement is to build a custom transform component for grouped aggregation. The aggregation is updated as each row of data comes in, and the aggregated results are output to the sink after the source has completed.
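
To make the requirement concrete, here is a minimal sketch in plain Java of the accumulate-then-emit pattern being asked for. It is only an illustration: `GroupSumAggregator`, `accept`, and `finish` are hypothetical names, not SeaTunnel APIs, and the sketch assumes a single instance that sees every row. The point is that `finish` can only be called safely once something signals that the input is exhausted, which is exactly the hook this issue asks about.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are SeaTunnel APIs.
class GroupSumAggregator {
    // Running sum per group key, kept in first-seen order for stable output.
    private final Map<String, Long> sums = new LinkedHashMap<>();

    // Called once per incoming row: updates state, emits nothing.
    void accept(String key, long value) {
        sums.merge(key, value, Long::sum);
    }

    // Called once when the input is known to be exhausted: one output row per group.
    List<String> finish() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        GroupSumAggregator agg = new GroupSumAggregator();
        agg.accept("a", 1);
        agg.accept("b", 10);
        agg.accept("a", 2);
        // Emitting before the last row arrives would give wrong totals.
        System.out.println(agg.finish()); // prints [a=3, b=10]
    }
}
```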

Xuzhengz commented 2 weeks ago

Recently I saw that transform supports multi-row output, so I'm trying to see whether I can implement an aggregate transform.

Xuzhengz commented 2 weeks ago

Is there a better way to implement aggregation in SeaTunnel, even if the parallelism is set to 1? I currently have many requirements where aggregation has to be done on the portal side rather than on the database side. cc @liunaijie

CosmosNi commented 2 weeks ago

@corgy-w is implementing multi-row transform and aggregation. You can ask them about the progress.

liunaijie commented 2 weeks ago

No. SeaTunnel focuses on data integration; the recently added flatMap support is meant for exploding one row into multiple rows.

From your description, you want an aggregation function, which is a compute function. I don't think we are going to support it. cc @Hisoka-X
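
For contrast with aggregation, a rough sketch of what "explode one row into multiple rows" means, written with plain Java streams and not the SeaTunnel transform interface. `ExplodeDemo`, `explode`, and the `id|tag1,tag2` row layout are assumptions made up for this example. Each input row is handled on its own, so no state is kept across rows and no end-of-input signal is needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch in plain Java; not the SeaTunnel transform interface.
class ExplodeDemo {
    // Explodes one "id|tag1,tag2" row into one "id|tag" row per tag.
    // Assumes every row contains a '|' separator.
    static List<String> explode(String row) {
        String[] parts = row.split("\\|", 2);
        return Arrays.stream(parts[1].split(","))
                .map(tag -> parts[0] + "|" + tag)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = List.of("1|x,y", "2|z").stream()
                .flatMap(row -> explode(row).stream())
                .collect(Collectors.toList());
        System.out.println(out); // prints [1|x, 1|y, 2|z]
    }
}
```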

Hisoka-X commented 2 weeks ago

Yes, there is no way to share data between multiple parallel instances, which may be running on different nodes.
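
A small plain-Java sketch of why this matters for aggregation. `PartialAggDemo` and its methods are hypothetical and only simulate two parallel instances inside one process. Each instance sees only its own share of the rows, so its sums are partial; a correct total needs a merge step that brings all partial results to one place, which is the kind of data sharing described above as unavailable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of two parallel instances; not SeaTunnel code.
class PartialAggDemo {
    // Sums values per key for the rows one instance happens to receive.
    static Map<String, Long> aggregate(List<Map.Entry<String, Long>> rows) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> r : rows) {
            sums.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return sums;
    }

    // Combining step: needs both partial results available in one place.
    static Map<String, Long> merge(Map<String, Long> left, Map<String, Long> right) {
        Map<String, Long> total = new TreeMap<>(left);
        right.forEach((k, v) -> total.merge(k, v, Long::sum));
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> p0 = aggregate(List.of(Map.entry("a", 1L), Map.entry("b", 10L)));
        Map<String, Long> p1 = aggregate(List.of(Map.entry("a", 2L)));
        System.out.println(p0);            // prints {a=1, b=10}
        System.out.println(p1);            // prints {a=2}
        System.out.println(merge(p0, p1)); // prints {a=3, b=10}
    }
}
```

Neither instance alone reports the correct sum for key `a`; only the merged map does.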