apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][LocalFile Source] Develop a component for real-time file collection, similar in function to Flume's Taildir Source or Spooling Directory Source #8080

Open aiyi926 opened 3 days ago

aiyi926 commented 3 days ago


Description

Could you develop a component for real-time file collection, similar in function to Flume's Taildir Source or Spooling Directory Source?

Usage scenarios:

  1. A log file is written to continuously, and the data in it needs to be collected in real time and sent to Kafka. Flume's Taildir Source currently supports this, including recoverable (resumable) transmission.

  2. Monitor a directory for new files; when a new file appears, collect it and upload it to HDFS. Flume's Spooling Directory Source currently supports this.

Both of these capabilities are very much needed.
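The first scenario (taildir-style collection with recoverable transmission) essentially comes down to remembering a byte offset checkpoint per file, so a restart resumes where reading stopped instead of re-emitting old lines. A minimal sketch of that idea in plain Java (the class name and structure are illustrative, not SeaTunnel API):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Minimal taildir-style reader: it tracks the byte offset it has consumed,
// so a restarted process can resume from the checkpoint (recoverable
// transmission) instead of re-reading the whole file.
public class TailReader {
    private long offset; // checkpoint: last byte consumed

    public TailReader(long checkpoint) {
        this.offset = checkpoint;
    }

    // Read any complete lines appended since the last poll.
    public List<String> poll(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
                offset = raf.getFilePointer(); // advance checkpoint per line
            }
        }
        return lines;
    }

    public long checkpoint() {
        return offset;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("app", ".log");
        TailReader tail = new TailReader(0);

        Files.writeString(f, "line1\nline2\n");
        System.out.println(tail.poll(f));

        // Simulate the log file being appended to while we tail it.
        Files.writeString(f, "line3\n", StandardOpenOption.APPEND);
        System.out.println(tail.poll(f)); // only the newly appended line
    }
}
```

In a real connector the checkpoint would be persisted (Flume's Taildir Source writes it to a JSON position file) so the offset survives process restarts; here it only lives in memory.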

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?


zhdech commented 2 days ago

I feel that ST can connect Flume to Kafka directly, so there is no need to implement this again. If it really has to be implemented in ST, you could do secondary development on ST (fork and extend it).

aiyi926 commented 1 day ago

> I feel that ST can connect Flume to Kafka directly, so there is no need to implement this again. If it really has to be implemented in ST, you could do secondary development on ST.

ST has no way to integrate with Flume; it can integrate with Kafka. I want to use ST directly to monitor local file contents and upload them.
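The second scenario (spooling-style collection) can be served by the JDK's built-in directory watching, which is what a connector implementation would likely build on. A minimal sketch, assuming the upload step is elsewhere; the directory and file names are made up for the demo:

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Minimal spooling-style watcher: blocks until new files appear in a
// directory, then hands each new file name to a downstream step (in a
// real connector, the file would be read and uploaded to HDFS).
public class SpoolWatcher {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("spool");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        // Simulate a producer dropping a new file into the spool directory.
        Files.createFile(dir.resolve("batch-001.csv"));

        WatchKey key = watcher.take(); // blocks until an event arrives
        for (WatchEvent<?> event : key.pollEvents()) {
            // event.context() is the relative path of the created file.
            System.out.println("new file: " + event.context());
        }
        key.reset(); // re-arm the key so further events are delivered
    }
}
```

Flume's Spooling Directory Source adds several concerns on top of this (renaming or deleting files after ingest, ignoring partially written files), which any SeaTunnel equivalent would also need to handle.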