apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][connectors-v2][connector-file] Add support for reading and writing compressed files, such as tar.gz, zip, etc #7587

Open wuchunfu opened 3 weeks ago

wuchunfu commented 3 weeks ago

Description

Our file connectors do not currently seem to support reading compressed files such as tar.gz and zip. I propose adding read and write support for the common compressed file types.
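To make the request concrete, here is a minimal sketch of a tar.gz write and read round trip. It uses only the Python standard library and is not SeaTunnel code; the helper names `write_tar_gz` and `read_tar_gz` are hypothetical and exist only for illustration.

```python
import io
import tarfile
from typing import Dict


def write_tar_gz(members: Dict[str, bytes]) -> bytes:
    """Pack name -> content pairs into an in-memory tar.gz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the entry
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar_gz(archive: bytes) -> Dict[str, bytes]:
    """Return name -> content for every regular file in a tar.gz archive."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():  # skip directories, links, and so on
                contents[member.name] = tar.extractfile(member).read()
    return contents
```

A connector would presumably stream entries rather than hold the whole archive in memory; the in-memory form is used here only to keep the sketch short.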

Usage Scenario

No response

Related issues

No response

corgy-w commented 3 weeks ago

This can be assigned to me for now.

corgy-w commented 3 weeks ago

@wuchunfu #6826 I think the binary format can already represent compressed files. Should we distinguish them?

wuchunfu commented 3 weeks ago

> @wuchunfu #6826 I think binary can actually represent compressed files. Should we distinguish them?

@corgy-w Binary does not seem feasible here. How would we store a compressed file, rather than just transfer it?

corgy-w commented 3 weeks ago

There are two ways to implement this, and I would like to hear everyone's views:

  1. Do not add new file types for zip and similar archives. For example, text.zip is essentially a text file that needs to be imported, so it is enough to unpack the zip file and parse the result as text.
  2. Add a new type of suffix and reuse the processing logic of the existing formats, which is highly extensible.

Personally, I prefer the first option.
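A minimal sketch of the first option, in Python for brevity: the configured format stays "text", and a file whose name ends in .zip is unpacked before its entries are handed to the existing reader. Everything here is an assumption for illustration, not SeaTunnel's API: `read_text` stands in for an existing text-format reader, and `iter_payloads` and `read_rows` are hypothetical helpers.

```python
import io
import zipfile
from typing import Callable, Iterator, List


def read_text(data: bytes) -> List[str]:
    """Stand-in for an existing text-format reader (hypothetical)."""
    return data.decode("utf-8").splitlines()


def iter_payloads(file_name: str, data: bytes) -> Iterator[bytes]:
    """Yield raw payloads: each archive entry for .zip, else the file itself."""
    if file_name.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():  # skip directory entries
                    yield zf.read(info)
    else:
        yield data


def read_rows(
    file_name: str,
    data: bytes,
    reader: Callable[[bytes], List[str]] = read_text,
) -> List[str]:
    """Unwrap the archive if needed, then reuse the unchanged format reader."""
    rows: List[str] = []
    for payload in iter_payloads(file_name, data):
        rows.extend(reader(payload))
    return rows
```

With this shape, the format reader never needs to know whether its input came from an archive; only the unwrapping step looks at the file suffix.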

corgy-w commented 2 weeks ago

> There are two ways to implement this, and I would like to hear everyone's views:
>
>   1. Do not add new file types for zip and similar archives. For example, text.zip is essentially a text file that needs to be imported, so it is enough to unpack the zip file and parse the result as text.
>   2. Add a new type of suffix and reuse the processing logic of the existing formats, which is highly extensible.
>
> Personally, I prefer the first option.

Let's go with option 1.