apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [FTP Connector] The reading and writing of FTP are very slow #7048

Open Xuzhengz opened 2 weeks ago

Xuzhengz commented 2 weeks ago

What happened

Reading from and writing to FTP is very slow. I tested connecting to the FTP server directly and it completed within a few seconds, so I have ruled out a slow connection as the cause. When reading, creating the task takes a while, and assigning the FTP read task to subtasks is also slow. When writing, the log keeps reporting that the classloader is being released, and only one record is written out, yet the task takes several minutes to complete.
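For anyone trying to reproduce this, one way to separate server latency from connector overhead is to time each FTP phase on its own, outside SeaTunnel. The following is a minimal sketch using Python's standard `ftplib`. The helper names (`timed`, `measure_ftp`) and the probe file name `probe.xml` are hypothetical, and the host, port, credentials, and directory are placeholders. It does not exercise the Hadoop FTP client that the connector uses, so it only shows whether the server itself is slow.

```python
import time
from ftplib import FTP
from io import BytesIO


def timed(label, fn, timings):
    """Run fn(), store its wall-clock duration in timings[label], return its result."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result


def measure_ftp(host, port, user, password, remote_dir, timeout=30):
    """Time connect, login, listing, and a tiny upload separately.

    All arguments are placeholders; port must be an int.
    """
    timings = {}
    ftp = FTP(timeout=timeout)
    try:
        timed("connect", lambda: ftp.connect(host, port), timings)
        timed("login", lambda: ftp.login(user, password), timings)
        timed("cwd", lambda: ftp.cwd(remote_dir), timings)
        # Discard the listing lines; only the duration matters here.
        timed("list", lambda: ftp.retrlines("LIST", lambda line: None), timings)
        payload = BytesIO(b"<RECORDS><RECORD/></RECORDS>\n")
        timed("upload", lambda: ftp.storbinary("STOR probe.xml", payload), timings)
        timed("delete", lambda: ftp.delete("probe.xml"), timings)
    finally:
        try:
            ftp.quit()
        except Exception:
            # quit() fails if the connection was never established or was dropped.
            ftp.close()
    return timings
```

Calling `measure_ftp("ftp.example.com", 21, "user", "secret", "/some/dir")` with real values returns a dict of seconds per phase. If every phase is fast here while the SeaTunnel job still takes minutes, the delay is more likely in the connector or engine than in the FTP server.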

SeaTunnel Version

dev-2.3.6

SeaTunnel Config

{
    "env": {
        "job.name": "XML file output",
        "job.mode": "batch"
    },
    "preHandler": [

    ],
    "source": [
        {
            "plugin_name": "Jdbc",
            "driver": "com.mysql.cj.jdbc.Driver",
            "connection_check_timeout_sec": 100,
            "table_list": [
                {
                    "table_path": "test_data.device",
                    "query": "SELECT\n `device_id`,\n `name`,\n `type`,\n `longitude`,\n `latitude`,\n `height`,\n `radius`,\n `distance`,\n `service_address`,\n `status`,\n `term_type`,\n `properties`,\n `runway_name`,\n `direction`,\n `runway_code`,\n `delay`\nFROM\n `device`"
                }
            ],
            "database": "test_data",
            "url": "jdbc:mysql://******:3306/test_data?remarks=true&useInformationSchema=true&useCursorFetch=true&defaultFetchSize=2048&rewriteBatchedStatements=true",
            "user": "******",
            "password": "******",
            "result_table_name": "ot_b7ba264ac3a84eb4b4d1b3bb93373a20"
        }
    ],
    "transform": [

    ],
    "sink": [
        {
            "file_format_type": "xml",
            "custom_filename": true,
            "file_name_expression": "xml_test",
            "is_enable_transaction": false,
            "xml_root_tag": "RECORDS",
            "xml_row_tag": "RECORD",
            "xml_use_attr_format": false,
            "batch_size": 1000000000,
            "plugin_name": "FtpFile",
            "host": "******",
            "port": "******",
            "user": "******",
            "password": "******",
            "tmp_path": "/ottomi/tmp/ottomi",
            "path": "/ottomi/file-node/download/1793861143369256962/xml/",
            "result_table_name": "ot_16aad011b9314e15977921dac312ca5f",
            "source_table_name": [
                "ot_b7ba264ac3a84eb4b4d1b3bb93373a20"
            ]
        }
    ]
}

Running Command

bin/seatunnel.sh -c ftp.json

Error Exception

The job handles only a small amount of data, but it takes several minutes to complete, or it hangs for a long time without any response until the client disconnects:

java.lang.RuntimeException: org.apache.hadoop.fs.ftp.FTPException: Client not connected
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:262)
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:68)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
at

Zeta or Flink or Spark Version

No response

Java or Scala Version

1.8

Xuzhengz commented 2 weeks ago

By comparison, other file read/write plugins such as S3 and local are fast; only FTP is particularly slow.