DTStack / chunjun

A data integration framework
https://dtstack.github.io/chunjun/
Apache License 2.0

flinkx 1.12 MySQL-to-Hive sync cannot resume from checkpoint (yarn-perjob mode) #1139

Open biandou1313 opened 2 years ago

biandou1313 commented 2 years ago

Search before asking

What happened

{
    "job": {
        "setting": {
            "errorLimit": {},
            "speed": {
                "channel": 1,
                "bytes": 0
            },
            "log": {
                "isLogger": false
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "INT",
                                "precision": 10,
                                "columnDisplaySize": 10
                            },
                            {
                                "name": "address",
                                "type": "VARCHAR"
                            }
                        ],
                        "username": "root",
                        "password": "123456",
                        "increColumn": "id",
                        "connection": [
                            {
                                "table": [
                                    "mysqlreader2"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://172.18.8.77:3306/zk_test"
                                ]
                            }
                        ],
                        "dataSourceId": 21
                    }
                },
                "writer": {
                    "name": "hivewriter",
                    "parameter": {
                        "jdbcUrl": "jdbc:hive2://172.18.8.208:10000/Vasyslink_yag001",
                        "fileType": "text",
                        "fieldDelimiter": "\t",
                        "writeMode": "append",
                        "charsetName": "UTF-8",
                        "maxFileSize": 1073741824,
                        "tablesColumn": "{\"dept22\":[{\"key\":\"deptno\",\"type\":\"int\",\"precision\":10,\"columnDisplaySize\":11},{\"key\":\"address\",\"type\":\"string\"}]}",
                        "defaultFS": "hdfs://172.18.8.207:8020",
                        "dataSourceId": 15,
                        "partition": "pt",
                        "partitionType": "USERDEFINED",
                        "partitionValue": "2022041000"
                    }
                }
            }
        ]
    }
}

What you expected to happen

The next run should start from the position recorded by the previous run.

How to reproduce

Run the job JSON shown above under "What happened".

{ "job": { "setting": { "errorLimit": {}, "speed": { "channel": 1, "bytes": 0 }, "log": { "isLogger": false } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "column": [ { "name": "id", "type": "INT", "precision": 10, "columnDisplaySize": 10 }, { "name": "address", "type": "VARCHAR"

                        }
                    ],
                    "username": "root",
                    "password": "123456",
                    "increColumn":"id",

                    "connection": [
                        {
                            "table": [
                                "mysqlreader2"
                            ],
                            "jdbcUrl": [
                                "jdbc:mysql://172.18.8.77:3306/zk_test"
                            ]
                        }
                    ],
                    "dataSourceId": 21
                }
            },
            "writer": {
                "name": "hivewriter",
                "parameter": {
                    "jdbcUrl": "jdbc:hive2://172.18.8.208:10000/Vasyslink_yag001",
                    "fileType": "text",
                    "fieldDelimiter": "\t",
                    "writeMode": "append",
                    "charsetName": "UTF-8",
                    "maxFileSize": 1073741824,
                    "tablesColumn": "{\"dept22\":[{\"key\":\"deptno\",\"type\":\"int\",\"precision\":10,\"columnDisplaySize\":11},{\"key\":\"address\",\"type\":\"string\"}]}",
                    "defaultFS": "hdfs://172.18.8.207:8020",
                    "dataSourceId": 15,
                    "partition": "pt",
                    "partitionType": "USERDEFINED",
                    "partitionValue": "2022041000"
                }
            }
        }
    ]
}

}

Anything else

No response

Version

1.12_release

Are you willing to submit PR?

Code of Conduct

Paddy0523 commented 2 years ago

Checkpoint resume (断点续传) means recovering a failed job from a specified checkpoint. For the relevant background, see the docs: https://dtstack.github.io/chunjun/documents/f29c0d86-f41a-5de1-a705-6dc2b6df91fb

From your description I can't tell whether you want checkpoint resume or incremental sync. Incremental sync docs: https://dtstack.github.io/chunjun/documents/d1b20bf7-fab2-5a56-8f4d-6fb13ce9fec0
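For reference, in the 1.12 release checkpoint resume is configured through a restore block under the job's setting section, with Flink checkpointing enabled on the cluster side. A minimal sketch, assuming the setting.restore keys (isRestore, restoreColumnName, restoreColumnIndex) described in the checkpoint-resume document linked above; verify the exact key names against the docs for your version:

    "setting": {
        "restore": {
            "isRestore": true,
            "restoreColumnName": "id",
            "restoreColumnIndex": 0
        }
    }

Here isRestore switches the feature on, and restoreColumnName / restoreColumnIndex identify the column whose last written value is kept in checkpoint state, so a restarted job can continue from that position.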

biandou1313 commented 2 years ago

My mistake, I meant incremental sync.


Paddy0523 commented 2 years ago

For the specifics of incremental sync, see the documentation. If you are not relying on taier, you need to fill in startLocation manually.
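A minimal sketch of what that looks like for the reader in this issue: keep increColumn and add startLocation holding the last synced value (the 1000 below is a hypothetical placeholder):

    "reader": {
        "name": "mysqlreader",
        "parameter": {
            "increColumn": "id",
            "startLocation": "1000",
            "connection": [
                {
                    "table": ["mysqlreader2"],
                    "jdbcUrl": ["jdbc:mysql://172.18.8.77:3306/zk_test"]
                }
            ]
        }
    }

Each run then reads only rows whose id is beyond startLocation; whatever schedules the job has to carry the end position of one run into the startLocation of the next (taier automates this bookkeeping).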

biandou1313 commented 2 years ago

One more question: how can I improve the performance of MySQL-to-Hive sync? With the same server and database, a table of 90M+ rows takes Sqoop only 10 minutes, but flinkx 1.12 takes 3.5 hours (both MySQL to Hive).


Paddy0523 commented 2 years ago

I'd suggest running it on the latest version; the newer versions include a round of performance optimization.
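Beyond upgrading, one thing worth checking in the job above: speed.channel is 1, so the MySQL read runs single-threaded, while Sqoop typically fans out across many mappers. A hedged sketch of raising read parallelism, assuming mysqlreader supports the splitPk parameter for range-splitting the source table across channels (check the reader docs for your version):

    "setting": {
        "speed": {
            "channel": 8,
            "bytes": 0
        }
    }

and, inside the reader's parameter block:

    "splitPk": "id"

Without a split key the extra channels have no way to divide the table among themselves, so both settings are needed; id is a reasonable choice because it is numeric and presumably unique and evenly distributed.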

biandou1313 commented 2 years ago

The 1.15-beta version?


Paddy0523 commented 2 years ago

master is fine.