alibaba / DataX

DataX is the open-source version of Alibaba Cloud DataWorks Data Integration.

mysql2elasticsearch: no matter how I configure setting, the job is always split into just 1 task. Any ideas? #1568

Open lx18379612615 opened 2 years ago

lx18379612615 commented 2 years ago

I set channel and also configured splitPK (a bigint primary key), but there is still only 1 task.


{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "column":[
                                                "id",
                                                "label1",
                                                "label2"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://10.20.*.*:33061/db_table?useUnicode=true&characterEncoding=utf-8"
                                                        ],
                                                        "table":[
                                                                "mysql2es"
                                                        ]
                                                }
                                        ],
                                        "password":"***********",
                                        "splitPK":"id",
                                        "username":"root"
                                }
                        },
                        "writer":{
                                "name":"elasticsearchwriter",
                                "parameter":{
                                        "accessId":"elastic",
                                        "accessKey":"***********",
                                        "batchSize":1000,
                                        "cleanup":true,
                                        "column":[
                                                {
                                                        "name":"id",
                                                        "type":"long"
                                                },
                                                {
                                                        "name":"label1",
                                                        "type":"integer"
                                                },
                                                {
                                                        "name":"label2",
                                                        "type":"integer"
                                                }
                                        ],
                                        "discovery":false,
                                        "dynamic":false,
                                        "endpoint":"http://10.20.*.*:9400",
                                        "index":"mysql2es",
                                        "settings":{
                                                "index":{
                                                        "number_of_replicas":1,
                                                        "number_of_shards":3,
                                                        "refresh_interval":"10s"
                                                }
                                        },
                                        "type":"_doc"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":3
                }
        }
}

2022-10-26 10:43:01.565 [job-0] INFO  ElasticSearchWriter$Job - unified version: 1666752181565
2022-10-26 10:43:01.570 [job-0] INFO  ElasticSearchWriter$Job - [{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"id","origin":false,"type":"long"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label1","origin":false,"type":"integer"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label2","origin":false,"type":"integer"}]
2022-10-26 10:43:01.570 [job-0] INFO  ElasticSearchWriter$Job - index:[mysql2es], type:[_doc], mappings:[{"properties":{"id":{"type":"long"},"label1":{"type":"integer"},"label2":{"type":"integer"}}}]
2022-10-26 10:43:01.577 [job-0] INFO  ElasticSearchClient - begin GetMapping for index: mysql2es
2022-10-26 10:43:01.581 [job-0] INFO  ElasticSearchWriter$Job - the mappings for old index is: {"id":{"type":"long"},"label1":{"type":"integer"},"label2":{"type":"integer"}}
2022-10-26 10:43:01.582 [job-0] INFO  ElasticSearchClient - begin GetSettings for index: mysql2es
2022-10-26 10:43:01.586 [job-0] INFO  ElasticSearchWriter$Job - merge1 settings:{"mysql2es":{"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"refresh_interval":"10s","number_of_shards":"3","provided_name":"mysql2es","creation_date":"1666751769822","number_of_replicas":"1","uuid":"LjMiDqIlR0-vUKlnCNNZdA","version":{"created":"7120099"}}}}}, settingsCache:null, includeSettings:{"number_of_replicas":"1","number_of_shards":"3"}
2022-10-26 10:43:01.587 [job-0] INFO  ElasticSearchClient - delete index mysql2es
2022-10-26 10:43:01.891 [job-0] INFO  ElasticSearchClient - delete index mysql2es success
2022-10-26 10:43:01.891 [job-0] INFO  ElasticSearchWriter$Job - merge2 settings:{"index":{"number_of_replicas":1,"number_of_shards":3,"refresh_interval":"10s"}}, settingsCache:{"index":{"number_of_replicas":1,"number_of_shards":3,"refresh_interval":"10s"},"number_of_replicas":"1","number_of_shards":"3"}
2022-10-26 10:43:01.894 [job-0] WARN  ElasticSearchClient - null
2022-10-26 10:43:01.894 [job-0] WARN  ElasticSearchClient - IndicesExists got ResponseCode: 404 ErrorMessage: 404 Not Found
2022-10-26 10:43:01.894 [job-0] INFO  ElasticSearchClient - create index mysql2es
2022-10-26 10:43:02.060 [job-0] INFO  ElasticSearchClient - create mysql2es index success
2022-10-26 10:43:02.061 [job-0] INFO  ElasticSearchClient - create mappings for mysql2es  {"properties":{"id":{"type":"long"},"label1":{"type":"integer"},"label2":{"type":"integer"}}}
2022-10-26 10:43:02.110 [job-0] INFO  ElasticSearchClient - index mysql2es put mappings success
2022-10-26 10:43:02.111 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2022-10-26 10:43:02.111 [job-0] INFO  JobContainer - Job set Channel-Number to 3 channels.
2022-10-26 10:43:02.115 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2022-10-26 10:43:02.116 [job-0] INFO  JobContainer - DataX Writer.Job [elasticsearchwriter] splits to [1] tasks.
2022-10-26 10:43:02.125 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2022-10-26 10:43:02.128 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-10-26 10:43:02.130 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-10-26 10:43:02.135 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-10-26 10:43:02.138 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-10-26 10:43:02.138 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-10-26 10:43:02.150 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2022-10-26 10:43:02.152 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select id,label1,label2 from mysql2es 
] jdbcUrl:[jdbc:mysql://10.20.35.85:33061/sx_dmp_jres?useUnicode=true&characterEncoding=utf-8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2022-10-26 10:43:02.155 [0-0-0-writer] INFO  ElasticSearchWriter$Job - columnList: [{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"id","origin":false,"type":"long"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label1","origin":false,"type":"integer"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label2","origin":false,"type":"integer"}]
2022-10-26 10:43:02.155 [0-0-0-writer] INFO  ElasticSearchWriter$Job - Task will use elasticsearch auto generated _id property
2022-10-26 10:43:02.156 [0-0-0-writer] INFO  AbstractJestClient - Setting server pool to a list of 1 servers: [http://10.20.32.117:9400]
2022-10-26 10:43:02.157 [0-0-0-writer] INFO  JestClientFactory - Using multi thread/connection supporting pooling connection manager
2022-10-26 10:43:02.158 [0-0-0-writer] INFO  JestClientFactory - Using default GSON instance
2022-10-26 10:43:02.158 [0-0-0-writer] INFO  JestClientFactory - Node Discovery disabled...
2022-10-26 10:43:02.158 [0-0-0-writer] INFO  JestClientFactory - Idle connection reaping disabled...
Y-evil commented 1 year ago

"splitPK":"id", you spelled the key wrong: the K should be lowercase, mate.
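Why a one-letter typo matters: the job keys are matched by exact, case-sensitive name, so a reader parameter spelled splitPK is simply not the splitPk key, and the single table is read by one task, which matches the "splits to [1] tasks" line in the log. For long job files, a small check like the Python sketch below can catch this class of typo. It is a hypothetical helper, not part of DataX; the EXPECTED_READER_KEYS set only lists the keys used in this thread and is an assumption, not DataX's full list of supported keys.

```python
import json

# Hypothetical: exact key spellings expected in a mysqlreader "parameter" block.
# Only keys from this thread are listed; this is NOT DataX's full key list.
EXPECTED_READER_KEYS = {"column", "connection", "password", "splitPk", "username"}


def fix_key_case(params, expected=EXPECTED_READER_KEYS):
    """Rename keys that match an expected key only case-insensitively.

    Returns a list of (wrong, right) pairs that were renamed.
    """
    by_lower = {k.lower(): k for k in expected}
    renamed = []
    for key in list(params):  # iterate over a copy because params is mutated
        right = by_lower.get(key.lower())
        if right is not None and right != key:
            params[right] = params.pop(key)
            renamed.append((key, right))
    return renamed


def fix_job(job):
    """Apply fix_key_case to every reader parameter block in a job dict."""
    # Accept both {"job": {...}} and the bare {"content": ...} form posted above.
    body = job.get("job", job)
    renamed = []
    for item in body.get("content", []):
        params = item.get("reader", {}).get("parameter", {})
        renamed.extend(fix_key_case(params))
    return renamed


if __name__ == "__main__":
    job = json.loads('{"content": [{"reader": {"name": "mysqlreader", '
                     '"parameter": {"splitPK": "id", "username": "root"}}}]}')
    print(fix_job(job))  # [('splitPK', 'splitPk')]
    print(json.dumps(job["content"][0]["reader"]["parameter"]))
```

Renaming in place (rather than only warning) keeps the value attached to the corrected key; keys that do not resemble any expected key are left untouched.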

hc-leo commented 1 year ago

"splitPk"
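With the key spelled splitPk, the reader can cut the table into ranges of id and hand each range to its own task, so the 3 channels have more than one task to run. The Python sketch below is a simplified, hypothetical illustration of even range splitting over an integer key; it is not DataX's implementation, the function names are made up, and the number of pieces is just a parameter here (DataX chooses its own task count).

```python
def split_pk_range(min_pk, max_pk, pieces):
    """Split the inclusive integer range [min_pk, max_pk] into at most
    `pieces` contiguous, non-overlapping (lo, hi) ranges, hi inclusive.

    Hypothetical, simplified illustration of primary-key range splitting.
    """
    if pieces < 1:
        raise ValueError("pieces must be >= 1")
    if max_pk < min_pk:
        return []  # empty table: nothing to split
    total = max_pk - min_pk + 1
    pieces = min(pieces, total)  # never produce empty ranges
    step, extra = divmod(total, pieces)
    ranges, lo = [], min_pk
    for i in range(pieces):
        size = step + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((lo, lo + size - 1))
        lo += size
    return ranges


def to_where_clauses(column, ranges):
    """Turn each (lo, hi) range into a WHERE fragment, one per task."""
    return [f"({lo} <= {column} AND {column} <= {hi})" for lo, hi in ranges]


if __name__ == "__main__":
    ranges = split_pk_range(1, 10, 3)
    print(ranges)  # [(1, 4), (5, 7), (8, 10)]
    for clause in to_where_clauses("id", ranges):
        print("select id,label1,label2 from mysql2es where " + clause)
```

Each generated query reads a disjoint slice of id, which is why the split only helps when the key is numeric and reasonably evenly distributed, as a bigint primary key usually is.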