Open sunzhangbin opened 3 years ago
+1, I've hit this problem too. I have two clusters, one offline and one online, and the data keeps ending up inconsistent; re-running the routine load fixes it.
For now, running two routine loads syncing the same data in parallel keeps us from losing rows, but that doesn't address the root cause.
@sunzhangbin Running two could introduce ordering problems, couldn't it?
Could this be related to the enable.auto.commit setting in librdkafka? As far as I can see, it defaults to true.
With 4 routine load jobs writing to the same table concurrently, data was occasionally lost; after switching to 1 routine load job per table, we haven't seen data loss again. We couldn't pin down the cause, and the logs show nothing abnormal.
If you are using the unique model, the data may have been replaced because ordering is not guaranteed. When routine load runs concurrently, the execution order of the different tasks is undefined, so data at an earlier Kafka offset is not necessarily imported first. And because it is a unique model, data that is imported later overwrites data imported earlier, which can create the illusion of data loss.
With a single routine load, is per-partition ordering guaranteed?
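If out-of-order replacement on a unique-model table is the concern, one possible mitigation (a sketch, assuming your Doris version supports the sequence column feature) is to make REPLACE resolution depend on a data column rather than on import order:

```sql
-- Hypothetical example: the table and column names here are illustrative,
-- not taken from this issue. With a sequence column, the row with the
-- larger updated_time wins on a key conflict, regardless of which
-- concurrent load task happened to commit first.
CREATE TABLE example_tbl (
    `k1` BIGINT,
    `v1` VARCHAR(64),
    `updated_time` DATETIME
) ENGINE=OLAP
UNIQUE KEY(`k1`)
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
    "replication_num" = "3",
    "function_column.sequence_col" = "updated_time"
);
```

This only helps with the replace-order symptom described above; it would not explain rows that are missing from the table entirely.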
I'm using the unique model, but I've confirmed the rows weren't replaced: the lost rows are missing from the table entirely.
Any follow-up on this? I'm hitting it too.
What's your scenario? Is there any pattern to it?
@EmmyMiao87 Doris 2.0.2 also hits this problem. Can we find the root cause and fix it?
The routine load job is as follows. The problem is that a small amount of data is occasionally lost, but re-submitting this routine load brings the data back.

```sql
CREATE ROUTINE LOAD xes1v1_db.ods_xes_platform_order_order_detail_bushu_4
ON ods_xes_platform_order_order_detail
COLUMNS(
    order_id,
    product_id,
    promotion_id,
    id,
    app_id,
    user_id,
    product_type,
    product_name,
    product_num,
    product_price,
    coupon_price,
    promotion_price,
    promotion_type,
    parent_product_id,
    parent_product_type,
    source_id,
    extras,
    created_time,
    updated_time,
    version,
    prepaid_card_price,
    `table`
),
WHERE `table` regexp 'orderdetail[0-9]' and app_id=8 and created_time>='2020-11-17 00:00:00'
PROPERTIES (
    "format" = "json",
    "jsonpaths" = "[ \"$.data.order_id\", \"$.data.product_id\", \"$.data.promotion_id\", \"$.data.id\", \"$.data.app_id\", \"$.data.user_id\", \"$.data.product_type\", \"$.data.product_name\", \"$.data.product_num\", \"$.data.product_price\", \"$.data.coupon_price\", \"$.data.promotion_price\", \"$.data.promotion_type\", \"$.data.parent_product_id\", \"$.data.parent_product_type\", \"$.data.source_id\", \"$.data.extras\", \"$.data.created_time\", \"$.data.updated_time\", \"$.data.version\", \"$.data.prepaid_card_price\", \"$.table\"]"
)
FROM KAFKA (
    "kafka_broker_list" = "10.20.34.60:9092,10.20.34.62:9092,10.20.34.64:9092",
    "kafka_topic" = "xes_plarform_order_4",
    "property.group.id" = "ods_xes_platform_order_order_detail_bushu",
    "property.client.id" = "ods_xes_platform_order_order_detail_bushu",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

CREATE TABLE ods_xes_platform_order_order_detail (
    `order_id` varchar(64) NULL DEFAULT "0" COMMENT "order ID",
    `product_id` int(11) NULL DEFAULT "0" COMMENT "product ID",
    `promotion_id` varchar(64) NULL DEFAULT "0" COMMENT "gift-with-purchase / renewal gift pack rule ID",
    `id` varchar(64) NULL COMMENT "id",
    `app_id` varchar(64) NULL DEFAULT "0" COMMENT "business line ID",
    `user_id` int(11) NULL DEFAULT "0" COMMENT "user ID",
    `product_type` int(11) NULL DEFAULT "0" COMMENT "product category",
    `product_name` varchar(255) NULL DEFAULT "" COMMENT "product name",
    `product_num` int(11) NULL DEFAULT "0" COMMENT "product quantity",
    `product_price` int(11) NULL DEFAULT "0" COMMENT "product sale amount",
    `coupon_price` int(11) NULL DEFAULT "0" COMMENT "apportioned coupon amount",
    `promotion_price` int(11) NULL DEFAULT "0" COMMENT "apportioned promotion amount",
    `promotion_type` int(11) NULL DEFAULT "0" COMMENT "promotion type",
    `parent_product_id` int(11) NULL DEFAULT "0" COMMENT "parent product ID",
    `parent_product_type` int(11) NULL DEFAULT "0" COMMENT "parent product category, business lines may define their own",
    `source_id` varchar(30) NULL DEFAULT "" COMMENT "hot data",
    `extras` varchar(3072) NULL DEFAULT "" COMMENT "auxiliary info stored with the order item, e.g. immutable promotion details; not queried or used for lookups",
    `created_time` varchar(64) NULL DEFAULT "0000-00-00 00:00:00" COMMENT "creation time",
    `updated_time` varchar(64) NULL DEFAULT "1970-00-00 00:00:00" COMMENT "update time",
    `version` varchar(64) NULL DEFAULT "" COMMENT "version control",
    `prepaid_card_price` int(11) NULL DEFAULT "0" COMMENT "gift card amount",
    `table` varchar(64) NULL DEFAULT "" COMMENT "source table"
) ENGINE=OLAP
UNIQUE KEY(`order_id`, `product_id`, `promotion_id`)
COMMENT "online school order item table"
DISTRIBUTED BY HASH(`order_id`) BUCKETS 10
PROPERTIES (
    "replication_num" = "3",
    "in_memory" = "false",
    "storage_format" = "V2"
);
```