apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

The broker load is cancelled with error 'tablet writer writer failed' #1407

Closed EmmyMiao87 closed 4 years ago

EmmyMiao87 commented 5 years ago

Describe the bug: The broker load is cancelled with the reason 'tablet writer writer failed'. The OLAP engine finds that the (txn_id, load_id, tablet_id) entry already exists. This causes tablet_writer_mgr to fail when it tries to write the row_batch through the delta writer, and the job is then cancelled.

To Reproduce: the steps below reproduce the behavior.

  1. Create the table

    CREATE TABLE `store_sales`
    (
    `ss_item_sk`            INT(11) NULL comment "",
    `ss_ticket_number`      INT(11) NULL comment "",
    `ss_sold_date_sk`       INT(11) REPLACE NULL comment "",
    `ss_sold_time_sk`       INT(11) REPLACE NULL comment "",
    `ss_customer_sk`        INT(11) REPLACE NULL comment "",
    `ss_cdemo_sk`           INT(11) REPLACE NULL comment "",
    `ss_hdemo_sk`           INT(11) REPLACE NULL comment "",
    `ss_addr_sk`            INT(11) REPLACE NULL comment "",
    `ss_store_sk`           INT(11) REPLACE NULL comment "",
    `ss_promo_sk`           INT(11) REPLACE NULL comment "",
    `ss_quantity`           INT(11) sum NULL comment "",
    `ss_wholesale_cost`     DECIMAL(7, 2) sum NULL comment "",
    `ss_list_price`         DECIMAL(7, 2) sum NULL comment "",
    `ss_sales_price`        DECIMAL(7, 2) sum NULL comment "",
    `ss_ext_discount_amt`   DECIMAL(7, 2) sum NULL comment "",
    `ss_ext_sales_price`    DECIMAL(7, 2) sum NULL comment "",
    `ss_ext_wholesale_cost` DECIMAL(7, 2) sum NULL comment "",
    `ss_ext_list_price`     DECIMAL(7, 2) sum NULL comment "",
    `ss_ext_tax`            DECIMAL(7, 2) sum NULL comment "",
    `ss_coupon_amt`         DECIMAL(7, 2) sum NULL comment "",
    `ss_net_paid`           DECIMAL(7, 2) sum NULL comment "",
    `ss_net_paid_inc_tax`   DECIMAL(7, 2) sum NULL comment "",
    `ss_net_profit`         DECIMAL(7, 2) sum NULL comment ""
    )
    AGGREGATE KEY(`ss_item_sk`, `ss_ticket_number`)
    DISTRIBUTED BY hash(`ss_item_sk`, `ss_ticket_number`)
    BUCKETS 100
    properties
    (
    "storage_type" = "COLUMN"
    );

  2. Submit the load job with broker load (streaming)

    LOAD LABEL tpcds.store_sales(
    DATA INFILE("bos://xxx")
    INTO TABLE `store_sales` COLUMNS TERMINATED BY "|")
    WITH BROKER broker
    ("bos_endpoint" = "xxx",  "bos_accesskey" = "xxx", "bos_secret_accesskey"="xxx")
    properties ("max_filter_ratio" = "0.5", "strict_mode" = "false");

Expected behavior: The duplicated request should be skipped rather than treated as a failure. The job should not be cancelled; it should continue loading.
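To make the expected behavior concrete, here is a minimal Python sketch of an idempotent transaction registration. It is not Doris code and not the fix that was eventually merged; `TxnRegistry`, `AddResult`, and `open_writer` are hypothetical names, and the assumption that a repeated request carries the same load_id as the first one is mine. The idea is only that a repeated (txn_id, tablet_id) with the same load_id is reported as a duplicate and skipped, while a different load_id on the same key is still an error.

```python
from enum import Enum


class AddResult(Enum):
    ADDED = "added"          # first time this (txn_id, tablet_id) is seen
    DUPLICATE = "duplicate"  # same key and same load_id: safe to skip
    CONFLICT = "conflict"    # same key but another load_id: a real error


class TxnRegistry:
    """Hypothetical in-memory stand-in for the engine's transaction map."""

    def __init__(self):
        # Maps (txn_id, tablet_id) -> load_id of the load that registered it.
        self._entries = {}

    def add_txn(self, txn_id, tablet_id, load_id):
        key = (txn_id, tablet_id)
        existing = self._entries.get(key)
        if existing is None:
            self._entries[key] = load_id
            return AddResult.ADDED
        if existing == load_id:
            # Repeated request from the same load: report it, do not fail.
            return AddResult.DUPLICATE
        return AddResult.CONFLICT


def open_writer(registry, txn_id, tablet_id, load_id):
    """Return True if the load may keep writing to this tablet."""
    result = registry.add_txn(txn_id, tablet_id, load_id)
    # Only a genuine conflict should cancel the job.
    return result is not AddResult.CONFLICT
```

With this shape, the second request for transaction 2005 on tablet 12426 would return `DUPLICATE` and the load would continue instead of being cancelled.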

Screenshots: The log in BE:

    W0627 23:32:39.164078 43315 olap_engine.cpp:954] find transaction exists when add to engine.partition_id: 12037, transaction_id: 2005, table: .12426.1782960977
    W0627 23:32:39.164129 43315 tablet_writer_mgr.cpp:154] tablet writer writer failed, tablet_id=12426, transaction_id=2005
    W0627 23:32:39.164191 43315 internal_service.cpp:107] tablet writer add batch failed, message=tablet writer write failed, id=9687a924433d42d0-8c57969ba6d4fe16, index_id=10416, sender_id=0

    W0627 23:32:39.225698 43138 olap_table_sink.cpp:310] NodeChannel add row failed, load_id=9687a924433d42d0-8c57969ba6d4fe16, tablet_id=12366, node=10.227.96.17:8061, errmsg=tablet writer write failed
    W0627 23:32:39.225729 43138 olap_table_sink.cpp:310] NodeChannel add row failed, load_id=9687a924433d42d0-8c57969ba6d4fe16, tablet_id=12366, node=10.227.96.18:8061, errmsg=tablet writer write failed
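When correlating warnings like these across BE nodes, it can help to pull the `key=value` pairs out of each line. The following is a rough sketch that assumes the usual glog prefix layout (severity letter, MMDD, time, thread id, `file:line]`); `parse_be_log_line` is a hypothetical helper, not part of Doris. Values that contain spaces (for example `errmsg=tablet writer write failed`) are cut at the first space, and the `key: value` style used by olap_engine.cpp is not handled.

```python
import re

# Assumed glog prefix: <level><MMDD> <HH:MM:SS.uuuuuu> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)
# key=value pairs; the value stops at the first comma or whitespace.
KEY_VALUE = re.compile(r"(\w+)=([^,\s]+)")


def parse_be_log_line(line):
    """Split one log line into its prefix parts and key=value fields."""
    match = GLOG_LINE.match(line.strip())
    if match is None:
        return None  # not a glog-style line
    record = match.groupdict()
    record["fields"] = dict(KEY_VALUE.findall(record["message"]))
    return record


if __name__ == "__main__":
    sample = (
        "W0627 23:32:39.164129 43315 tablet_writer_mgr.cpp:154] "
        "tablet writer writer failed, tablet_id=12426, transaction_id=2005"
    )
    print(parse_be_log_line(sample)["fields"])
```

Grouping the parsed records by `transaction_id` or `load_id` shows which tablets and nodes were involved in the same failed load.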

EmmyMiao87 commented 5 years ago

Also, I turned off Snappy compression.

EmmyMiao87 commented 4 years ago

@chaoyli has resolved this problem.

452926826 commented 2 years ago

@EmmyMiao87 Excuse me, how did you solve this problem?