apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Bug] With Stream Load partial column import, a column whose default is 'current_timestamp' is filled with the table creation time. #30343

Open YS0mind opened 8 months ago

YS0mind commented 8 months ago
I also ran into this problem when loading data into Doris.
  1. Create a test table in Doris like this:

    create table data_province
    (
    `run_date`         date           not null comment 'date',
    data_type_id       int            not null comment 'data type',
    data               decimal(24, 8) not null comment 'data value',
    create_time        datetime       not null default current_timestamp comment 'creation time'
    )
    engine = olap unique key(`run_date`,`data_type_id`)
    comment "daily data table"
    partition by range(`run_date`) ( )
    distributed by hash(`data_type_id`) buckets 10
    properties (
        "storage_format" = "V2",
        "enable_unique_key_merge_on_write" = "true",
        "dynamic_partition.enable" = "true",
        "dynamic_partition.time_unit" = "month",
        "dynamic_partition.create_history_partition" = "true",
        "dynamic_partition.history_partition_num" = "10",
        "dynamic_partition.start" = "-6",
        "dynamic_partition.end" = "3",
        "dynamic_partition.prefix" = "p",
        "dynamic_partition.replication_num" = "1",
        "dynamic_partition.buckets" = "10"
    );
  2. Insert data with Stream Load as shown below. The field "create_time" ends up holding the time the table was created. However, when I insert the same data with INSERT INTO, everything is fine: the field "create_time" is the time the record was inserted.

    # With Stream Load the default time is fixed at the table creation time; with INSERT INTO it is the actual time the record was inserted
    # vim /tmp/test.csv
    # 2024-01-01,1,67200.00000000
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data" \
    -H "two_phase_commit:false" \
    -H "label:stream_load_test01" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
  3. If I set the value explicitly with create_time=current_timestamp() in the "columns" header, the field "create_time" is the time the record was inserted. But when I load a new record with the same key, this field changes too. I just want to keep the time the record was first created.

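    # The partial column import now generates the correct default time, but every load of a
    # record with the same key also overwrites create_time with the latest time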
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data,create_time=current_timestamp()" \
    -H "two_phase_commit:false" \
    -H "label:stream_load_test02" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
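
To make the behavior requested in step 3 concrete, here is a minimal Python sketch of the semantics I would expect from a partial column update on a unique-key table. It is only a toy model, not Doris code: the function `partial_upsert`, the in-memory `table` dict, and the explicit `now` argument are all hypothetical names introduced for illustration.

```python
from datetime import datetime
from typing import Dict, Tuple

# Key mirrors the unique key (run_date, data_type_id) of the test table.
Key = Tuple[str, int]
Row = Dict[str, object]


def partial_upsert(table: Dict[Key, Row], key: Key, data: str, now: datetime) -> Row:
    """Toy model of the desired partial-update semantics (hypothetical helper).

    New key      -> create_time is filled with the load time `now`.
    Existing key -> only `data` is updated; create_time is left untouched.
    """
    existing = table.get(key)
    if existing is None:
        table[key] = {"data": data, "create_time": now}
    else:
        existing["data"] = data
    return table[key]


table: Dict[Key, Row] = {}
key: Key = ("2024-01-01", 1)

# First load creates the row and stamps it with the load time.
partial_upsert(table, key, "67200.00000000", datetime(2024, 1, 1, 8, 0, 0))
# Second load with the same key changes only the data column.
partial_upsert(table, key, "70000.00000000", datetime(2024, 1, 2, 8, 0, 0))
```

After both calls the row should hold the new `data` value while `create_time` still reflects the first load, which is the outcome neither of the two curl variants above gives me.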

Originally posted by @YS0mind in https://github.com/apache/doris-flink-connector/issues/191#issuecomment-1905753329

zhbdesign commented 8 months ago

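Not sure whether the default time for Stream Load imports has been fixed yet. The default time is generated when the plan is generated, so long-lived connections will run into problems.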
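
A rough Python sketch of the mechanism described in this comment, assuming the default value is computed once when the plan is built and then reused. This is a toy model, not the actual Doris implementation: the classes `CachedPlanLoader` and `PerLoadLoader` and the fake clock `make_clock` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

Row = Dict[str, object]
Clock = Callable[[], datetime]


def make_clock(start: datetime, step: timedelta) -> Clock:
    """Fake clock: returns `start` on the first call and advances by `step` per call."""
    state = {"now": start}

    def clock() -> datetime:
        current = state["now"]
        state["now"] = current + step
        return current

    return clock


class CachedPlanLoader:
    """Evaluates the column default a single time, when the plan is built."""

    def __init__(self, clock: Clock) -> None:
        self._default = clock()  # frozen at plan time

    def load(self, rows: List[Row]) -> List[Row]:
        return [{**row, "create_time": self._default} for row in rows]


class PerLoadLoader:
    """Evaluates the column default again on every load."""

    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def load(self, rows: List[Row]) -> List[Row]:
        now = self._clock()
        return [{**row, "create_time": now} for row in rows]


start = datetime(2024, 1, 1, 0, 0, 0)
hour = timedelta(hours=1)

cached = CachedPlanLoader(make_clock(start, hour))
cached_first = cached.load([{"data_type_id": 1}])
cached_second = cached.load([{"data_type_id": 2}])

fresh = PerLoadLoader(make_clock(start, hour))
fresh_first = fresh.load([{"data_type_id": 1}])
fresh_second = fresh.load([{"data_type_id": 2}])
```

In this model both loads through `CachedPlanLoader` carry the same stale timestamp, while `PerLoadLoader` stamps each load with its own time. The longer a cached plan or connection lives, the further the stored default drifts from the real load time.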

15767714253 commented 2 months ago

Still not fixed?

marderary commented 1 month ago

tracking this bug...