apache / doris-flink-connector

Flink Connector for Apache Doris
https://doris.apache.org/
Apache License 2.0

[Bug] Time difference problem with Doris default-value columns #191

Open zhbdesign opened 1 year ago

zhbdesign commented 1 year ago

Version

flink-1.15.2,doris-2.0.1-rc03

What's Wrong?

Data is consumed through Flink and inserted into Doris. One Doris column is declared `default current_timestamp(3)`. Querying the data shows that the Doris default-value timestamp is earlier than the business timestamp, which defies normal logic.
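
For reference, a minimal sketch of the kind of schema involved (the reporter's actual DDL was not posted, so the table and column names here are hypothetical):

    -- Minimal sketch; "events", "event_time" etc. are hypothetical names.
    -- The business time is written by the Flink job, while create_time
    -- is filled in by Doris via the column default.
    create table events
    (
        event_id    bigint      not null,
        event_time  datetime(3) not null comment 'business time, set by the Flink job',
        create_time datetime(3) not null default current_timestamp(3) comment 'filled in by Doris'
    )
    unique key(event_id)
    distributed by hash(event_id) buckets 10
    properties ("replication_num" = "1");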

What You Expected?

The Doris default-value timestamp should be later than (or equal to) the business timestamp.

How to Reproduce?

No response

Anything Else?

No response

victoyzz commented 11 months ago

Have you also run into the problem where the default value does not take effect at all?

zhbdesign commented 11 months ago

> Have you also run into the problem where the default value does not take effect at all?

Yes, 2.0 did have problems with default timestamps. I forget the details; we are still waiting for a fix.

YS0mind commented 8 months ago

I also ran into this problem when loading data into Doris.

  1. Create a test table in Doris like this:

    create table data_province
    (
        `run_date`   date           not null comment 'date',
        data_type_id int            not null comment 'data type',
        data         decimal(24, 8) not null comment 'data value',
        create_time  datetime       not null default current_timestamp comment 'creation time'
    )
    engine = olap
    unique key(`run_date`, `data_type_id`)
    comment "daily data table"
    partition by range(`run_date`) ( )
    distributed by hash(`data_type_id`) buckets 10
    properties (
        "storage_format" = "V2",
        "enable_unique_key_merge_on_write" = "true",
        "dynamic_partition.enable" = "true",
        "dynamic_partition.time_unit" = "month",
        "dynamic_partition.create_history_partition" = "true",
        "dynamic_partition.history_partition_num" = "10",
        "dynamic_partition.start" = "-6",
        "dynamic_partition.end" = "3",
        "dynamic_partition.prefix" = "p",
        "dynamic_partition.replication_num" = "1",
        "dynamic_partition.buckets" = "10"
    );
  2. Load data via stream load as shown below, and you will find that the `create_time` field is the time the table was created. However, when I load data with the insert-into method, everything is fine: `create_time` is the time the record was inserted (see the insert-into sketch after this list).

    # With stream load, the default time is fixed at the table creation time;
    # with insert into, it is correctly the time each record is inserted.
    # vim /tmp/test.csv
    # 2024-01-01,1,67200.00000000
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data" \
    -H "two_phase_commit:false" \
    -H "label:stream_load_test01" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
  3. If I import with `create_time=current_timestamp()` added to the `columns` header, the `create_time` field is the time I inserted the record; but when I load a new record with the same key, this field is overwritten as well. I just want to keep the time the record was first created (see the workaround sketch after this list).

    # Partial-column import produces a correct default time, but every
    # import of a record with the same key also overwrites create_time
    # with the latest time.
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data,create_time=current_timestamp()" \
    -H "two_phase_commit:false" \
    -H "label:stream_load_test02" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
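
For comparison with step 2, the insert-into path that behaves correctly would look like this (same table, same sample row as test.csv):

    -- Loading the same sample row via insert into: here create_time is
    -- correctly set to the statement's execution time rather than the
    -- table creation time.
    insert into data_province (run_date, data_type_id, data)
    values ('2024-01-01', 1, 67200.00000000);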
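
A possible workaround for step 3, sketched under the assumption that merge-on-write partial updates leave unlisted columns untouched for keys that already exist; it only helps when you can tell first loads apart from refreshes, and the labels below are hypothetical:

    # First load of a key: set create_time explicitly.
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data,create_time=current_timestamp()" \
    -H "label:stream_load_first" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load

    # Refresh of existing keys: omit create_time so the value written by
    # the first load is preserved instead of being overwritten.
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data" \
    -H "label:stream_load_refresh" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load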

JNSimba commented 2 months ago

The Flink Doris connector holds a long-lived stream-load connection by default, which may cause this problem. You can use batch mode to avoid it.
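
A sketch of what enabling batch mode could look like in Flink SQL, assuming a connector version that supports the `sink.enable.batch-mode` option (connection values are placeholders):

    -- Flink SQL sink sketch; fenodes and credentials are placeholders.
    -- In batch mode each buffered batch is flushed as its own stream
    -- load instead of one long-lived transaction, so the server-side
    -- default current_timestamp is evaluated at load time.
    create table doris_sink (
        run_date     date,
        data_type_id int,
        data         decimal(24, 8)
    ) with (
        'connector' = 'doris',
        'fenodes' = '127.0.0.1:8030',
        'table.identifier' = 'iotest.data_province',
        'username' = 'root',
        'password' = '',
        'sink.enable.batch-mode' = 'true'
    );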