apache / doris-flink-connector

Flink Connector for Apache Doris
https://doris.apache.org/
Apache License 2.0

[Bug] Time discrepancy in Doris default-value columns #191

Open zhbdesign opened 1 year ago

zhbdesign commented 1 year ago

Search before asking

Version

flink-1.15.2,doris-2.0.1-rc03

What's Wrong?

Data is consumed through Flink and inserted into Doris. The Doris table has a column defined as default current_timestamp(3). When querying the data, the time filled in by the Doris default value is earlier than the business time carried in the data, which defies normal logic.
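
A minimal sketch of the kind of column described, assuming a unique-key table (the table and column names here are hypothetical; only the default clause comes from this report):

    -- create_time is filled by Doris when the sink does not supply it
    create table t_example
    (
        id          bigint      not null,
        event_time  datetime(3) not null comment 'business time written by Flink',
        create_time datetime(3) not null default current_timestamp(3)
    )
    engine = olap unique key(id)
    distributed by hash(id) buckets 1
    properties ("replication_num" = "1");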

What You Expected?

The time written by the Doris default value should be later than the business time.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Code of Conduct

victoyzz commented 1 year ago

Have you ever run into the issue of the default value not taking effect at all?

zhbdesign commented 1 year ago

Have you ever run into the issue of the default value not taking effect at all?

Yes, we hit a problem with default times on 2.0. I forget the details; we are still waiting for a fix.

YS0mind commented 10 months ago

I also ran into this problem when loading data into Doris.

  1. Create a table in Doris to test, like this:

    create table data_province
    (
    `run_date`         date           not null comment 'date',
    data_type_id       int            not null comment 'data type',
    data               decimal(24, 8) not null comment 'data value',
    create_time        datetime       not null default current_timestamp comment 'create time'
    )
    engine = olap unique key(`run_date`,`data_type_id`)
    comment "分日数据表"
    partition by range(`run_date`) ( )
    distributed by hash(`data_type_id`) buckets 10
    properties (
        "storage_format" = "V2",
        "enable_unique_key_merge_on_write" = "true",
        "dynamic_partition.enable" = "true",
        "dynamic_partition.time_unit" = "month",
        "dynamic_partition.create_history_partition" = "true",
        "dynamic_partition.history_partition_num" = "10",
        "dynamic_partition.start" = "-6",
        "dynamic_partition.end" = "3",
        "dynamic_partition.prefix" = "p",
        "dynamic_partition.replication_num" = "1",
        "dynamic_partition.buckets" = "10"
    );
  2. Load data by Stream Load like this, and you will find that the field "create_time" is the time the table was created. However, when I insert data with the INSERT INTO method, everything is ok: "create_time" is the time the record was inserted. (A SQL check for this and step 3 follows the list.)

    # Stream Load fixes the default time at the table-creation time; INSERT INTO correctly uses the insert time
    # vim /tmp/test.csv
    # 2024-01-01,1,67200.00000000
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data" \
    -H "two_phase_commit:false" \
    -H "label:stream_load_test01" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
  3. If I load by specifying create_time=current_timestamp() in the columns header, the field "create_time" is the time the record is inserted, but when I load a new record with the same key, this field changes again. I just want to keep the time the record was first created. (See the verification queries after this list.)

    # A partial-column load produces the correct default time, but every
    # load of the same key overwrites create_time with the latest time
    curl --location-trusted -u root \
    -H "partial_columns:true" \
    -H "column_separator:," \
    -H "columns:run_date,data_type_id,data,create_time=current_timestamp()" \
    -H "two_phase_commit:false" \
    -H "label:stream_load_test02" \
    -T /tmp/test.csv http://127.0.0.1:8030/api/iotest/data_province/_stream_load
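
To make the behaviors in steps 2 and 3 concrete, here is a minimal SQL check against the data_province table from step 1 (the values are illustrative):

    -- Step 2 behavior: INSERT INTO fills create_time with the insert time.
    insert into data_province (run_date, data_type_id, data)
    values ('2024-01-02', 2, 100.00000000);

    select run_date, data_type_id, create_time
    from data_province
    where run_date = '2024-01-02' and data_type_id = 2;

    -- Step 3 behavior: run the partial-column stream load above twice with
    -- the same key (use a new label each time), then re-run this query;
    -- create_time moves to the latest load time instead of staying fixed.
    select run_date, data_type_id, create_time
    from data_province
    where run_date = '2024-01-01' and data_type_id = 1;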

JNSimba commented 4 months ago

The Flink Doris connector writes through one long-lived stream-load transaction by default, which presumably means the default value is evaluated when the load begins rather than when each row arrives; this may cause the problem. You can use batch_mode to avoid it.
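
A hedged sketch of what enabling batch mode can look like through the connector's Flink SQL options ('sink.enable.batch-mode' is the batch-mode switch in recent connector docs; verify the option name against your connector version, and the connection values below are placeholders):

    -- Flink SQL: Doris sink with batch mode enabled, so rows go out in
    -- short-lived batches instead of one long stream-load transaction
    CREATE TABLE doris_sink (
        run_date     DATE,
        data_type_id INT,
        data         DECIMAL(24, 8)
    ) WITH (
        'connector' = 'doris',
        'fenodes' = '127.0.0.1:8030',
        'table.identifier' = 'iotest.data_province',
        'username' = 'root',
        'password' = '',
        'sink.enable.batch-mode' = 'true'
    );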