apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] Fail to create tag automatically with watermark #4481

Closed JackeyLee007 closed 1 week ago

JackeyLee007 commented 2 weeks ago


Paimon version

0.9

Compute Engine

Flink-1.18.1 Standalone cluster.

Minimal reproduce step

-- Target paimon table with tags create automatically from watermark 
create table orders_with_watermark(
    id INT PRIMARY KEY NOT ENFORCED, 
    val int, 
    create_time TIMESTAMP(3), 
    update_time TIMESTAMP(3),
    WATERMARK FOR `update_time` AS `update_time` - INTERVAL '5' SECOND  
  ) 
  WITH (
    'tag.automatic-creation' = 'watermark',
    'tag.creation-period' = 'hourly',
    'tag.creation-delay' = '10 m',
    'tag.num-retained-max' = '90',
    'sink.watermark-time-zone' = 'Asia/Shanghai'
);
-- mock datasource 
CREATE TEMPORARY TABLE datagen_table (
    id INT,
    name STRING,
    ts TIMESTAMP(3)
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.id.min' = '1',
    'fields.id.max' = '100',
    'fields.name.length' = '10',
    'fields.ts.kind' = 'random'
);
-- Run a stream job on Flink
INSERT INTO  orders_with_watermark
  select id, 1, now(), now() 
  from datagen_table;

What doesn't meet your expectations?

Tags should be created automatically every hour, but none are. No tag files can be found under the table directory, and the following SQL returns no rows:

select * from `orders_with_watermark$tags` 

Anything else?

No response

Are you willing to submit a PR?

JingsongLi commented 1 week ago

First, you can check whether the job is healthy. Second, you can check the watermark in the Flink topology: does the source produce watermarks normally?
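For checking from SQL (an illustrative sketch; per-operator watermarks are also visible in the Flink Web UI): Flink's built-in `CURRENT_WATERMARK` function returns `NULL` as long as no watermark has been emitted. Note it requires the column to be declared as a rowtime attribute via a `WATERMARK FOR` clause, which the `datagen_table` above does not have:

```sql
-- Shows the current watermark next to each row; wm stays NULL
-- until the source actually emits a watermark. Requires that
-- `ts` is declared as a rowtime attribute on the source table.
SELECT id, ts, CURRENT_WATERMARK(ts) AS wm
FROM datagen_table;
```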

JackeyLee007 commented 1 week ago

Sorry, it's my fault. Actually, the watermark should be emitted by the stream source, which is the datagen_table in this case.
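For readers hitting the same symptom: a minimal sketch of the fix, assuming the same datagen source as above, is to declare a watermark on the source table so that watermarks actually flow into the Paimon sink. Propagating the source's `ts` (instead of `now()`) is an additional, optional cleanup so the written timestamps stay consistent with the watermark:

```sql
-- Corrected source sketch: the WATERMARK clause makes `ts` a
-- rowtime attribute, so the stream carries watermarks downstream
-- and the Paimon sink can create tags from them.
CREATE TEMPORARY TABLE datagen_table (
    id INT,
    name STRING,
    ts TIMESTAMP(3),
    WATERMARK FOR `ts` AS `ts` - INTERVAL '5' SECOND
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.id.min' = '1',
    'fields.id.max' = '100',
    'fields.name.length' = '10'
);

-- Forward the source row time rather than now().
INSERT INTO orders_with_watermark
SELECT id, 1, ts, ts
FROM datagen_table;
```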

JackeyLee007 commented 1 week ago

Thanks for your hint. @JingsongLi