HamaWhiteGG / flink-sql-lineage

The Lineage Analysis system for FlinkSQL supports advanced syntax such as Watermark, UDTF, CEP, Windowing TVFs, and CTAS.
Apache License 2.0
369 stars 166 forks source link

当where中包含字段等于常量时,会把源字段优化成常量,导致丢失改字段的血缘关系 #32

Closed aiwenmo closed 1 year ago

aiwenmo commented 1 year ago

当输入以下 FlinkSQL 时: ` CREATE TABLE demo_log_01 (     user_id BIGINT,     item_id BIGINT,     behavior STRING,     dt STRING,     hh STRING ) WITH (   'connector' = 'datagen',   'rows-per-second' = '1' ); CREATE TABLE demo_log_02 (     user_id BIGINT,     item_id BIGINT,     behavior STRING,     dt STRING,     hh STRING ) WITH (   'connector' = 'datagen',   'rows-per-second' = '1' ); CREATE TABLE demo_log_05 (     user_id BIGINT,     item_id BIGINT,     behavior STRING,     dt STRING,     hh STRING ) WITH (   'connector' = 'print' ); insert into demo_log_05 select a.user_id,b.item_id,b.behavior,a.dt,a.hh from  (select  user_id,item_id,behavior,dt,hh from demo_log_01 where dt='2023-17-02') a  left join  (select  user_id,item_id,behavior,dt,hh from demo_log_02) b on a.user_id = b.user_id

`

aiwenmo commented 1 year ago

获得错误的血缘关系图: image

aiwenmo commented 1 year ago

当修改等于为大于时,即: (select user_id,item_id,behavior,dt,hh from demo_log_01 where dt>'2023-17-02') a 获得正确的血缘关系图: image

aiwenmo commented 1 year ago

原因是在where里dt等于常量,然后优化器直接是把常量插入到demo_log_05表。 本质来说demo_log_05,dt 是来自于一个常量的。 image

HamaWhiteGG commented 1 year ago

you can see the test case https://github.com/HamaWhiteGG/flink-sql-lineage/blob/release-4.0.0/lineage-flink1.16.x/src/test/java/com/hw/lineage/flink/paimon/PaimonTest.java

99953205-32C4-40BF-8949-2DB2F10834B6