apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Lineage is empty when logical plan has no child #6706

Open vinayakmalik95 opened 1 month ago

vinayakmalik95 commented 1 month ago


Describe the bug

Lineage is returned empty because, for a plan node with no children, we don't return `parentColumnsLineage` but an empty `ListMap[Attribute, AttributeSet]()`:

case p if p.children.isEmpty => ListMap[Attribute, AttributeSet]()

whereas it should be:

case p if p.children.isEmpty => parentColumnsLineage
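To make the difference between the two branches concrete, here is a minimal sketch on a hypothetical toy plan model. It is not Spark's `LogicalPlan` and not Kyuubi's lineage parser; `Plan`, `TableScan`, `OpaqueLeaf`, `Project`, `LineageSketch.extract` and the `keepParentAtLeaf` flag are illustrative names. It assumes, as the `parentColumnsLineage` parameter suggests, that lineage is propagated top-down from the output columns, so whatever the no-children branch returns becomes the lineage for that whole subtree.

```scala
import scala.collection.immutable.ListMap

// Hypothetical toy plan model for illustration only; NOT Spark's LogicalPlan.
sealed trait Plan { def children: Seq[Plan] }

// A leaf the extractor knows how to resolve to a source table.
final case class TableScan(table: String) extends Plan {
  val children: Seq[Plan] = Nil
}

// A leaf with no dedicated case, so it falls through to the "no children" branch.
final case class OpaqueLeaf(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}

// A projection mapping each output column to the input columns it reads.
final case class Project(exprs: Map[String, Set[String]], child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

object LineageSketch {
  // Target column -> source columns it currently derives from.
  type Lineage = ListMap[String, Set[String]]

  // Top-down propagation: `parent` carries the lineage accumulated above `plan`.
  def extract(plan: Plan, parent: Lineage, keepParentAtLeaf: Boolean): Lineage =
    plan match {
      case TableScan(table) =>
        // Resolved leaf: qualify every source column with its table name.
        parent.map { case (target, srcs) => target -> srcs.map(c => s"$table.$c") }
      case Project(exprs, child) =>
        // Rewrite each source column through the projection, then descend.
        val rewritten = parent.map { case (target, srcs) =>
          target -> srcs.flatMap(s => exprs.getOrElse(s, Set(s)))
        }
        extract(child, rewritten, keepParentAtLeaf)
      case p if p.children.isEmpty =>
        // The branch discussed in this issue: the current behavior drops everything,
        // the proposed behavior passes the parent lineage through unchanged.
        if (keepParentAtLeaf) parent else ListMap.empty
      case other =>
        throw new IllegalArgumentException(s"unsupported node: $other")
    }

  def main(args: Array[String]): Unit = {
    // Roughly: insert into tb0 select key as col1 from <leaf>
    val root: Lineage = ListMap("tb0.col1" -> Set("col1"))
    val opaque = Project(Map("col1" -> Set("key")), OpaqueLeaf("unmatched-leaf"))
    val resolved = Project(Map("col1" -> Set("key")), TableScan("db.t0"))
    println(extract(opaque, root, keepParentAtLeaf = false))
    println(extract(opaque, root, keepParentAtLeaf = true))
    println(extract(resolved, root, keepParentAtLeaf = false))
  }
}
```

In this toy, `keepParentAtLeaf = true` keeps the `tb0.col1 -> key` mapping that the empty-map branch discards, but the source column stays unqualified because the opaque leaf does not say which table it reads. With a resolved `TableScan` leaf, both settings yield `tb0.col1 -> db.t0.key`.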

Affects Version(s)

https://github.com/apache/kyuubi/releases/tag/v1.9.2

Kyuubi Server Log Output

There is no error in the server logs. You can reproduce the problem by running a test case such as:

  test("columns lineage extract - AppendData/OverwriteByExpression") {
    val ddls =
      """
        |create table v2_catalog.db.tb0(col1 int, col2 string) partitioned by(col2)
        |""".stripMargin
    ddls.split("\n").filter(_.nonEmpty).foreach(spark.sql(_).collect())
    withTable("v2_catalog.db.tb0") { _ =>
      val ret0 =
        extractLineage(
          s"insert into table v2_catalog.db.tb0 " +
            s"select key as col1, value as col2 from test_db0.test_table0")
      assert(ret0 == Lineage(
        List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
        List("v2_catalog.db.tb0"),
        List(
          ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
          ("v2_catalog.db.tb0.col2", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))

      val ret1 =
        extractLineage(
          s"insert overwrite table v2_catalog.db.tb0 partition(col2) " +
            s"select key as col1, value as col2 from test_db0.test_table0")
      assert(ret1 == Lineage(
        List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
        List("v2_catalog.db.tb0"),
        List(
          ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
          ("v2_catalog.db.tb0.col2", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value")))))

      val ret2 =
        extractLineage(
          s"insert overwrite table v2_catalog.db.tb0 partition(col2 = 'bb') " +
            s"select key as col1 from test_db0.test_table0")
      assert(ret2 == Lineage(
        List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
        List("v2_catalog.db.tb0"),
        List(
          ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
          ("v2_catalog.db.tb0.col2", Set()))))
    }
  }

Kyuubi Engine Log Output

Empty lineage returned:

inputTables(List())
outputTables(List())
columnLineage(List())

### Whereas it should return lineage like this:
Lineage(
  List(s"$DEFAULT_CATALOG.test_db0.test_table0"),
  List("v2_catalog.db.tb0"),
  List(
    ("v2_catalog.db.tb0.col1", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.key")),
    ("v2_catalog.db.tb0.col2", Set(s"$DEFAULT_CATALOG.test_db0.test_table0.value"))))

This test fails.

### Test name: test("columns lineage extract - AppendData/OverwriteByExpression")

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

Update the logic in the `case p if p.children.isEmpty` branch: it should return `parentColumnsLineage` instead of an empty `ListMap[Attribute, AttributeSet]()`.

Are you willing to submit a PR?


vinayakmalik95 commented 1 month ago

This is present in all releases since lineage support was introduced in Kyuubi.

@iodone @zwangsheng @bowenliang123 @cfmcgrady let me know and I can raise the PR.

iodone commented 1 month ago


Can you run the unit test with your specific error case?

vinayakmalik95 commented 1 month ago

Yes, it fails with the current unit test. @iodone

iodone commented 1 month ago

@vinayakmalik95 If you have located the problem, you can try to fix it.

vinayakmalik95 commented 1 month ago

I have already provided the solution in the issue description above; let me open a PR for it.