AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0
185 stars 94 forks

Spline: support for expression parsing and column-level lineage derived from expressions #555

Closed zacayd closed 1 year ago

zacayd commented 1 year ago

Hi, I have run some tests against the Spline API and also reviewed the collections in ArangoDB, but I didn't find any reference to expression parsing. For example, my PySpark code contains these lines:

    from pyspark.sql.functions import col

    df = df.withColumn("CXX", col("lat"))
    df.createOrReplaceTempView("Data")
    df1 = spark.sql("select * from Data")
    df1.write.mode("overwrite").csv("mycsv.csv")

I expect the metadata collections to contain some indication that the CXX column is derived from lat. Is this something that Spline supports? If not, can it be customised in the Spline agent code?

Thanks,
Zacay

cerveada commented 1 year ago

Both Expressions and Attributes/Columns are supported. You can see the Spline data model here: https://github.com/AbsaOSS/spline/wiki/Spline-Data-Model

To understand how Spline works, I recommend checking out and running the examples: https://github.com/AbsaOSS/spline-spark-agent/tree/develop/examples

Then look at the result in the UI. You should be able to find Attributes and Expressions there.
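
To make the idea of attributes linked through expressions concrete, here is a minimal sketch of how column-level lineage could be traced back from an output column such as CXX to its source column lat. The record layout and the field names (`derived_from`, `inputs`) and the helper `source_columns` are hypothetical simplifications chosen for illustration; they are not the actual Spline schema, which is described on the data model wiki page linked above.

```python
# Hypothetical, simplified lineage record. The field names are illustrative
# assumptions and do NOT mirror the real Spline data model.
plan = {
    "attributes": [
        # A source column read from the input: it is derived from nothing.
        {"id": "attr-lat", "name": "lat", "derived_from": []},
        # The column created by withColumn("CXX", col("lat")).
        {"id": "attr-cxx", "name": "CXX", "derived_from": ["expr-alias"]},
    ],
    "expressions": [
        # The expression that produced CXX, referencing the lat attribute.
        {"id": "expr-alias", "name": "alias", "inputs": ["attr-lat"]},
    ],
}


def source_columns(plan, column_name):
    """Return the names of the source columns that column_name depends on."""
    attrs = {a["id"]: a for a in plan["attributes"]}
    exprs = {e["id"]: e for e in plan["expressions"]}
    start = next(a for a in plan["attributes"] if a["name"] == column_name)

    sources, seen, stack = set(), set(), [start["id"]]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        if node_id in attrs:
            children = attrs[node_id]["derived_from"]
            if not children:
                # An attribute with no upstream expression is a source column.
                sources.add(attrs[node_id]["name"])
            stack.extend(children)
        else:
            # Expressions point at the attributes or expressions they consume.
            stack.extend(exprs[node_id]["inputs"])
    return sources


print(source_columns(plan, "CXX"))
```

The walk alternates between attributes and expressions, so a chain of several derived columns would resolve to the original input columns in the same way.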

zacayd commented 1 year ago

Thanks. Does it also support UDFs (user-defined functions) in Spark?

cerveada commented 1 year ago

Partially, see this ticket: https://github.com/AbsaOSS/spline-spark-agent/issues/181