Closed hugeshi closed 11 months ago
I see two things that are done incorrectly in your example:
No need to prefix Spline properties with `spark.` in the `core-site.xml` (this is only required when setting them in the Spark config):
<property>
<name>spline.lineageDispatcher.http.producer.url</name>
<value>http://10.27.184.4:8080/producer</value>
</property>
`spark.sql.queryExecutionListeners`: you cannot set it in the `core-site.xml`.
The `core-site.xml` file is a Hadoop configuration file, and it's used to set Hadoop-specific settings. While Hadoop will read this file and load the properties it contains (making them available for Spline and other libraries), Spark itself does not use this file for its own configuration. You need to set this property in the `spark-defaults.conf` or via the `--conf` command-line argument.
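For illustration, a minimal sketch of the Spark-side configuration could look like the fragment below. It assumes the listener class name documented for the Spline agent (`za.co.absa.spline.harvester.listener.SplineQueryExecutionListener`), which should be checked against the agent version in use, and it reuses the producer URL from the example above. Note that the Spline property carries the `spark.` prefix here because it is set in the Spark config.

```properties
# spark-defaults.conf (sketch; verify the class name against your Spline agent version)

# Registers the Spline listener with Spark. This must live in the Spark config,
# not in core-site.xml.
spark.sql.queryExecutionListeners  za.co.absa.spline.harvester.listener.SplineQueryExecutionListener

# Optional here: Spline properties may stay in core-site.xml without the prefix,
# or be set in the Spark config with the "spark." prefix, as in this line.
spark.spline.lineageDispatcher.http.producer.url  http://10.27.184.4:8080/producer
```

The same settings can be passed on the command line instead, with one `--conf key=value` argument per property.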
Thanks for your quick response. I will close the issue.
What's the issue:
The Spline agent can't read its configuration from the Hadoop configuration file `core-site.xml` when I set the Spline properties there. When I set the Spline configuration in `spark-defaults.conf`, the lineage data is generated.
I checked the Spark UI. The Hadoop folder and the Spline jar are displayed in the Environment view, as shown below:
Classpath Entries:
![image](https://github.com/AbsaOSS/spline-spark-agent/assets/6722878/f0f51fae-a81a-4fc7-89d3-bcf03ebeb7ce)
But the Spline configuration (`spark.sql.queryExecutionListeners`) is not displayed in the Spark Properties section.
Please advise, thanks.
How to reproduce:
Add the properties below to the `core-site.xml`
Spline Agent Version
1.3.0-SNAPSHOT