AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

get App Name using `spark.app.name` #653

Closed: uday1409 closed this issue 1 year ago.

uday1409 commented 1 year ago

On Databricks, a new SparkContext cannot be created. The harvester extracts the Spark application name from the SparkContext (`ctx.session.sparkContext.appName`), so on Databricks the name is always "Databricks Shell". This limits our ability to extract and persist lineage when two different plans are captured for the same destination (file or table), because both report the same application name. We consume the output of the Spline agent with our own parser, which stores the lineage.

Instead of relying on `ctx.session.sparkContext.appName`, could you change the harvester to extract the application name with `spark.conf.get("spark.app.name")`? This should work for both open-source Spark and Databricks, and it avoids depending on SparkContext, which is somewhat legacy given how properties are set nowadays.
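To make the proposed lookup order concrete, here is a minimal sketch, assuming the harvester would prefer the session-level `spark.app.name` value and fall back to the SparkContext name when it is missing or blank. It is not Spline's actual harvester code: `AppNameResolution` and `resolveAppName` are hypothetical names, and the session config is modelled as a plain `String => Option[String]` lookup so the sketch runs without Spark. In real Spark code that lookup would be `spark.conf.getOption("spark.app.name")`. The issue does not establish whether the two values actually differ on Databricks; the sketch only shows the priority being requested.

```scala
// Hypothetical sketch of the lookup order proposed in this issue.
// Not Spline's actual harvester implementation.
object AppNameResolution {
  val AppNameKey = "spark.app.name"

  /** Resolve the application name to record in the lineage.
    *
    * @param sessionConf    stand-in for the session config lookup; in Spark this
    *                       would be spark.conf.getOption(AppNameKey)
    * @param contextAppName stand-in for ctx.session.sparkContext.appName
    */
  def resolveAppName(sessionConf: String => Option[String], contextAppName: String): String =
    sessionConf(AppNameKey)
      .map(_.trim)
      .filter(_.nonEmpty)       // treat a blank value as "not set"
      .getOrElse(contextAppName) // fall back to the SparkContext name

  def main(args: Array[String]): Unit = {
    // Hypothetical session config in which the app name has been set.
    val confWithName = Map(AppNameKey -> "orders-etl")
    println(resolveAppName(confWithName.get, "Databricks Shell")) // prints orders-etl

    // No session-level value: falls back to the SparkContext name.
    println(resolveAppName(Map.empty[String, String].get, "Databricks Shell")) // prints Databricks Shell
  }
}
```

Keeping the fallback means behaviour on open-source Spark stays the same whenever no session-level value is present.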


If the proposed method does not work, is there another way to extract a valid Spark app name, or to add extra configuration to the plan when using codeless init?

freds72 commented 1 year ago

upvoted

wajda commented 1 year ago

Thanks for the contribution, @uday1409.