In Databricks, a new SparkContext cannot be created. The Spark application name is extracted in the harvester via the SparkContext (`ctx.session.sparkContext.appName`). With this approach, on Databricks it is always "Databricks Shell", which prevents us from extracting and persisting lineage when two different plans are produced for the same destination (file/table). We consume the output of the Spline agent and have written a parser on top of it to store the lineage.
Instead of relying on `ctx.session.sparkContext.appName`, could you please change the harvester to extract the application name using `spark.conf.get("spark.app.name")`? That should work for both open-source Spark and Databricks, especially since the SparkContext is somewhat legacy given how properties are set nowadays.
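For illustration, a minimal sketch of what the change could look like (the helper name `resolveAppName` and the fallback behavior are my assumptions, not the agent's actual code):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: prefer the session-level runtime conf, which reflects the
// effective app name on Databricks, and fall back to the legacy
// SparkContext property on plain open-source Spark.
def resolveAppName(session: SparkSession): String =
  session.conf
    .getOption("spark.app.name")             // proposed source of truth
    .getOrElse(session.sparkContext.appName) // legacy fallback
```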
If the proposed method does not work, is there any other way to extract a valid Spark app name, or to add additional config to the plan with codeless init?
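As a possible interim workaround, a sketch under stated assumptions: the conf key `myorg.lineage.appName` and the job name are invented here, and the agent would still need a change (or a post-processing hook) to copy such a key into the plan's extra info.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Arbitrary custom keys can be set on the runtime conf even on Databricks,
// where core properties like spark.app.name are fixed by the platform.
spark.conf.set("myorg.lineage.appName", "nightly-orders-ingest")

// Our parser (or an agent-side hook) could then read the key back to
// disambiguate plans that write to the same destination.
val appName = spark.conf.getOption("myorg.lineage.appName").getOrElse("unknown")
```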