bartosz25 / spark-scala-playground

Sample processing code using Spark 2.1+ and Scala

About multiple SparkSessions that share the same SparkContext #4

Closed bithw1 closed 6 years ago

bithw1 commented 6 years ago

Hi @bartosz25

In Spark 2.x, we can create more than one SparkSession sharing the same underlying SparkContext. What is the practical or typical use case for creating many SparkSessions? The documentation of SparkSession.newSession says: "Start a new session with isolated SQL configurations, temporary tables, registered functions are isolated, but sharing the underlying SparkContext and cached data." However, I still don't have a good understanding of when to use this feature.

Could you please write something about this? Thank you, @bartosz25
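For readers following the thread, the isolation described in the quoted documentation can be sketched as below. This is a minimal illustration, not code from the repository: the `local[*]` master, the application name, the `numbers` view and the chosen configuration values are all assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession

object NewSessionIsolationSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary app name, for a standalone run
    val first = SparkSession.builder()
      .master("local[*]")
      .appName("new-session-isolation-sketch")
      .getOrCreate()
    // newSession() returns a session with its own SQL state
    val second = first.newSession()

    // Both sessions are backed by the very same SparkContext instance
    assert(first.sparkContext eq second.sparkContext)

    // A temporary view registered in one session is not visible in the other
    first.range(3).createOrReplaceTempView("numbers")
    assert(first.catalog.tableExists("numbers"))
    assert(!second.catalog.tableExists("numbers"))

    // SQL configuration is kept per session as well
    first.conf.set("spark.sql.shuffle.partitions", "5")
    second.conf.set("spark.sql.shuffle.partitions", "50")
    assert(first.conf.get("spark.sql.shuffle.partitions") == "5")
    assert(second.conf.get("spark.sql.shuffle.partitions") == "50")

    // Stopping either session stops the shared SparkContext
    first.stop()
  }
}
```

Because the SparkContext is shared, calling `stop()` on one session ends the other one too, which is one thing to keep in mind before relying on several sessions in the same driver.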

bartosz25 commented 6 years ago

Thanks for the message. I've added the idea to my TODO list. I can't promise an exact publication date, but I will try to write something in the 3rd week of October.

bithw1 commented 6 years ago

Thanks, @bartosz25. Don't work too hard, please :-) If you have the spare time and interest, then write something, or just put it in the backlog :-)

bartosz25 commented 5 years ago

Hi @bithw1, I've published a post about multiple SparkSessions in a single driver process: https://www.waitingforcode.com/sql/multiple-sparksession-one-sparkcontext/read After some research, I think that it's always better to isolate the processes because it makes monitoring and reprocessing easier.

In the end, it all depends on the final use case, and I would agree that having both Datasets in a single driver process sometimes seems attractive, because you can keep the first one in memory and simply pass it to the second. That being said, in this situation you could also write the first one to an in-memory store and use it as the input for the second.

Best regards, Bartosz.
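As a rough sketch of the "both Datasets in a single driver process" option mentioned above, one way to hand a cached Dataset from one session to another is a global temporary view. This is only an illustration and not necessarily what the linked post describes: the session names, the `first_dataset` view and the local master are hypothetical, and `global_temp` is the default value of `spark.sql.globalTempDatabase`.

```scala
import org.apache.spark.sql.SparkSession

object SharedDatasetSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary app name, for a standalone run
    val producer = SparkSession.builder()
      .master("local[*]")
      .appName("shared-dataset-sketch")
      .getOrCreate()
    // Second session in the same driver, sharing the SparkContext and cached data
    val consumer = producer.newSession()

    // The first session builds and caches a Dataset, then exposes it
    // through a global temporary view (visible to every session of the application)
    val firstDataset = producer.range(0, 5).toDF("id").cache()
    firstDataset.createOrReplaceGlobalTempView("first_dataset")

    // The second session reads it through the global temporary database
    val reread = consumer.table("global_temp.first_dataset")
    assert(reread.count() == 5)

    producer.stop()
  }
}
```

A plain temporary view would not work here, since it stays private to the session that created it. The global view lives as long as the Spark application does.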

bithw1 commented 5 years ago

Thanks very much, @bartosz25. I will check it out and let you know if I have questions.

bithw1 commented 5 years ago

Hi @bartosz25, thanks for providing two scenarios in which multiple SparkSessions may be helpful. I also briefly looked for usages of SparkSession#newSession and found that multiple SparkSessions are used in the Spark Thrift server. The server kicks off a new SparkSession for each new submitted SQL query (SparkSQLSessionManager#openSession), while it keeps only one SparkContext.

bartosz25 commented 5 years ago

Thanks for your feedback and finding :+1:

Best regards, Bartosz.