How join reorder works.

bithw1 commented 4 years ago

Hi @bartosz25,

Spark provides org.apache.spark.sql.catalyst.optimizer.ReorderJoin for more efficient joins.

I noticed that you have written two test cases about ReorderJoin in your article: https://www.waitingforcode.com/apache-spark-sql/spark-sql-operator-optimizations-part-2/read

But I think these two test cases are too simple to illustrate the power of ReorderJoin, especially automatic reordering.

Could you please write something more that explains how join reordering works from the source code's point of view? A few more test cases would be even better :-)

I am reading the source of ReorderJoin, but I can't get a good understanding of it...

Update: Spark SQL provides both rule-based and cost-based join reorder. The above is about rule-based join reorder.

Thanks.

bartosz25 commented 4 years ago

Hi @bithw1,

Thank you for the suggestion! Indeed, the post you're quoting is quite old and I totally agree that it would be interesting to delve deeper into that feature. I've added it to my backlog and will try to publish it either before or after my "What's new in Spark" series, depending on the official release date :) As usual, I'll keep the issue open and let you know when the article is ready :)

Cheers, Bartosz.

bithw1 commented 4 years ago

Thanks @bartosz25, that's great! I have learnt a lot from your articles. Thanks a lot for your effort and knowledge!

bartosz25 commented 4 years ago

Hi @bithw1,

I published the first post of the series: https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer/read

I will add all of them under the "Spark SQL reorder join" tag on this page: https://www.waitingforcode.com/tags/spark-sql-reorder-join

Cheers, Bartosz.

bithw1 commented 4 years ago

Thanks @bartosz25. I saw it. Many thanks for your great effort! I will study it and try it out.

bartosz25 commented 4 years ago

You're welcome, @bithw1. Thank you for an interesting topic :)
