Open bithw1 opened 4 years ago
Hi @bithw1 ,
Thank you for the suggestion! Indeed, the post you're quoting is quite old and I totally agree that it would be interesting to delve deeper into that feature. I'll add it to my backlog and will try to publish either before my "What's new in Spark" series or after; it all depends on the official release date :) As usual, I'll keep the issue open and let you know when the article is ready :)
Cheers, Bartosz.
Thanks @bartosz25, that's great! I have learnt a lot from your articles, thanks a lot for your effort and knowledge!
Hi @bithw1
I published the first post from the series https://www.waitingforcode.com/apache-spark-sql/reorder-join-optimizer/read.
I will add all of them under "Spark SQL reorder join" tag on this page https://www.waitingforcode.com/tags/spark-sql-reorder-join
Cheers, Bartosz.
Thanks @bartosz25. I saw it. Many thanks for your great effort! I will study it and try it out.
You're welcome, @bithw1. Thank you for an interesting topic :)
Hi @bartosz25 ,
Spark provides org.apache.spark.sql.catalyst.optimizer.ReorderJoin for more efficient joins. I noticed that you have written 2 test cases about ReorderJoin in your article: https://www.waitingforcode.com/apache-spark-sql/spark-sql-operator-optimizations-part-2/read
However, I think these two test cases are rather simple and don't illustrate the power of ReorderJoin, especially automatic reordering.
Could you please write something more that explains how join reordering works from the source code point of view? More test cases would be even better :-)
I am reading the source of ReorderJoin, but can't get a good understanding of it...
Update: Spark SQL provides both rule-based join reordering and cost-based join reordering. The above is about rule-based join reordering.
Thanks.