bartosz25 / spark-scala-playground

Sample processing code using Spark 2.1+ and Scala

About code gen in spark sql #12

Closed bithw1 closed 5 years ago

bithw1 commented 5 years ago

Hi @bartosz25

I am reading your post https://www.waitingforcode.com/apache-spark-sql/generated-code-spark-sql/read.

I would like to ask whether you'd be willing to write something about the internal workings of Spark SQL code generation:

  1. For the basic code generation (GenerateProjection, GenerateFiltering, GenerateOrdering), how are the mutable state variables of CodegenContext used?

  2. How does WholeStageCodeGeneration work?

bartosz25 commented 5 years ago

Hi @bithw1,

Sure, it's a good idea, especially since the post you quote has become pretty old and deserves a refresh :) I think I should be able to publish it by the end of the year or in the first 2 weeks of January 2019.

For the next 2 weeks I will focus on GraphX (I just published a small survey of distributed graph processing frameworks: https://www.waitingforcode.com/graphs/graph-processing-frameworks-survey/read) and on the new features of Spark 2.4.0. I'll also soon publish the posts about off-heap memory and multiple contexts in Apache Spark, which are related to your previous questions.

Please leave the issue open. I may ask you some specific questions about what you would like to see in the post.

Best regards, Bartosz.

bithw1 commented 5 years ago

Sure, thank you very much, @bartosz25, it is very kind of you.

bartosz25 commented 5 years ago

Hi @bithw1

I published a new post about code generation in Apache Spark. I hope it'll answer at least a part of your questions.

Best regards, Bartosz.

bithw1 commented 5 years ago

Thanks @bartosz25! It helps a lot.