bartosz25 / spark-scala-playground

Sample processing code using Spark 2.1+ and Scala

How to debug spark sql generated code? #21

Closed bithw1 closed 4 years ago

bithw1 commented 4 years ago

Hi @bartosz25 ,

I am reading your article https://www.waitingforcode.com/apache-spark-sql/generated-code-spark-sql/read where you said: "Thanks to the same trick of debugging we can see that reference array brings filter to apply".

I would like to ask how to set up the environment to debug Spark SQL generated code. I am using IntelliJ. Thanks!

bartosz25 commented 4 years ago

Hi @bithw1 ,

Ah yes, my old and unclear articles ;) I will add a clarification in the blog post. Regarding the debugging, I use the debugger provided with IntelliJ.

When I want to learn something new, I add hard breakpoints that stop the program during its execution. It helps me to see the execution flow, analyze the generated objects, control flow, etc. On the other hand, when I work on code I've already seen somewhere in the past, I use soft debugging, i.e. I print messages instead of stopping the program.

I covered these methods in this blog post: https://www.waitingforcode.com/programming/tips-discover-internals-open-source-framework-internals-apache-spark-use-case/read

Hope it helps.

Cheers, Bartosz.

bithw1 commented 4 years ago

Hi @bartosz25,

The blog post about debugging techniques is great! But it looks to me like the methods you covered there are not suitable for debugging Spark SQL auto-generated code?

With Spark SQL generated code like the one in https://www.waitingforcode.com/apache-spark-sql/generated-code-spark-sql/read#predicate, I am only able to write the generated code to a log file and read through it by eye to work out what it does. I can't debug it line by line (inspect variable values, etc.).

So, I am not sure how you debug it, :-)
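For reference, the "log the generated code and read it" approach mentioned above can be sketched roughly as follows. This is a minimal example, not the exact setup used in the article: the object name `GeneratedCodeDump`, the helper `buildQuery` and the sample query over `spark.range` are made up for illustration, and it assumes a Spark 2.x+ dependency on the classpath. `debugCodegen()` comes from Spark's `org.apache.spark.sql.execution.debug` package and only prints the generated Java source; it does not make that source steppable in a debugger.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
// Brings the debugCodegen() extension method into scope
import org.apache.spark.sql.execution.debug._

object GeneratedCodeDump {

  // Hypothetical sample query: a filter over a generated range of ids
  def buildQuery(spark: SparkSession): Dataset[java.lang.Long] =
    spark.range(0, 10).filter("id > 2")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("generated-code-dump")
      .master("local[1]")
      .getOrCreate()

    // Prints the Java source generated for each WholeStageCodegen subtree
    // of the physical plan to standard output
    buildQuery(spark).debugCodegen()

    spark.stop()
  }
}
```

The printed source can then be matched against the Spark methods it calls, which is where ordinary IntelliJ breakpoints can be placed.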

bartosz25 commented 4 years ago

Sorry @bithw1 , I missed this issue.

In fact, to debug the generated code I use the same techniques as the ones listed in the blog post. The difference is that I add the breakpoints to Apache Spark methods invoked directly by the generated code. Unfortunately, I didn't find a way to add breakpoints to the generated code :(

Best, Bartosz.

bithw1 commented 4 years ago

Sure, thanks @bartosz25, I understand how you debug now :-) That's a good way.

I think there should be a way to debug Janino-compiled code, but I didn't find one either...