bartosz25 / spark-scala-playground

Sample processing code using Spark 2.1+ and Scala

Bug in #22

Closed bithw1 closed 4 years ago

bithw1 commented 4 years ago

Hi @bartosz25,

I am investigating your article https://www.waitingforcode.com/apache-spark-sql/writing-custom-optimization-apache-spark-sql-union-rewriter-mvp-version/read and I think there is a bug in the following code:

          val matchedRow = leftRows.getOrElse(rightRows.get).toSeq.head
          val (letter, nr, flag) = if (matchedRow.isNullAt(0)) {
            (matchedRow.getUTF8String(3), matchedRow.getInt(4), matchedRow.getInt(5))
          } else {
            (matchedRow.getUTF8String(0), matchedRow.getInt(1), matchedRow.getInt(2))
          }

The matchedRow has only three fields, so calling getUTF8String(3), getInt(4) and getInt(5) on it is incorrect. I think the following code is enough, since matchedRow comes from either the left RDD or the right RDD:

          val (letter, nr, flag) = (matchedRow.getUTF8String(0), matchedRow.getInt(1), matchedRow.getInt(2))

bartosz25 commented 4 years ago

Hi @bithw1

You're right! Thank you. I fixed it in https://github.com/bartosz25/spark-scala-playground/commit/3508c13a499663650e7441ae02df8113a8be97e2

It worked before because the first condition was always false, so execution always went to the else branch and the out-of-range accessors were never called.
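To illustrate why the bug stayed hidden, here is a minimal Spark-free sketch. `ModelRow` and `MatchedRowSketch` are hypothetical names used only for this illustration; `ModelRow` is a simplified stand-in for a three-field row (letter, nr, flag), not Spark's `InternalRow`, and `getString` replaces `getUTF8String` to keep the example dependency-free.

```scala
// Hypothetical stand-in for a three-field row (letter, nr, flag).
// This is NOT Spark's InternalRow, only a model of the accessors used in the issue.
final case class ModelRow(values: Array[Any]) {
  def isNullAt(i: Int): Boolean = values(i) == null
  def getString(i: Int): String = values(i).asInstanceOf[String]
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
}

object MatchedRowSketch {
  // Shape of the original code: the first branch reads fields 3..5,
  // which a three-field row does not have.
  def extractOriginal(row: ModelRow): (String, Int, Int) =
    if (row.isNullAt(0)) {
      (row.getString(3), row.getInt(4), row.getInt(5))
    } else {
      (row.getString(0), row.getInt(1), row.getInt(2))
    }

  // Shape of the simplification proposed in this issue: always read fields 0..2.
  def extractFixed(row: ModelRow): (String, Int, Int) =
    (row.getString(0), row.getInt(1), row.getInt(2))
}
```

With a row such as `ModelRow(Array[Any]("a", 1, 0))`, both functions return the same tuple, which matches the observation that the old code worked. Under this model, the first branch would only fail (with an out-of-bounds access) if field 0 were null, a case that apparently never occurred in the original job.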

Best, Bartosz.

bithw1 commented 4 years ago

Thanks @bartosz25, you are right.

I am closing this issue.