Bug in - Githubissues

bithw1 commented 4 years ago

Hi @bartosz25,

I am investigating your article: https://www.waitingforcode.com/apache-spark-sql/writing-custom-optimization-apache-spark-sql-union-rewriter-mvp-version/read

I think there is a bug in the following code:

          val matchedRow = leftRows.getOrElse(rightRows.get).toSeq.head
          val (letter, nr, flag) = if (matchedRow.isNullAt(0)) {
            (matchedRow.getUTF8String(3), matchedRow.getInt(4), matchedRow.getInt(5))
          } else {
            (matchedRow.getUTF8String(0), matchedRow.getInt(1), matchedRow.getInt(2))
          }

The matchedRow has only three fields, so calling getUTF8String(3), getInt(4) and getInt(5) on it is incorrect. I think the following code is enough, since matchedRow comes from either the left RDD or the right RDD:

          val (letter, nr, flag) = (matchedRow.getUTF8String(0), matchedRow.getInt(1), matchedRow.getInt(2))
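To make the point concrete, here is a minimal sketch that runs without Spark. `SimpleRow`, `extractOriginal` and `extractSimplified` are hypothetical names invented for illustration; `SimpleRow` is a plain stand-in for a three-field row, not Spark's `InternalRow`, and `getString` replaces `getUTF8String`. It only shows that both versions agree when field 0 is non-null, and that reading indices 3 to 5 of a three-field row fails in this stand-in.

```scala
// Hypothetical stand-in for a three-field row (letter, nr, flag).
// This is NOT Spark's InternalRow; it only mimics the accessors used above.
final case class SimpleRow(values: Vector[Any]) {
  def isNullAt(i: Int): Boolean = values(i) == null
  def getString(i: Int): String = values(i).asInstanceOf[String]
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
}

object MatchedRowSketch {
  // Shape of the code quoted in this issue: reads fields 3..5 when field 0 is null.
  def extractOriginal(row: SimpleRow): (String, Int, Int) =
    if (row.isNullAt(0)) (row.getString(3), row.getInt(4), row.getInt(5))
    else (row.getString(0), row.getInt(1), row.getInt(2))

  // Shape of the proposed fix: always reads fields 0..2.
  def extractSimplified(row: SimpleRow): (String, Int, Int) =
    (row.getString(0), row.getInt(1), row.getInt(2))

  def main(args: Array[String]): Unit = {
    // Field 0 is non-null, so the original code takes the else branch
    // and both versions return the same tuple.
    val row = SimpleRow(Vector("A", 1, 0))
    assert(extractOriginal(row) == extractSimplified(row))

    // If field 0 were null, the original branch would read index 3 of a
    // three-element row, which throws in this stand-in.
    val nullFirst = SimpleRow(Vector(null, 1, 0))
    assert(scala.util.Try(extractOriginal(nullFirst)).isFailure)
  }
}
```

How Spark's own row implementations behave on an out-of-range index is not covered by this sketch.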

bartosz25 commented 4 years ago

Hi @bithw1

You're right! Thank you. I fixed it in https://github.com/bartosz25/spark-scala-playground/commit/3508c13a499663650e7441ae02df8113a8be97e2

It worked before because the first condition, matchedRow.isNullAt(0), was always false, so execution always took the else branch.

Best, Bartosz.

bithw1 commented 4 years ago

Thanks @bartosz25, you are right.

I am closing this issue.

bartosz25 / spark-scala-playground

Bug in #22