bartosz25 / spark-scala-playground

Sample processing code using Spark 2.1+ and Scala

One question in the test case in LazyLoadingTest #7

Closed: bithw1 closed this issue 5 years ago

bithw1 commented 5 years ago

Hi @bartosz25

I am reading your article http://www.waitingforcode.com/apache-spark/serialization-issues-part-1/read

There is a test case:

"lazy loaded not serializable object" should "be correctly sent once through network" in {
  val numbersAccumulator = sparkContext.collectionAccumulator[Int]("iterated numbers accumulator")
  // This version is a variation of the previous test because it
  // sends given object only once and thanks to that we can, for example,
  // keep the connection open
  val connectorBroadcast = sparkContext.broadcast(NotSerializableLazyConnector())
  sparkContext.parallelize(0 to 1)
    .foreachPartition(numbers => {
      numbers.foreach(number => {
        connectorBroadcast.value.push(number)
        numbersAccumulator.add(number)
      })
    })

  numbersAccumulator.value should contain allOf(0, 1)
}

What do you mean by the comment in the test code:

  // This version is a variation of the previous test because it
  // sends given object only once and thanks to that we can, for example,
  // keep the connection open

I don't know what you mean by "keep the connection open", and I can't guess which scenario you have in mind.

bartosz25 commented 5 years ago

Thank you for asking. I pushed a clarification in this commit: https://github.com/bartosz25/spark-scala-playground/commit/824d9fe58fc10c98a3009f90879e12ebcd995e4f
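For readers who do not open the commit, here is a minimal sketch of the kind of scenario the comment hints at. It is an assumption, not the content of the commit and not the definition of `NotSerializableLazyConnector`. The names `FakeConnection`, `LazyConnector` and `roundTrip` are hypothetical, and a plain Java serialization round trip stands in for Spark shipping the broadcast to an executor. The idea is that only a small serializable wrapper travels over the network, and the non-serializable connection is created lazily on the receiving side and then reused by every later `push`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a client holding an open connection.
// It is deliberately NOT Serializable.
class FakeConnection {
  FakeConnection.opened += 1 // count how many connections get created
  def push(number: Int): Unit = println(s"pushing $number")
}

object FakeConnection {
  var opened = 0
}

// Serializable wrapper: only this object is sent over the wire.
// The connection is transient and lazy, so it is skipped during
// serialization and built on first use on the receiving side.
class LazyConnector extends Serializable {
  @transient lazy val connection: FakeConnection = new FakeConnection
  def push(number: Int): Unit = connection.push(number)
}

object LazyConnectorDemo {
  // Serializes and deserializes a value, imitating one network transfer.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val received = roundTrip(new LazyConnector)
    // Both pushes go through the same lazily created connection.
    (0 to 1).foreach(received.push)
    println(s"connections opened: ${FakeConnection.opened}")
  }
}
```

With a broadcast variable, the wrapper should be deserialized once per executor rather than once per task, so in this sketch a connection opened by the first `push` could stay open and serve later partitions handled by the same executor.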

bithw1 commented 5 years ago

Thanks @bartosz25