Closed bithw1 closed 5 years ago
collect()
method will grab all data generated by executors to the driver. It means that the executors will need to serialize the data and send it through the network. Later the driver will need to deserialize it.
In your case, the first test initializes not serializable MyLabelMaker
instance on the executor side and returns a String from the map transformation. Since String is serializable, the test passes. The second test fails because mapped type is not serializable. You can add an extends Serializable
to MyLabelMaker to see that it's the reason.
Thanks @bartosz25 .
After reading your answer, I changed collect to count , and it has no serialization issue now, and I understood: I have thought the serialization of these two test cases from driver to executor but forgot to think the opposite: from executor to driver. (><)
Thank you @bartosz25 ,you have very solid knowledge, :100: These days I have been following your articles and learned a lot.
Hi @bartosz25
Following code is deduced from your post(http://www.waitingforcode.com/apache-spark/serialization-issues-part-1/read), and thank you for these great article!
The second test case fails while the first one succeeds, I don't understand the result,looks it involves closure serialization mechanism? but I am not very clear... @bartosz25 could you please take a look?