Closed melanieihuei closed 6 years ago
A function that takes a text-file RDD as input (e.g. test_data_rdd = sc.textFile(testing_data)) and outputs an RDD of the words inside EACH input file.
Output format: ([["w_1", "w_2", "w_3", ......, "w_d1"], ["w_1", "w_2", "w_3", ......, "w_d2"], ..., ["w_1", "w_2", "w_3", ......, "w_dk"]])
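A minimal sketch of such a function, under a few assumptions that are not stated in the issue: each element of the RDD returned by sc.textFile() holds one document, words are runs of letters, digits, and apostrophes, and everything is lowercased. The names tokenize_document and words_per_document are hypothetical.

```python
import re


def tokenize_document(line):
    """Split one document (one RDD element) into a list of words.

    Assumption: lowercasing and this word pattern are illustrative choices,
    not something the issue specifies.
    """
    return re.findall(r"[a-z0-9']+", line.lower())


def words_per_document(text_rdd):
    """Map an RDD of document strings to an RDD of word lists.

    text_rdd is expected to be something like sc.textFile(testing_data);
    the result has one list of words per input element.
    """
    return text_rdd.map(tokenize_document)
```

With a live SparkContext, words_per_document(test_data_rdd).take(2) would return the word lists for the first two documents. If each document is a separate file rather than a line, sc.wholeTextFiles() may be a better fit than sc.textFile(), since it yields one (path, content) pair per file.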
We will have to sc.broadcast() it and read it inside functions through .value (in PySpark this is an attribute, not a method call).
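The broadcast step might look like the sketch below. It assumes the object being shared is a small local Python object (here a set of words), because sc.broadcast() takes a local value rather than an RDD, so an RDD would have to be collected first. The names count_known_words and run_example, and the sample data, are hypothetical.

```python
def count_known_words(words, vocabulary):
    """Count how many words of one document appear in the vocabulary."""
    return sum(1 for w in words if w in vocabulary)


def run_example():
    """Share a vocabulary with the workers through a broadcast variable.

    Requires a working PySpark installation; the import is kept local so
    the helper above can be used without Spark.
    """
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    docs = sc.parallelize([["w_1", "w_2"], ["w_2", "w_3", "w_3"]])

    # Broadcast a local object once instead of shipping it with every task.
    vocab_bc = sc.broadcast({"w_2", "w_3"})

    # Inside the mapped function, read the shared object via the value attribute.
    return docs.map(lambda ws: count_known_words(ws, vocab_bc.value)).collect()
```

Broadcasting makes sense when the shared object is read-only and small enough to fit in memory on every executor.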