dsp-uga / andromeda

This repository contains a Naive Bayes classifier for document classification, completed for CSCI 8360, Data Science Practicum, at the University of Georgia, Spring 2018.
MIT License

words_list(textfile_rdd) #23

Closed melanieihuei closed 6 years ago

melanieihuei commented 6 years ago

A function that takes a text-file RDD as input (e.g. test_data_rdd = sc.textFile(testing_data)) and returns an RDD containing the list of words in EACH input file:

([["w_1", "w_2", "w_3", ......, "w_d1"], ["w_1", "w_2", "w_3", ......, "w_d2"], ..., ["w_1", "w_2", "w_3", ......, "w_dk"]])
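A minimal sketch of what such a function might look like. It assumes that each record of the input RDD is one document and that words are separated by whitespace; neither is stated in the issue. The helper name `split_words` is hypothetical.

```python
def split_words(document):
    """Split one document (one text record) into its whitespace-separated words."""
    # Assumption: plain whitespace tokenization; no lowercasing or punctuation removal.
    return document.split()


def words_list(textfile_rdd):
    """Map each record of a text-file RDD to the list of words it contains."""
    # Only .map() is used, so the result is again an RDD (one word list per record).
    return textfile_rdd.map(split_words)
```

With a live SparkContext `sc`, calling `words_list(sc.textFile(testing_data))` would give an RDD with one word list per record, in the shape shown above.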

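We will have to sc.broadcast() the result and read it through .value inside functions. Note that an RDD itself cannot be broadcast, so it has to be collected to the driver first, and that .value is a property in PySpark, not a method.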
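The broadcast step could be sketched as follows. This is only an illustration under assumptions: the helper names `broadcast_words` and `count_known_words` are hypothetical, and the collected word lists are assumed to be small enough to fit in driver memory.

```python
def broadcast_words(sc, words_rdd):
    """Collect the per-document word lists and broadcast them to the executors."""
    # collect() brings the data to the driver; sc.broadcast() then ships it once per executor.
    return sc.broadcast(words_rdd.collect())


def count_known_words(doc_words, words_bc):
    """Count how many words of one document occur anywhere in the broadcast word lists."""
    # .value is read as an attribute (no parentheses).
    vocabulary = {w for doc in words_bc.value for w in doc}
    return sum(1 for w in doc_words if w in vocabulary)
```

Inside a transformation, the broadcast handle would be captured by the closure, e.g. `other_rdd.map(lambda doc: count_known_words(doc, words_bc))`.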