Open indajuan opened 7 years ago
Hi,
It should be an RDD[(Seq[Double], String)]. Use an RDD because otherwise the operation will run serially, and a Seq to avoid unnecessary complications with the Vector class. Remember that you are not allowed to use the predefined KNN in Spark; you have to implement it yourself. Take a look at the lecture examples: you can get it to work with very little adaptation.
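As a rough sketch of how each line could be turned into a (Seq[Double], String) pair: the helper name `parseLine` is hypothetical, and the layout (comma-separated numeric features followed by the label) is an assumption based on the iris row quoted in this thread, not something the assignment confirms.

```scala
// Hypothetical helper: parse one line such as "5.4,3.9,1.7,0.4,Iris-setosa".
// Assumes every field except the last is numeric and the last is the label.
def parseLine(line: String): (Seq[Double], String) = {
  val fields = line.split(",").map(_.trim)
  (fields.init.map(_.toDouble).toSeq, fields.last)
}

val sample: (Seq[Double], String) = parseLine("5.4,3.9,1.7,0.4,Iris-setosa")

// With a SparkContext named sc (assumed to exist), the same function
// would give the distributed version:
//   val data = sc.textFile("csv.txt").map(parseLine)  // RDD[(Seq[Double], String)]
```

Keeping the parsing in a plain function means it can be checked on a single string before it is mapped over the RDD.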
Hi, I managed to import the csv.txt file as an Array[(org.apache.spark.mllib.linalg.Vector, String)]. For example, evaluating iris(0)
gives: iris(0) = ([5.4,3.9,1.7,0.4],Iris-setosa)
How can I access the tuple to get the values inside the vector and the string, all from inside a map call?
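One way to do this, sketched under the assumption that `iris` has exactly the type given above: pattern-match on the tuple inside `map`, or use the positional accessors `_1` and `_2`. The names `features`, `label`, and `rows` are only illustrative, and the single sample row is copied from the output quoted in the question.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// One sample row in the same shape as iris(0) from the question.
val iris: Array[(Vector, String)] = Array(
  (Vectors.dense(5.4, 3.9, 1.7, 0.4), "Iris-setosa")
)

// Pattern-match on the tuple inside map to name both parts,
// converting the Vector to a Seq[Double] along the way.
val rows: Array[(Seq[Double], String)] = iris.map {
  case (features, label) => (features.toArray.toSeq, label)
}

// Positional accessors also work: _1 is the vector, _2 is the string.
val firstValue: Double = iris(0)._1(0) // element 0 of the vector
val firstLabel: String = iris(0)._2
```

The same `map { case (features, label) => ... }` form applies unchanged if `iris` is an RDD rather than an Array.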