SNICScienceCloud / LDSA-Spark

A collection of Apache Spark notebooks for the LDSA course
Apache License 2.0

Access vectors #6

Open indajuan opened 7 years ago

indajuan commented 7 years ago

Hi, I managed to import the csv.txt file as an Array[(org.apache.spark.mllib.linalg.Vector, String)]. For example, evaluating iris(0) gives:
iris(0) = ([5.4,3.9,1.7,0.4],Iris-setosa)

How can I access the tuple to get the values inside the vector and the string, all from inside a map?

mcapuccini commented 7 years ago

Hi,

It should be an RDD[(Seq[Double], String)]: an RDD because otherwise the operations run serially on the driver rather than in parallel, and a Seq to avoid unnecessary complications with the Vector class. Remember that you are not allowed to use the predefined KNN in Spark; you have to implement it yourself. Take a look at the lecture examples: you can get it to work with very little adaptation.
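
As a rough sketch of the suggestion above (not the lecture code), the snippet below builds an RDD[(Seq[Double], String)] and reads both tuple elements inside map. The object name IrisTuples, the helper parseLine, the comma-separated layout with the label in the last column, and the second sample row are all assumptions made for illustration. In a notebook a SparkContext named sc usually exists already, so the part that creates one would be dropped, and sc.parallelize would be replaced by sc.textFile on the real file. If you keep the mllib Vector instead of a Seq, features.toArray or features(i) gives you the values in the same way.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IrisTuples {
  // Hypothetical helper: turns a line such as "5.4,3.9,1.7,0.4,Iris-setosa"
  // into (features, label). Assumes the label is the last comma-separated field.
  def parseLine(line: String): (Seq[Double], String) = {
    val fields = line.split(",").map(_.trim)
    (fields.init.map(_.toDouble).toSeq, fields.last)
  }

  def main(args: Array[String]): Unit = {
    // Local context for a standalone run; skip this in a notebook that provides sc.
    val conf = new SparkConf().setAppName("iris-tuples").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Illustrative inline rows standing in for the real csv.txt file.
      val lines = sc.parallelize(Seq(
        "5.4,3.9,1.7,0.4,Iris-setosa",
        "6.3,3.3,6.0,2.5,Iris-virginica"
      ))
      val iris: RDD[(Seq[Double], String)] = lines.map(parseLine)

      // Pattern matching names both tuple elements inside map.
      val labelAndSum = iris.map { case (features, label) => (label, features.sum) }

      // Positional access works too: _1 is the Seq, _2 is the String.
      val firstFeatures = iris.map(t => t._1(0))

      labelAndSum.collect().foreach(println)
      firstFeatures.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The pattern-matching form tends to read better once the function body uses both elements, which is what a distance computation between two rows would need.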