The goal of the project was to predict which songs a person will like, based on songs they have previously liked or disliked, using features of the songs themselves. The dataset was created by pulling song information from the Spotify REST API and manually classifying the songs into "Like" and "Not Like" categories.
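For concreteness, here is roughly what I imagine the data pull looked like: a minimal sketch against the Spotify Web API's audio-features endpoint. The token and track IDs below are placeholders, and the exact endpoint and fields your group used may well differ from this.

```python
import requests

# Placeholder credentials and IDs -- substitute a real OAuth token
# and the track IDs you actually collected.
ACCESS_TOKEN = "YOUR_SPOTIFY_OAUTH_TOKEN"
track_ids = ["TRACK_ID_1", "TRACK_ID_2"]

resp = requests.get(
    "https://api.spotify.com/v1/audio-features",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"ids": ",".join(track_ids)},  # the endpoint accepts up to 100 IDs
)
resp.raise_for_status()

for features in resp.json()["audio_features"]:
    # Each entry carries numeric descriptors (danceability, energy,
    # valence, tempo, ...) that can serve directly as model features.
    print(features["id"], features["danceability"], features["energy"])
```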
I really liked that you created your own dataset by pulling from a REST API rather than just using a premade dataset, and that you detailed how you did it in section 1 in case someone wanted to follow your procedure. This was creative and thorough. Second, nice job trying a lot of different types of models and explaining the ones we had not fully covered in class. Third, I liked that you created your own error metric to reflect the error you really cared about. You also did a good job justifying it, which was essential since you were using something we had not talked about in class.
A couple of small concerns. The first is about the size of your dataset. In general, only having a few hundred samples seems small to me. However, even your smallest dataset, with only 300 positive and 300 negative examples, is still larger than I think most real datasets would be (maybe I am unusual, but I have definitely not liked and disliked a collective 600 songs on my Spotify). On the other hand, that smallest dataset had the lowest error, so maybe it is okay.

Additionally, I am wondering how you separated your songs into the like and dislike playlists: was the assignment random, or based on the preferences of one person in your group? If it was random, that could hurt how well your models learn from the data, since randomly assigned labels would leave little or no consistent pattern to find.

Finally, I would have liked to see you unpack your final error measurement a little more in the results and conclusion section. Since you used a composite error metric, elaborating on the 70% to tell the reader how the number of true positives compared to the number of true negatives would have been appreciated, especially since you discussed this trade-off throughout your report. (See the sketch below for the kind of breakdown I mean.)
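To illustrate that last point, here is the kind of breakdown I have in mind, sketched with made-up confusion-matrix counts. I do not know your actual numbers or the exact weighting in your composite metric, so treat this purely as a template, not a reconstruction of your result.

```python
# Hypothetical confusion-matrix counts -- invented purely to show
# the breakdown; your real numbers will differ.
tp, fn = 210, 90   # liked songs: correctly / incorrectly classified
tn, fp = 205, 95   # disliked songs: correctly / incorrectly classified

tpr = tp / (tp + fn)  # true positive rate (recall on "Like")
tnr = tn / (tn + fp)  # true negative rate (recall on "Not Like")

# One plausible composite score is balanced accuracy, the mean of the
# two rates; your report's metric may weight them differently.
balanced_accuracy = (tpr + tnr) / 2

print(f"TPR = {tpr:.2f}, TNR = {tnr:.2f}, balanced = {balanced_accuracy:.2f}")
```

Reporting the two rates alongside the headline number would let the reader see whether your 70% came from doing equally well on both classes or from trading one off against the other.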
By the way, your analysis of Figure 6 is not entirely correct. In particular, the part where you compare Rock and Mellow songs appears to be wrong.
Overall, though, great job. The project showed a lot of initiative and creativity.