edublancas / song-lyrics

Exploratory Analysis of 200K+ song lyrics from the 1 million songs dataset
https://blancas.io/song-lyrics/
MIT License

Functions for bag of words representation #2

Closed: edublancas closed this 6 years ago

edublancas commented 6 years ago

A first step in exploring the data is to create a bag-of-words representation for each song. The data is already in {word: count} format, but we need to convert it to numpy arrays. We should possibly also include an option to limit the number of words, since it will be hard to deal with all 5k words in the dataset.
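A rough sketch of what such a function could look like, assuming each song is a plain dict of {word: count}. The name `bag_of_words`, the `max_words` parameter, and the choice to keep the words with the highest total count across all songs are assumptions for illustration, not something already in the repo:

```python
from collections import Counter

import numpy as np


def bag_of_words(songs, max_words=None):
    """Convert a list of {word: count} dicts to a (n_songs, n_words) array.

    Hypothetical helper: if max_words is given, only the max_words words
    with the highest total count across all songs are kept.
    """
    # Total count of every word over the whole collection
    totals = Counter()
    for counts in songs:
        totals.update(counts)

    # Rank by descending total count; ties are broken alphabetically
    # so the column order is deterministic
    ranked = sorted(totals, key=lambda word: (-totals[word], word))
    vocabulary = ranked[:max_words]  # max_words=None keeps every word

    # Map each kept word to its column index
    index = {word: j for j, word in enumerate(vocabulary)}

    matrix = np.zeros((len(songs), len(vocabulary)), dtype=np.int64)
    for i, counts in enumerate(songs):
        for word, count in counts.items():
            j = index.get(word)
            if j is not None:  # skip words dropped by max_words
                matrix[i, j] = count

    return matrix, vocabulary


songs = [{"love": 3, "you": 2}, {"love": 1, "baby": 4}]
matrix, vocabulary = bag_of_words(songs, max_words=2)
```

Returning the vocabulary alongside the matrix keeps the column-to-word mapping available for later analysis. A dense array is probably fine for a few thousand words, though a sparse matrix might be worth considering if memory becomes an issue.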