Closed edublancas closed 6 years ago
Was able to run the scripts successfully. Thanks, @edublancas!
I don't see artist location anywhere in the updated dataset. Am I missing something?
I probably missed something in the scripts. Will fix them now.
Just fixed the error, thanks for letting me know. In order to get the location please re-run export_track_metadata (I also updated the bootstrap script) and then the join scripts.
Is this complete? If so, I'll run my (hopefully) final topic model on the data
I need to make some changes, working on it now
@aaronsadholz I pushed the updated code: cleaning artist_name, artist_id and adding language.
Since computing language takes a while, I uploaded the output to Google Drive (the one that José shared), so you only need to run the script starting on line 51.
I updated the repo so we can all work on the same dataset. These are the steps to follow:
- get_data (the script was updated)
- ./bootstrap

bootstrap contains the code that was previously in the README file. The final output is four files:
1. bag_of_words.feather: this is our main dataset, all words (stopwords removed) + metadata
2. bag_of_words_top_1000.feather: same as 1, but only the top 1k most popular words
3. bag_of_words_top_1000_normalized.feather: same as 1, but only the top 1k most popular words, and normalized
4. embeddings.feather: embeddings (dense vectors) for every song (50 dimensions)

We are all going to be using mostly 1. Files 2, 3 and 4 are for seeing whether those smaller representations help with topic modeling, clustering and measuring similarity, so probably just @aaronsadholz and I need those. But in any case, @jose-alvarado-guzman and @valmikkpatel: feel free to explore those datasets as well.
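In case it helps to see how the smaller representations relate to the main one, here is a rough sketch of the idea on toy data. It is not the code in bootstrap: the track ids, words and function names are made up, and it assumes "most popular" means highest total count across songs and "normalized" means each song's counts are divided by that song's total. The actual scripts may define both differently.

```python
from collections import Counter


def top_k_words(songs, k):
    """Return the k words with the highest total count across all songs.

    Assumption: "popular" is taken to mean highest total count.
    """
    totals = Counter()
    for counts in songs.values():
        totals.update(counts)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def restrict_and_normalize(songs, vocabulary):
    """Keep only the words in `vocabulary` and scale each song to sum to 1.

    Assumption: "normalized" is taken to mean per-song relative frequency.
    Songs with none of the vocabulary words get all zeros.
    """
    result = {}
    for track_id, counts in songs.items():
        kept = {word: counts.get(word, 0) for word in vocabulary}
        total = sum(kept.values())
        result[track_id] = {
            word: (count / total if total else 0.0)
            for word, count in kept.items()
        }
    return result


# Hypothetical toy data: track id -> word counts (stopwords already removed).
songs = {
    "track_a": {"love": 4, "night": 2, "road": 1},
    "track_b": {"love": 1, "rain": 3},
    "track_c": {"night": 2, "rain": 1},
}

vocabulary = top_k_words(songs, k=2)
normalized = restrict_and_normalize(songs, vocabulary)
```

With k=2 the toy vocabulary is much smaller than the full word list, which is the same trade-off files 2 and 3 make with k=1000.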
Let me know if you have any trouble running the scripts. Hopefully we can all get this done before our next meeting on Saturday.