Application to maintain a Spotify database, generate listening reports and build playlists. Harvard CS50 final project.
My final CS50 project is an application suite developed in Phyton that assists the Spotify power user. It gives users access to the extensive Spotify for developers' API, taking the Spotify experience to a whole new level.
It uses the following Python libraries:
Further analysis of data through visualizations in
The application relational database has the following schema:
CREATE TABLE tracks (id VARCHAR(62), song TEXT, album TEXT, release VARCHAR(32), popularity INTEGER,
main_artist_id TEXT, main_genre TEXT, acousticness REAL, danceability REAL, energy REAL,
instrumentalness REAL, liveness REAL, loudness REAL, speechiness REAL, valence REAL,
tempo REAL, duration INTEGER, PRIMARY KEY(id))
CREATE TABLE history(id VARCHAR(62), timestamp VARCHAR(24) PRIMARY KEY)
CREATE TABLE saved_tracks(id VARCHAR(62) PRIMARY KEY)
CREATE TABLE tracks_artists(id VARCHAR(62), artist_id VARCHAR(62), PRIMARY KEY(id,artist_id))
CREATE TABLE artists(artist_id VARCHAR(62) PRIMARY KEY, artist TEXT, image_url TEXT, popularity INT)
CREATE TABLE genres(artist_id VARCHAR(62), genre TEXT, PRIMARY KEY(artist_id, genre))
Populating the database for the first time requires the following steps:
The user should request her/his Spotify user data (On Spotify account under privacy settings). Spotify will email a zip file containing -among other files- 12 months of listening history in JSON format.
Running saved_songs_upload.py. The python program accesses the users liked tracks in the Spotify API and populates the tables as follows:
Running search_song_info.py Creating listening history from JSON data provided by Spotify via email as requested in step 1.
{
"endTime" : "2020-05-30 18:40",
"artistName" : "Bruce Springsteen",
"trackName" : "Thunder Road",
"msPlayed" : 288720
},
Unfortunately the JSON file does not contain the track id, so the program needs to find it. First the program checks if a track with the same name and artists exists in the saved tracks table. If it is already in saved tracks, then it uses the existing track id. Else, conducts same query on the Spotify API and then searches the API for the rest of the information necessary to populate the tables. The syntax in the spotipy library to find a track ID is:
q = ("artist:%s track:%s" % (track_artist, track_name))
search_results = sp.search(q, limit=50, market='US')
The query returns a JSON object with information on tracks that match the query. The program collects the track id of the first item with an identical match to the query.
The app contains two programs to maintain the database.
Program: statistics_gui.py. The app provides a Graphic User Interface (GUI) developed in PYsimpleGUI to report on the user listening activity. The user selects to conduct the query on either a specific year or the entire listening history. The program conducts the following queries:
Ten most listened songs in the requested period. Conducts the following query:
f""" SELECT strftime ('%Y', timestamp) as year, song, artist, history.id,
count(history.id) as n FROM history
INNER JOIN tracks on tracks.id = history.id
INNER JOIN artists ON tracks.main_artist_id = artists.artist_id
where year = {year} group by history.id order by n desc limit 10 """
The results are displayed in the GUI first column.
Most listened genres. Aggregates spotify defined genres conducting queries on the total duration of tracks listened. The following example for rock total time spent listening to rock tracks.
command = f"""SELECT strftime ('%Y', timestamp) as year, sum(duration)
FROM history INNER JOIN tracks on tracks.id = history.id
WHERE (main_genre LIKE '%rock%' OR main_genre = 'beatlesque' OR main_genre = 'alternative dance')
AND (year = {year})""
The results for each genre aggregate are divided by the total listening time of the period and a pie chart is generated using the matplotlib library. The graph is saved in png format and then displayed in the second column of the GUI.
command = f"""SELECT strftime ('%Y', timestamp) as year, artist, image_url, sum(duration) FROM history
INNER JOIN tracks on tracks.id = history.id
INNER JOIN tracks_artists on tracks.id = tracks_artists.id
INNER JOIN artists on artists.artist_id = tracks_artists.artist_id
WHERE year = {year}
GROUP BY artist
ORDER BY sum(duration)
DESC limit 5 """
The program fetches the url jpg file for each of the top artists, converts them to PNG and displays them in the bottom frame of the GUI.
Program: playlist_gui.py The cornerstone of the application suite is the ability to create dynamic playlists. Several playlist creators are available online, however I have not been able to find any that gives the user the ability to dynamically create playlists that take into consideration the play history. This playlist generator gives the user the ability to limit song repetition if she/he so desires.
The playlist generator is presented in a GUI developed in Tkinter. The user inputs the following fields:
Subsequently is presented with doubled entry sliders to choose the range of the following attributes:
Acknowledgement: The multiple slider widget is not Tkinter native. Developed by MengxunLi. https://github.com/MenxLi/tkSliderWidget
Based on the user inputs the program conducts the following query:
SELECT DISTINCT saved_tracks.id FROM saved_tracks
inner join tracks on saved_tracks.id = tracks.id
inner join tracks_artists on saved_tracks.id = tracks_artists.id
inner join genres on tracks_artists.artist_id = genres.artist_id
WHERE genre like ?
AND danceability BETWEEN ? AND ?
AND valence BETWEEN ? AND ?
AND energy BETWEEN ? AND ?
AND tempo BETWEEN ? AND ?
AND acousticness BETWEEN ? AND ?
AND instrumentalness BETWEEN ? AND ?
AND saved_tracks.id NOT IN
(SELECT DISTINCT id FROM history WHERE (SELECT JULIANDAY('now') - JULIANDAY(timestamp) < ?)) LIMIT ?""",
(play_string, dance_lower, dance_upper, valence_lower, valence_upper, energy_lower, energy_upper,
tempo_lower, tempo_upper, acousticness_lower, acousticness_upper,
instrumentalness_lower, instrumentalness_upper, days, limit), )
The results from the query are uploaded through the API to a Spotify playlist in the user's account.
A special word of thanks to the Harvard CS50 staff to put together such an amazing course. Cheers.