DroptuneHQ / droptune-og

New music notifications for Spotify & Apple Music. Follow your favorite artists so you never miss a beat.
https://droptune.co
MIT License
93 stars, 16 forks

Reduce Spotify API calls #53

Open Shpigford opened 6 years ago

Shpigford commented 6 years ago

A major bottleneck in our data processing is Spotify.

Checking for new music is very resource intensive as the only way (that I'm aware of) to do it is to loop through every single artist, then loop through every one of that artist's albums to see if any are new.

This means a single artist can generate dozens if not hundreds of individual jobs and calls to the Spotify API.

Even if we can't figure out a workaround with the Spotify API itself, maybe there's a clever way to decide when an artist actually needs to be updated.

e.g. An artist that hasn't released anything in 30 years has a relatively low chance of releasing something now...yet we still check them every. single. day.

Shpigford commented 6 years ago

Currently what I've got in mind...

This would be part of the BuildArtistJob that gets run each day.

https://github.com/Shpigford/droptune/blob/master/app/jobs/build_artist_job.rb

# Figure out the last date the artist released an album (nil if no albums are stored yet)
last_release_date = artist.albums.order(release_date: :desc).first&.release_date

# The older the last release, the longer the interval (in days) between Spotify pings.
# `date < 30.years.ago` means "earlier than 30 years ago", i.e. older.
days = case
  when last_release_date.nil?
    1 # no known releases yet, so check daily
  when last_release_date < 30.years.ago
    30
  when last_release_date < 20.years.ago
    20
  when last_release_date < 10.years.ago
    10
  when last_release_date < 5.years.ago
    5
  else
    1
end

# Ping Spotify only if `spotify_last_updated_at` is blank or older than the interval set above
if artist.spotify_last_updated_at.blank? || artist.spotify_last_updated_at < days.days.ago
  BuildArtistSpotifyJob.perform_async(artist_id)
end

danielcompton commented 6 years ago

Are you using Conditional Requests? That would cut down on the work each request needs to do, and presumably would give faster responses for the requests you do need to make.
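For reference, a conditional request re-sends the `ETag` from an earlier response in an `If-None-Match` header, and the server can answer `304 Not Modified` with no body when nothing has changed. A minimal sketch with Ruby's standard `Net::HTTP`: the helper names, the token, and wherever the ETag gets stored are assumptions rather than Droptune's code, and it only helps if the endpoint actually returns an `ETag`.

```ruby
require "net/http"
require "uri"

# Build a GET request that carries the ETag saved from a previous response, if any.
# `token` and `etag` are hypothetical values supplied by the caller.
def conditional_get(url, token:, etag: nil)
  request = Net::HTTP::Get.new(URI(url))
  request["Authorization"] = "Bearer #{token}"
  request["If-None-Match"] = etag if etag
  request
end

# True when the server reports that the cached copy is still current.
def not_modified?(response)
  response.code.to_i == 304
end
```

On a 304 the job could skip re-processing the artist's albums entirely and just record the time of the check.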

Also, going by https://developer.spotify.com/documentation/web-api/reference/artists/get-artists-albums/ it looks like you can do one (or maybe several paginated) requests per artist. If the sort order were stable, you could request only the offset where you expect new albums to show up.

The header for that API says

> Get Spotify catalog information about an artist’s albums. Optional parameters can be specified in the query string to filter and sort the response

But I didn’t see any sorting parameters (but I’m on mobile so might have missed something).

danielcompton commented 6 years ago

> to loop through every single artist, then loop through every one of that artist's albums to see if any are new.

Are you saving which albums you have seen after each API call, or do you check if each album was released after the last time you updated? Saving seen albums in a database would save a lot of detail lookups for each album and would reduce down to a handful of queries per artist.
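A rough sketch of the "save what you've seen" idea: compare the album IDs from one listing call against the IDs already stored, and fetch details only for the difference. `unseen_album_ids` and the sample IDs are hypothetical, not part of Droptune.

```ruby
require "set"

# Return the fetched album IDs that are not in the stored set, keeping API order.
def unseen_album_ids(fetched_ids, seen_ids)
  seen = seen_ids.to_set
  fetched_ids.reject { |id| seen.include?(id) }
end

fetched = %w[alb1 alb2 alb3] # IDs from the artist's album listing
stored  = %w[alb1 alb3]      # IDs already saved in the database
unseen_album_ids(fetched, stored) # => ["alb2"]
```

Only the IDs returned here would need a per-album detail lookup.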

I’m not sure what scale of querying you’re doing but adding some jitter to the next check time would prevent thundering herds of rechecks every 24 hours (if you’re not doing that already).
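Jitter here means shifting each artist's next check by a small random amount so the rechecks spread out instead of all landing at the same moment. A minimal sketch, assuming a 10% spread either side of the interval; `next_check_at` and its parameters are made up for illustration.

```ruby
SECONDS_PER_DAY = 86_400

# Next check time: the base interval plus or minus a random fraction of it.
def next_check_at(now, interval_days, jitter_fraction: 0.1, rng: Random.new)
  base = interval_days * SECONDS_PER_DAY
  spread = base * jitter_fraction
  now + base + rng.rand(-spread..spread)
end

# With a 10-day interval this lands somewhere between 9 and 11 days from now.
next_check_at(Time.now, 10)
```

Passing a seeded `Random` as `rng` makes the result repeatable in tests.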

pnomolos commented 6 years ago

What about using https://developer.spotify.com/documentation/web-api/reference/browse/get-list-new-releases/ (perhaps iterating over each of the available markets to make sure you get them all)?

Never mind: I see in Twitter comments that it’s manually curated.

Double-edit: you could start here as a way of not having to check a bunch of artists who have releases on this list.

P.P.P.S. Your less-than signs should be greater-than signs ;)

Shpigford commented 6 years ago

> P.P.P.S. Your less-than signs should be greater-than signs ;)

@pnomolos < & > signs mixed with times and "days ago" gets freaking insane and none of it makes sense. At the moment what's above seems to work. ¯\_(ツ)_/¯
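One way to keep the direction straight: an earlier date is a smaller value, so `date < 30.years.ago` reads as "earlier than 30 years ago", i.e. older. A plain-Ruby sketch without Rails; `older_than_years?` is a made-up helper, and `Date#<<` shifts a date back by a number of months.

```ruby
require "date"

# True when `date` falls before the point `years` years ahead of `today`.
def older_than_years?(date, years, today: Date.today)
  cutoff = today << (12 * years)
  date < cutoff
end

today = Date.new(2018, 6, 1)
older_than_years?(Date.new(1985, 1, 1), 30, today: today) # => true
older_than_years?(Date.new(2017, 1, 1), 5, today: today)  # => false
```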

Shpigford commented 6 years ago

> Are you using Conditional Requests? That would cut down on the work each request needs to do, and presumably would give faster responses for the requests you do need to make.

@danielcompton Oooo, I hadn't seen Conditional Requests! Looking into them now.

> Are you saving which albums you have seen after each API call, or do you check if each album was released after the last time you updated?

We permanently save all of the data we get from Spotify, but the problem is that Spotify has very few mechanisms for filtering the API calls...it's sort of a "get it all or get nothing" type of thing.

I don't currently paginate Spotify results as it is, so there's not much to do in regard to reducing those types of calls. 😕

pnomolos commented 6 years ago

@shpigford

> @pnomolos < & > signs mixed with times and "days ago" gets freaking insane and none of it makes sense. At the moment what's above seems to work. ¯\_(ツ)_/¯

Ah yes, you’re right. I was working relative to today in my head, instead of “closest to zero”. Thoughts on using new release list to remove artists from the list that needs to be checked for new releases?

Shpigford commented 6 years ago

@pnomolos

> Thoughts on using new release list to remove artists from the list that needs to be checked for new releases?

Spotify's New Release list isn't thorough enough for it to make a dent. At best it'd remove a few dozen, maybe 100, artists out of hundreds of thousands.

pnomolos commented 6 years ago

@Shpigford It's outside of the current ecosystem you're using, but perhaps https://www.allmusic.com/newreleases/all can help? Looks like there are 500-600 artists there. I'd imagine you could step back, say, 6 months and assume anyone who's released an album in that time frame doesn't need to be checked for a new one. That should cull at least several thousand artists once you have the initial check on them.

Shpigford commented 6 years ago

@pnomolos The problem in that scenario is singles. There are artists who put out singles every few weeks (especially leading up to a full release). 😕