Harrison97 / spotipy-plamere

A light weight Python library for the Spotify Web API
http://spotipy.readthedocs.org/
MIT License
23 stars · 3 forks

The Spotify object and memory usage #6

Open SHxKM opened 4 years ago

SHxKM commented 4 years ago

Given that this is now the maintained version, I’m posting this question here. Thanks @Harrison97 for taking the initiative.

I’m fairly certain that I am misusing this library in some way. The reason I think so is that when I process especially large Spotify libraries, my memory usage spikes dramatically. Note that I process Apple Music libraries of the same size and don’t see the same issue.

So, basically, I authenticate the user with a refresh token so I can run background refresh tasks on their behalf. After successfully refreshing the tokens, I use spotipy to get the Spotify client object. I then pass this object through multiple functions that wrap spotipy functionality: scanning artists, libraries, pagination, etc.

Is this the correct way to do things? Again, I’m doing essentially the same thing with (my own) Apple Music parser, and the difference in memory usage is dramatic. Am I missing something?

deeplusplus commented 4 years ago

In general it makes sense to create a single Spotipy client object and pass it around for use by various functions/collaborators. But it's hard to say without knowing exactly what your current implementation is, what behavior you are observing, and what your desired/expected behavior is.
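To illustrate the pattern (a minimal sketch; `scan_library` and the token are illustrative placeholders, not part of spotipy):

```python
# Sketch of the single-client pattern: create one Spotify client per
# user/task and pass that same instance to every function that needs it.
def scan_library(sp):
    # Both calls reuse the same client (and its underlying session).
    followed = sp.current_user_followed_artists(limit=50)
    saved = sp.current_user_saved_tracks(limit=50)
    return followed, saved

# sp = spotipy.Spotify(auth=access_token)   # built once per user/task
# followed, saved = scan_library(sp)        # same instance reused
```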

SHxKM commented 4 years ago

@deeplusplus Thanks for your help. I'll try to be more specific.

what your current implementation is

  1. Authorize the user

def authorize_user(user):
    ...
    spotify = spotipy.Spotify(auth=new_user_token)
    return spotify

  2. Then, using the spotify object we got, get all artists in their library (including artists appearing on tracks), for example:

def get_spotify_followed(spotify_object):
    d = spotify_object.current_user_followed_artists(limit=50)
    all_res = paginate_spotify(spotify_object, d, chosen_key="artists")
    final_list = []
    for item in all_res["items"]:
        final_list.append({"name": item["name"], ....})
    return final_list
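(For context, paginate_spotify is a helper along these lines; a rough sketch, not the exact code, assuming spotipy's next() method:)

```python
def paginate_spotify(sp, first_response, chosen_key=None):
    # chosen_key unwraps nested paging objects such as response["artists"].
    page = first_response[chosen_key] if chosen_key else first_response
    all_items = list(page["items"])
    while page.get("next"):
        page = sp.next(page)              # fetches the URL in page["next"]
        if chosen_key:
            page = page[chosen_key]
        all_items.extend(page["items"])
    return {"items": all_items}           # every page's items kept in memory
```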

what behavior you are observing

For libraries with 1,000 artists and above, I'm seeing a steep increase in memory for my Celery process: close to a 100 MB increase per single library scan. I'm aware that we are traversing 20 (1000/50) JSON responses and possibly over 1,000 dicts, but my own implementation against the Apple Music API chews through libraries as large as 9,000 artists without such a huge memory footprint.

and what your desired/expected behavior is.

With the current memory usage pattern, I won't be able to scale efficiently, which is why I posted here, hoping for advice.

deeplusplus commented 4 years ago

Hmmm... It's still hard to say specifically what's going wrong. From a high level, I wonder about two things.
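One thing worth trying: accumulating every page's items into a single list keeps all ~20 JSON responses' worth of data alive at once. A generator that yields items page by page, keeping only the fields you need, bounds memory to roughly one page at a time. A sketch (hypothetical helper name; assumes spotipy's current_user_followed_artists and next methods):

```python
def iter_followed_artists(sp, limit=50):
    # Stream artists one page at a time; only the current page's JSON
    # needs to stay alive while iterating.
    page = sp.current_user_followed_artists(limit=limit)["artists"]
    while page is not None:
        for item in page["items"]:
            # Keep only the fields you need before moving on.
            yield {"name": item["name"], "id": item["id"]}
        nxt = sp.next(page)               # None when there are no more pages
        page = nxt["artists"] if nxt else None
```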

stephanebruckert commented 4 years ago

Possible fix https://github.com/plamere/spotipy/pull/269