NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License
6.28k stars 1k forks source link

Possible memory leakage in SVDpp #463

Open amohar2 opened 1 year ago

amohar2 commented 1 year ago

Description

In my experiments, I have observed that multiple calls to SVDpp's fit() function would steadily increase the memory usage of the Python process that is running it. To investigate this, I looked at matrix_factorization.pyx module for SVDpp's fit() call and seems that there is a possible memory leakage, where several "Iu" arrays are allocated using malloc for all users, however, at the end of fit(), the free() function is only called once. Reference: https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L453-L458

https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L508-L514

This seems to be the case only when cache_ratings is False.

Steps/Code to Reproduce

This is part of a larger experiment that I am running so currently I do not have a simple reproducible example.

Versions

1.1.3

NicolasHug commented 1 year ago

Thanks for the report @amohar2 ,

I think you're right, looks like there are lots of unnecessary allocations.

            for u in range(trainset.n_users):
                # Might as well allocate the max size once and for all
                # instead of allocating the exact size each time
                max_Iu_length = max(max_Iu_length, len(trainset.ur[u]))
                Iu = <int *>malloc(max_Iu_length * sizeof(int))

should probably be instead

            for u in range(trainset.n_users):
                # Might as well allocate the max size once and for all
                # instead of allocating the exact size each time
                max_Iu_length = max(max_Iu_length, len(trainset.ur[u]))
            Iu = <int *>malloc(max_Iu_length * sizeof(int))