Open · amohar2 opened this issue 1 year ago
Thanks for the report @amohar2,
I think you're right; it looks like there are lots of unnecessary allocations.
The per-user malloc of Iu should probably be replaced with something like:
for u in range(trainset.n_users):
    # Might as well allocate the max size once and for all
    # instead of allocating the exact size each time
    max_Iu_length = max(max_Iu_length, len(trainset.ur[u]))
Iu = <int *>malloc(max_Iu_length * sizeof(int))
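If I'm reading the surrounding code right, the Iu buffer would then be sized once to the largest user profile and reused for every user, so the single free(Iu) at the end of fit() would again match a single malloc.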
Description
In my experiments, I have observed that multiple calls to SVDpp's fit() function steadily increase the memory usage of the Python process running it. To investigate, I looked at the matrix_factorization.pyx module for SVDpp's fit() and it seems there is a possible memory leak: several "Iu" arrays are allocated with malloc, one per user, but at the end of fit() the free() function is only called once. Reference: https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L453-L458
https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L508-L514
This seems to be the case only when cache_ratings is False.
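To illustrate the pattern I mean with a toy, Surprise-independent sketch (it uses ctypes against a Unix-like libc, and the sizes are made up): allocating a fresh buffer on every iteration while calling free() only once leaks all but the last buffer.

# Toy illustration of the reported pattern, not Surprise code:
# a fresh C buffer is malloc'd per user, but free() runs only once,
# so every allocation except the last one is leaked.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))  # assumes a Unix-like libc
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

n_users, items_per_user = 10_000, 100  # made-up sizes for the sketch
Iu = None
for u in range(n_users):
    # Leaky version: the buffer from the previous iteration is never freed.
    Iu = libc.malloc(items_per_user * ctypes.sizeof(ctypes.c_int))
libc.free(Iu)  # only the final buffer is released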
Steps/Code to Reproduce
This is part of a larger experiment that I am running, so I do not currently have a simple reproducible example.
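That said, something along these lines should expose the growth. This is only a sketch: it assumes the built-in ml-100k dataset (downloaded on first use) and uses the Unix-only resource module, with peak RSS as a rough proxy for leaked memory.

# Rough reproduction sketch (assumptions: built-in ml-100k dataset,
# Unix-only resource module; peak RSS is only a coarse proxy for a leak).
import resource

from surprise import Dataset, SVDpp

trainset = Dataset.load_builtin("ml-100k").build_full_trainset()
algo = SVDpp(n_factors=5, n_epochs=5, cache_ratings=False)

for call in range(10):
    algo.fit(trainset)  # repeated fits should not keep growing the process
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is kilobytes on Linux, bytes on macOS
    print(f"fit #{call + 1}: max RSS {rss}")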
Versions
1.1.3