NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License
6.37k stars 1.01k forks

build_anti_testset() takes a long time and in the end it does not work #451

Open AbdElrahmanMostafaRifaat1432 opened 1 year ago

AbdElrahmanMostafaRifaat1432 commented 1 year ago

1. reader = Reader(rating_scale=(1, 5))
2. data = Dataset.load_from_df(ratings[['userId', 'asin', 'rating']], reader)  # this is my own dataset
3. svd = SVD(n_factors=30, n_epochs=20, lr_all=0.005, reg_all=0.02)
4. real_trainset = data.build_full_trainset()
5. svd.fit(real_trainset)
6. real_testset = real_trainset.build_anti_testset()  # the code stops here after a long time and finally raises a memory error
7. predictions = svd.test(real_testset)
8. top_n = get_top_n(predictions, n=20)

When I run the program, it stops at line 6, at the call to build_anti_testset(), and after a long time it raises a memory error.

However, when I replace build_anti_testset() with build_testset(), it works without any problem.

But I need to use build_anti_testset() instead of build_testset(), because I need predictions for the items that each user has not rated yet.

AbdElrahmanMostafaRifaat1432 commented 1 year ago

This is my input data (see the attached screenshot "Capture"). It may help in case my input is not in a standard format and the function cannot understand it. If that is the case, please tell me the solution.

mohammadaminvali commented 1 year ago

Dear @bodymostafa123

Those two functions use very different amounts of memory.

build_testset() converts the trainset back into a raw list of ratings. If your trainset has x ratings, the resulting test set also has x entries.

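build_anti_testset() uses much more memory. With n users and m items, it returns one entry for every (user, item) pair that is not in the trainset, i.e. (n * m) - x entries. For a large, sparse dataset that is close to n * m, which can easily exhaust memory. HTH.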
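
To make the arithmetic above concrete, and as one possible workaround, here is a minimal sketch that scores unrated items one user at a time instead of materializing the whole anti-testset. It is not taken from the Surprise docs: the helper names (`anti_testset_size`, `top_n_for_user`, `iter_top_n`) and the example counts are made up for illustration. It only assumes the usual `Trainset` attributes (`ur`, `all_users()`, `all_items()`, `to_raw_uid()`, `to_raw_iid()`) and `algo.predict(...).est`.

```python
import heapq


def anti_testset_size(n_users, n_items, n_ratings):
    """Number of (user, item) pairs build_anti_testset() would have to hold."""
    return n_users * n_items - n_ratings


def top_n_for_user(algo, trainset, inner_uid, n=20):
    """Top-n unrated items for a single user (hypothetical helper).

    Only this user's candidate items are scored, and only n of them
    are kept, so the full anti-testset never exists in memory.
    """
    raw_uid = trainset.to_raw_uid(inner_uid)
    # trainset.ur maps an inner user id to a list of (inner item id, rating).
    rated = {inner_iid for inner_iid, _ in trainset.ur[inner_uid]}
    scored = (
        (trainset.to_raw_iid(inner_iid),
         algo.predict(raw_uid, trainset.to_raw_iid(inner_iid)).est)
        for inner_iid in trainset.all_items()
        if inner_iid not in rated
    )
    # Keep the n highest estimated ratings, best first.
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


def iter_top_n(algo, trainset, n=20):
    """Yield (raw user id, top-n list) one user at a time."""
    for inner_uid in trainset.all_users():
        yield trainset.to_raw_uid(inner_uid), top_n_for_user(algo, trainset, inner_uid, n)


# Hypothetical counts, only to show the scale of the problem:
print(anti_testset_size(n_users=100_000, n_items=50_000, n_ratings=1_000_000))
# -> 4999000000
```

With the objects from the original post, this would be used roughly as `for uid, recs in iter_top_n(svd, real_trainset, n=20): ...`, writing each user's list to disk or a database as it is produced. It is still n * m predictions in total, so it will be slow on a large dataset, but memory use stays proportional to a single user's candidates.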