Hi, in my code I assume both user and item indices start from 0, and it looks like your ids start from 1. Shifting the ids by 1 should work, like this:
ratings[:, (0, 1)] -= 1
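To make the shift concrete, here is a minimal sketch with a made-up ratings array (the values are hypothetical; only the column layout — user id, item id, rating — matches the thread):

```python
import numpy as np

# Hypothetical ratings whose user/item ids start at 1 instead of 0.
ratings = np.array([
    [1, 1, 5.0],
    [1, 2, 3.0],
    [2, 1, 4.0],
])

# Shift the first two columns (user id, item id) down by 1 so they start at 0.
ratings[:, (0, 1)] -= 1

print(ratings[:, (0, 1)].min())  # ids now start at 0
```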
Hey,
Thanks a lot, I really appreciate the help.
I have one more question. The result of the factorization is a vector the same size as the initial ratings (my initial sparse matrix was 160x3, where the three columns are user, item, and rating). bpmf.predict(ratings) returns a 160x1 vector, i.e. the approximated values for the ratings I already have. How can I see what happens for the rest of the values? There are 31 users and 9 items in total, so there are 279 possible ratings. How can I see the approximated values for all possible ratings?
Kind regards, and thanks again,
Christos
Hi,
You can list all the pairs you want to predict. For example, to predict user_id = 10 against all items, you can do:
>>> user_id = 10
>>> n_item = 9
>>> ratings = np.stack((np.repeat(user_id, n_item), np.arange(n_item)), axis=1)
>>> ratings
array([[10, 0],
       [10, 1],
       [10, 2],
       [10, 3],
       [10, 4],
       [10, 5],
       [10, 6],
       [10, 7],
       [10, 8]])
>>> bpmf.predict(ratings)
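The same idea extends to every user/item pair at once. A sketch, using the 31 users and 9 items from the question (`bpmf` is assumed to be the already-fitted model, so the predict call is left commented):

```python
import numpy as np

n_user = 31
n_item = 9

# Build all (user_id, item_id) pairs: each user id repeated once per item,
# the item ids tiled once per user -> shape (n_user * n_item, 2).
all_pairs = np.stack(
    (np.repeat(np.arange(n_user), n_item),
     np.tile(np.arange(n_item), n_user)),
    axis=1,
)

print(all_pairs.shape)  # (279, 2)
# preds = bpmf.predict(all_pairs)  # one predicted rating per pair
```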
I set eval_iters = 50 and ran into a problem (RuntimeWarning: overflow encountered in multiply). As eval_iters increases I just want the RMSE to keep decreasing, but I don't know how to solve this. Here is the output:
recommend-0.1.0-py2.7.egg/recommend/pmf.py:86: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:88: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:89: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:90: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:97: RuntimeWarning: invalid value encountered in add
recommend-0.1.0-py2.7.egg/recommend/pmf.py:104: RuntimeWarning: invalid value encountered in add
recommend-0.1.0-py2.7.egg/recommend/pmf.py:133: RuntimeWarning: invalid value encountered in greater
site-packages/recommend-0.1.0-py2.7.egg/recommend/pmf.py:136: RuntimeWarning: invalid value encountered in less
INFO: iter: 24, train RMSE: nan
INFO: iter: 25, train RMSE: nan
INFO: iter: 26, train RMSE: nan
INFO: iter: 27, train RMSE: nan
...
Close this issue since #9 is created.
Thanks a lot for the toolbox; it's easy to use and very nice. I have a question though: can you explain how exactly the predict functionality works on the validation samples (for both BPMF and PMF)? I mean, you split the data into train and validation sets. When I have a small train size (for example 5) compared to the validation set, the RMSE on the validation set is still small. Does that make sense?
Hi,
To predict the rating for user i and item j, it is simply user_features[i] * item_features[j] + mean_rating. (user_features and item_features are the latent variables we learned during training.) (source)
And when the train size is small, the RMSE on the validation set should be large.
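A minimal sketch of that prediction rule. The latent matrices here are random stand-ins just to show the shapes and the dot product; in the real library they come from training, and mean_rating is the mean of the training ratings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_user, n_item, n_feature = 31, 9, 30

# Stand-ins for the learned latent factor matrices.
user_features = rng.normal(size=(n_user, n_feature))
item_features = rng.normal(size=(n_item, n_feature))
mean_rating = 3.5

def predict_one(i, j):
    """Predicted rating for user i on item j: dot product plus global mean."""
    return user_features[i] @ item_features[j] + mean_rating

print(predict_one(10, 3))
```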
I am trying to use your example. I set train_pct = 0.001 and n_feature = 30 and got the following results: after 10 iterations, train RMSE: 1.120622, validation RMSE: 1.148221. Shouldn't the validation RMSE be higher?
I changed the pmf example with train_pct = 0.001
and n_feature = 30
and got this result:
n_user: 6040, n_item: 3952, n_feature: 30, training size: 1000, validation size: 999209
INFO: iter: 0, train RMSE: 1.117883
INFO: iter: 1, train RMSE: 1.097334
INFO: iter: 2, train RMSE: 1.070012
INFO: iter: 3, train RMSE: 1.034619
INFO: iter: 4, train RMSE: 0.987241
INFO: iter: 5, train RMSE: 0.923363
INFO: iter: 6, train RMSE: 0.839857
INFO: iter: 7, train RMSE: 0.733179
INFO: iter: 8, train RMSE: 0.604912
INFO: iter: 9, train RMSE: 0.474990
after 10 iterations, train RMSE: 0.474990, validation RMSE: 1.113232
Validation RMSE is much higher than training RMSE.
For BPMF, I can get a similar result by increasing the beta-related parameters.
I use beta=10., beta_user=10., beta_item=10. in the example, and get:
n_user: 6040, n_item: 3952, n_feature: 30, training size: 1000, validation size: 999209
INFO: iter: 0, train RMSE: 1.152316
INFO: iter: 1, train RMSE: 1.142085
INFO: iter: 2, train RMSE: 1.120228
INFO: iter: 3, train RMSE: 1.088892
INFO: iter: 4, train RMSE: 1.064241
INFO: iter: 5, train RMSE: 1.028266
INFO: iter: 6, train RMSE: 0.968250
INFO: iter: 7, train RMSE: 0.890073
INFO: iter: 8, train RMSE: 0.772950
INFO: iter: 9, train RMSE: 0.654719
after 10 iterations, train RMSE: 0.654719, validation RMSE: 1.253363
Even when I set train_pct = 0.00001, so the train size is ten, the validation RMSE is 1.310131. It seems the RMSE is always approximately the same.
Why do you think the RMSE should be higher? 1.31 is very large considering the rating values are between 1 and 5. If you use 3.0 to predict every data point in the dataset, you only get 1.259:
>>> RMSE(np.repeat(3.0, 1000209), ratings[:, 2])
1.2594181530018158
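The RMSE helper used above is not shown in the thread; a plain NumPy version would look like this:

```python
import numpy as np

def RMSE(pred, truth):
    """Root mean squared error between predicted and true ratings."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean((pred - truth) ** 2))

# Sanity check: a constant prediction of 3.0 against ratings of 1 and 5
# is always off by 2, so the RMSE is exactly 2.0.
print(RMSE(np.repeat(3.0, 4), np.array([1.0, 5.0, 1.0, 5.0])))  # 2.0
```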
My main issue is that I performed NMF using the sklearn implementation and then BPMF, and the results are much better with BPMF. So I am trying to see whether something is not working properly here. Thanks for the help and the information anyway.
Did you check the max/min values of your NMF predictions? In BPMF, I clamp predictions to the min/max rating in the predict function; you might need to do the same thing for NMF before comparing the results.
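That clamping step can be sketched as follows. The raw prediction values here are hypothetical, and np.clip stands in for whatever the BPMF predict function does internally; a 1-5 rating scale is assumed:

```python
import numpy as np

min_rating, max_rating = 1.0, 5.0

# Hypothetical raw NMF reconstruction values, some outside the valid range.
raw_preds = np.array([0.2, 3.7, 6.1, 4.9])

# Clamp predictions into the rating scale before computing RMSE.
clipped = np.clip(raw_preds, min_rating, max_rating)
print(clipped)  # [1.  3.7 5.  4.9]
```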
close this now
I have a 31x9 matrix and I want to perform BPMF with your code. First, I read the matrix in sparse format (180x3), as in your example. Then I compute the max of the first and second columns and run your code:
And I receive the following message:
raise ValueError("max user_id >= %d", n_user)
ValueError: ('max user_id >= %d', 31)
What am I doing wrong? It actually works if I set n_user = 32 and n_item = 10, but does that make any sense? Furthermore, the results of bpmf.predict(ratings) are just the approximated values of my initial ratings. What about the rest of the values?