Open azzelena opened 3 years ago
You want exclude_known=True, not exclude_known==True. You only want one equal sign, not two.
Sorry, there is one =, not two. But it doesn't work anyway :(
I cannot reproduce this issue. Excluding known items seems to work just fine when using user side information.
The following code:
import turicreate
sf = turicreate.SFrame({
    'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
    'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"],
    'rating': [1, 3, 2, 5, 4, 1, 4, 3]
})
user_info = turicreate.SFrame({
    'user_id': ["0", "1", "2"],
    'numeric_feature': [0.1, 12, 22]
})
m = turicreate.factorization_recommender.create(
    sf, target='rating', user_data=user_info)
print(m.recommend([0,1,2], exclude_known=True))
prints:
+---------+---------+--------------------+------+
| user_id | item_id | score | rank |
+---------+---------+--------------------+------+
| 0 | d | 1.7338812722585795 | 1 |
| 1 | c | 3.8376356229560438 | 1 |
| 1 | d | 3.0225044876711427 | 2 |
| 2 | a | 2.613139470909195 | 1 |
+---------+---------+--------------------+------+
[4 rows x 4 columns]
None of these (user_id, item_id) pairs are in sf.
That's right. But if we add user_ids that are not in sf, it breaks.
import turicreate as tc
sf = tc.SFrame({
    'person_id': ["10055", "10055", "10055", "200", "200"],
    'product_id': ["a", "b", "c", "y", "u"],
})
item_data = tc.SFrame({
    'product_id': ["a", "b", "g"],
    'category': ["10", "20", "j"],
})
user_info = tc.SFrame({
    'person_id': ["1", "2", "10055", "200"],
    'fav_category': ["sample", "sample3", "sample", "sample"],
})
model = tc.recommender.ranking_factorization_recommender.create(
    sf,
    user_id="person_id", item_id="product_id",
    random_seed=None, max_iterations=120, solver='adagrad',
    item_data=item_data, user_data=user_info,
    verbose=True)
recommended = model.recommend(sf["person_id"].unique(), k=3, exclude_known=True)
print(recommended)
print(sf)
prints:
+-----------+------------+-----------------------+------+
| person_id | product_id | score | rank |
+-----------+------------+-----------------------+------+
| 200 | y | 0.9996631073397743 | 1 |
| 200 | u | 0.9993447021791567 | 2 |
| 200 | g | 0.0022382634943607206 | 3 |
| 10055 | a | 0.9997131845839977 | 1 |
| 10055 | b | 0.9997102391956959 | 2 |
| 10055 | c | 0.999197361763928 | 3 |
+-----------+------------+-----------------------+------+
[6 rows x 4 columns]
+-----------+------------+
| person_id | product_id |
+-----------+------------+
| 10055 | a |
| 10055 | b |
| 10055 | c |
| 200 | y |
| 200 | u |
+-----------+------------+
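To make the overlap explicit, here is a minimal sketch in plain Python (no Turi Create needed) that compares the two tables printed above. The pairs are copied by hand from that output, and the helper name known_pairs is only an illustration, not part of any library.

```python
# Pairs copied by hand from the two tables printed above.
training_pairs = {
    ("10055", "a"), ("10055", "b"), ("10055", "c"),
    ("200", "y"), ("200", "u"),
}
recommended_pairs = [
    ("200", "y"), ("200", "u"), ("200", "g"),
    ("10055", "a"), ("10055", "b"), ("10055", "c"),
]


def known_pairs(recommended, training):
    """Return the recommended (user, item) pairs already present in training."""
    return [pair for pair in recommended if pair in training]


leaked = known_pairs(recommended_pairs, training_pairs)
print(len(leaked), "of", len(recommended_pairs), "recommendations were already in sf")
```

Every recommendation except (200, g) is a pair that exclude_known=True should have removed.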
You're right. Having user data for users that are not present in the observation data does break exclude_known=True.
I've verified your results and also verified that removing the first two rows of user_info causes things to work as expected.
@hoytak - This is very strange. Any idea what's going on here?
I guess the workaround here is simple. Run:
user_info = user_info.filter_by(sf["person_id"].unique(), 'person_id')
before calling tc.recommender.ranking_factorization_recommender.create.
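For anyone unsure which rows that filter drops, here is a rough plain-Python sketch of the same idea applied to the data from the failing example. It uses lists of dicts instead of SFrames, and keep_observed is a made-up helper name, so treat it as an illustration of the logic rather than the library's implementation.

```python
# Side information and observed user ids, copied from the failing example.
user_info_rows = [
    {"person_id": "1", "fav_category": "sample"},
    {"person_id": "2", "fav_category": "sample3"},
    {"person_id": "10055", "fav_category": "sample"},
    {"person_id": "200", "fav_category": "sample"},
]
observed_ids = {"10055", "200"}  # unique person_id values in sf


def keep_observed(rows, observed, key):
    """Keep only the side-info rows whose key value appears in the observations."""
    return [row for row in rows if row[key] in observed]


kept = keep_observed(user_info_rows, observed_ids, "person_id")
print([row["person_id"] for row in kept])
```

Only the rows for users "1" and "2" are dropped, which are the same two rows whose removal made exclude_known=True behave as expected.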
When I add user_data in turicreate.recommender.ranking_factorization_recommender.create (side information for the user), the parameter exclude_known== True (exclude all user-item interactions previously seen in the training data) in turicreate.recommender.factorization_recommender.FactorizationRecommender.recommend doesn't work (products that were in the training data are predicted). What could be the problem?