aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0

Application in recommender system #230

Closed: Tych0n closed this issue 5 years ago

Tych0n commented 5 years ago

Hi, aksnzhy! Thank you for this library. Can you please guide me a bit? I have a dataset with four columns: transaction_count, user, item, item_colour. I want to recommend some items to users, based on transaction_count. I can use ALS with the transaction_count, user, and item columns, for example with the "implicit" library. But if I want to take item_colour into account, I need to use, for example, FFM. So I create an FFM-formatted file:

transaction_count user_id:value_id:1 item_id:value_id:1 item_colour_id:value_id:1

5 0:0:1 1:3:1 2:5:1
3 0:1:1 1:4:1 2:6:1
8 0:2:1 1:3:1 2:7:1

and train my model. But if I want to recommend the top-5 items (with colours) to a user, I need to create all user:item:colour combinations, score them, and then, for each user, sort all item:colour variants by predicted probability and select the 5 best. The problem is that this list of all possible combinations explodes with my dimensions (users=80000, items=14000, colours=5) and becomes impossible to work with. Is there any hack for implementing this?
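
For anyone landing here later, a minimal sketch of building such a libffm file, assuming a pandas DataFrame with the four columns named above; the toy data, the output file name, and the contiguous per-field index layout are illustrative only:

```python
import pandas as pd

# Toy stand-in for the four-column dataset described above.
df = pd.DataFrame({
    "transaction_count": [5, 3, 8],
    "user": ["u0", "u1", "u2"],
    "item": ["iA", "iB", "iA"],
    "item_colour": ["red", "blue", "green"],
})

# Map raw ids to contiguous feature indices; each field gets its own range,
# matching the example rows above (field 0 = user, 1 = item, 2 = colour).
user_idx = {u: i for i, u in enumerate(df["user"].unique())}
item_idx = {v: len(user_idx) + i for i, v in enumerate(df["item"].unique())}
colour_idx = {c: len(user_idx) + len(item_idx) + i
              for i, c in enumerate(df["item_colour"].unique())}

with open("train.ffm", "w") as f:
    for row in df.itertuples(index=False):
        # libffm line: label field:feature:value
        f.write(f"{row.transaction_count} "
                f"0:{user_idx[row.user]}:1 "
                f"1:{item_idx[row.item]}:1 "
                f"2:{colour_idx[row.item_colour]}:1\n")
```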

aksnzhy commented 5 years ago

@Tych0n I'm not sure I understand your point correctly. You want to recommend the top-5 items to users, but right now xLearn can only do binary classification. One possible solution is to use one-vs-all to turn it into a multi-class classification task, then compare the probabilities calculated by xLearn and select the highest one. It only costs 5x compared to a single binary classification task.

Tych0n commented 5 years ago

Yes, you got it right. I'm using 'task':'reg' with transaction_count as a continuous target, but I can certainly convert it to a binary one-vs-all form per user. The problem is that, to compare the probabilities calculated by xLearn, I need to score 14000*5 variants per user, and do that 80000 times. It ends up creating a dataset with 5.6 billion rows and then scoring it with xLearn. I think maybe I'm missing some point and doing it wrong.

aksnzhy commented 5 years ago

@Tych0n If you do the job the way you described, you are actually facing a classification problem with 14000*5 labels, which I don't think is a good idea. You can instead convert it to a binary task, e.g., given an item, predict the probability that the user likes it. Then you can predict the probability of every item for each user.
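
If it helps, here is a rough sketch of that per-user scoring loop: score one user's candidate items at a time so the full 80000 x 14000 candidate file never has to exist at once, and keep only the top-5 per user. The xlearn calls follow its documented Python API (create_ffm, setTest, setSigmoid, predict); the file names, index maps, and helper function are hypothetical, and in practice you would batch many users per predict call to amortize the per-call overhead.

```python
import heapq
import xlearn as xl

# Assumes a trained FFM model saved as "model.out" and the same index maps
# (user_idx, item_idx, colour_idx) that were used to build the training file.
ffm_model = xl.create_ffm()

def top5_for_user(u, items, item_colour, user_idx, item_idx, colour_idx):
    """Return the 5 highest-scoring (probability, item) pairs for user u."""
    # Write this user's candidate rows in libffm format (label is a dummy 0).
    with open("candidates.ffm", "w") as f:
        for it in items:
            f.write(f"0 0:{user_idx[u]}:1 1:{item_idx[it]}:1 "
                    f"2:{colour_idx[item_colour[it]]}:1\n")
    ffm_model.setTest("candidates.ffm")
    ffm_model.setSigmoid()                        # output probabilities
    ffm_model.predict("model.out", "scores.txt")  # one score per candidate line
    with open("scores.txt") as f:
        scores = [float(line) for line in f]
    # Keep only the 5 best (probability, item) pairs for this user.
    return heapq.nlargest(5, zip(scores, items))
```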

Tych0n commented 5 years ago

@aksnzhy got your point. Please correct me if I'm wrong: I convert the task to binary, score all user-item pairs to get a probability for each pair, then take the top-5 items for each user. With this approach, don't I still face the problem of creating a 5.6-billion-row dataset to score and select from?

BrianMiner commented 5 years ago

I have the exact same question: do you have to create all possible combinations of items for each user, score all of them, and order by probability?

BrianMiner commented 5 years ago

@Tych0n did you ever figure out a method?

Tych0n commented 5 years ago

@BrianMiner, no. I wasn't able to use this library in production for my problem; there were too many items to select recommendations from. I ended up using implicit.ALS. By the way, I tried LightFM: it showed comparable MAP@k, but still wasn't able to outperform ALS, at least in my setup.

BrianMiner commented 5 years ago

This must be a common issue for these types of models, though. There must be some way to pre-filter the candidates. You see FM and FFM used for CTR problems; that could be just as large an issue when you need to score so many ad placement variants (copy, color, etc.) for each user.
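
For what it's worth, a common way to do that pre-filtering is a two-stage setup: a cheap first-stage model (for example the implicit ALS already mentioned in this thread, or plain item popularity) retrieves a short candidate list per user, and the FFM only re-ranks those candidates. A rough sketch, assuming implicit >= 0.5 and a toy user-item matrix standing in for the real data:

```python
import numpy as np
import implicit
from scipy.sparse import csr_matrix

# Toy user-by-item matrix of transaction counts standing in for the real data.
rng = np.random.default_rng(0)
user_items = csr_matrix(rng.integers(0, 3, size=(50, 40)).astype(np.float32))

# First stage: cheap ALS retrieval of a short candidate list per user.
als = implicit.als.AlternatingLeastSquares(factors=16, iterations=5)
als.fit(user_items)  # implicit >= 0.5 expects a user-by-item matrix here

user_id = 0
# recommend's signature has changed across implicit releases; this follows 0.5+.
candidate_items, _ = als.recommend(user_id, user_items[user_id], N=10)

# Second stage: build libffm rows only for these few candidates (user, item,
# colour) and re-rank them with the trained xLearn FFM model, keeping the top 5.
print(candidate_items)
```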

sumitsidana commented 4 years ago

Hi, you could use negative sampling to convert the problem from one-class classification to binary classification. For every positive (a transaction), sample a negative (a non-transaction). This way, you can still use this library with FM or FFM. Do the sampling only at training time; at prediction time, consider all the items.
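
To make that concrete, here is a small sketch of the sampling step (names and the 1:1 positive:negative ratio are just for illustration): for every observed transaction we emit a positive row, plus one randomly drawn item the user has never transacted with as a negative row, and this is applied only when building the training file.

```python
import random

def make_binary_pairs(positives, all_items, seed=42):
    """positives: list of observed (user, item) pairs; all_items: list of item ids."""
    positives = list(positives)
    rng = random.Random(seed)
    seen_by_user = {}
    for u, i in positives:
        seen_by_user.setdefault(u, set()).add(i)

    rows = []
    for u, i in positives:
        rows.append((1, u, i))            # observed transaction -> label 1
        neg = rng.choice(all_items)
        while neg in seen_by_user[u]:     # resample until we hit an unseen item
            neg = rng.choice(all_items)
        rows.append((0, u, neg))          # sampled non-transaction -> label 0
    return rows

# Example: two users, three items.
print(make_binary_pairs([("u0", "iA"), ("u1", "iB")], ["iA", "iB", "iC"]))
```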