fidelity / mab2rec

[AAAI 2024] Mab2Rec: Multi-Armed Bandits Recommender
https://fidelity.github.io/mab2rec/

Predict user to item estimated reward #16

Closed ayush488 closed 2 years ago

ayush488 commented 2 years ago

Is it possible to get the estimated reward for a user_id and item_id from any of the bandits?

bkleyn commented 2 years ago

Yes, the BanditRecommender has a predict_expectations method that can be used to return the expected rewards for any of the supported bandits.

The scores returned by the score pipeline function are transformed to be between 0 and 1 using the sigmoid function.
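For reference, here is a minimal sketch of that call on made-up toy data (the item names, responses, and contexts below are invented for illustration, not taken from your data; predict_expectations returns one arm-to-expectation dict per context row):

from mab2rec import BanditRecommender, LearningPolicy

# Toy data, invented for illustration: chosen items, binary responses,
# and 2-dimensional user contexts
decisions = ["item_1", "item_2", "item_1", "item_2"]
rewards = [1, 0, 0, 1]
contexts = [[0, 1], [0, 1], [1, 0], [1, 0]]

rec = BanditRecommender(learning_policy=LearningPolicy.LinUCB(alpha=1.0))
rec.fit(decisions=decisions, rewards=rewards, contexts=contexts)

# One arm-to-expectation dict per context row
expectations = rec.predict_expectations(contexts=[[0, 1], [1, 0]])
print(expectations[0]["item_1"])  # expected reward for this user/item pair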

ayush488 commented 2 years ago

Got it... Is it possible to speed up the predictions? It takes quite a while to get predictions for even 10 contexts... Please suggest.

ayush488 commented 2 years ago

Also, why are some of the expected reward values coming back as NaN for some of the arms?

ayush488 commented 2 years ago

@bkleyn ?

dorukkilitcioglu commented 2 years ago

Hi @ayush488, can you provide us with a minimal subset of your data that reproduces this? If there is an np.nan somewhere in your data, it might be the cause.
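A quick way to check all three inputs for non-finite values (a rough sketch; the file names are the ones used in this thread):

import numpy as np
import pandas as pd

# Scan each input file for NaN / Inf values (file names from this thread)
for name in ["trainng_interactions.csv", "user_ftrs.csv", "item_ftrs.csv"]:
    df = pd.read_csv(name)
    numeric = df.select_dtypes(include="number")
    n_nan = int(df.isna().sum().sum())
    n_inf = int(np.isinf(numeric).sum().sum())
    print(f"{name}: {n_nan} NaNs, {n_inf} Infs")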

ayush488 commented 2 years ago

The data has no NaNs; it is all binarized features... will send the data...

ayush488 commented 2 years ago

Also, could you suggest how to speed this up?

ayush488 commented 2 years ago

Here is the data that I am using: item_ftrs.csv trainng_interactions.csv user_ftrs.csv

ayush488 commented 2 years ago

from mab2rec import BanditRecommender, LearningPolicy, NeighborhoodPolicy
recommenders = {
      "Random": BanditRecommender(learning_policy=LearningPolicy.Random()),
      "Popularity": BanditRecommender(learning_policy=LearningPolicy.Popularity()),
      "LinGreedy": BanditRecommender(learning_policy=LearningPolicy.LinGreedy(epsilon=0.1)),
      "LinUCB": BanditRecommender(learning_policy=LearningPolicy.LinUCB(alpha=10)),
      "LinTS": BanditRecommender(learning_policy=LearningPolicy.LinTS()),
      "ClustersTS": BanditRecommender(learning_policy=LearningPolicy.ThompsonSampling(), 
                                      neighborhood_policy=NeighborhoodPolicy.Clusters(n_clusters=10))
}
from jurity.recommenders import BinaryRecoMetrics, RankingRecoMetrics

# Column names for the response, user, and item id columns
metric_params = {'click_column': 'score', 'user_id_column': 'ID', 'item_id_column':'MailerID'}

# Evaluate performance at different k-recommendations
top_k_list = [5,10,15]

# List of metrics to benchmark
metrics = []
for k in top_k_list:
    metrics.append(BinaryRecoMetrics.AUC(**metric_params, k=k))
    metrics.append(BinaryRecoMetrics.CTR(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.Precision(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.Recall(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.NDCG(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.MAP(**metric_params, k=k))

from mab2rec.pipeline import benchmark

# Benchmark the set of recommenders for the list of metrics 
# using training data and user features scored on test data 
reco_to_results, reco_to_metrics = benchmark(recommenders,
                                             metrics=metrics,
                                             train_data=df_train,
                                             cv=5,
                                             user_features=df_users_X,
                                             item_features=df_mailers_X,
                                             user_id_col='ID',
                                             item_id_col='MailerID',
                                             response_col='response',
                                             batch_size=10000,
                                             verbose=True)

I am running the above code on the data I provided, but it errors at LinGreedy:

LinGreedy
Running...
Traceback (most recent call last):

  File "C:\Users\ayush\Desktop\Rl\mabtest.py", line 393, in <module>
    verbose=True)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\mab2rec\pipeline.py", line 443, in benchmark
    recommendations, metrics = _bench(**args)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\mab2rec\pipeline.py", line 531, in _bench
    recommendations[name])

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\combined.py", line 121, in get_score
    return_extended_results)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\auc.py", line 140, in get_score
    return self._accumulate_and_return(results, batch_accumulate, return_extended_results)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\base.py", line 121, in _accumulate_and_return
    cur_result = self._get_results([results])

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\auc.py", line 146, in _get_results
    return roc_auc_score(results[:, 0], results[:, 1])

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\sklearn\metrics\_ranking.py", line 546, in roc_auc_score
    y_score = check_array(y_score, ensure_2d=False)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\sklearn\utils\validation.py", line 800, in check_array
    _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\sklearn\utils\validation.py", line 116, in _assert_all_finite
    type_err, msg_dtype if msg_dtype is not None else X.dtype

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

ayush488 commented 2 years ago

I checked; none of my 3 inputs contain any NaN or Inf.

ayush488 commented 2 years ago

My responses are binary.

bkleyn commented 2 years ago

Unfortunately, I am not able to reproduce the error you are getting.

I did get another error since the training data you provided (trainng_interactions.csv) includes user IDs that do not occur in the user features (user_ftrs.csv). After subsetting the training data to only include users for which features are available, I was able to run the code below.

I also noticed that LinTS is quite slow, as you mentioned elsewhere. Since LinTS requires the entire feature matrix to be inverted, I would not suggest using this algorithm with hundreds of features.

See code and output below:

import pandas as pd

# Read data
df_train = pd.read_csv("trainng_interactions.csv")
df_users = pd.read_csv("user_ftrs.csv")
df_items = pd.read_csv("item_ftrs.csv")

# Subset train data to include only users with features
mask = df_train['ID'].isin(df_users['ID'])
df_train = df_train[mask]

from mab2rec import BanditRecommender, LearningPolicy, NeighborhoodPolicy
recommenders = {
      "Random": BanditRecommender(learning_policy=LearningPolicy.Random()),
      "Popularity": BanditRecommender(learning_policy=LearningPolicy.Popularity()),
      "LinGreedy": BanditRecommender(learning_policy=LearningPolicy.LinGreedy(epsilon=0.1)),
      "LinUCB": BanditRecommender(learning_policy=LearningPolicy.LinUCB(alpha=10)),
      #"LinTS": BanditRecommender(learning_policy=LearningPolicy.LinTS()),
      "ClustersTS": BanditRecommender(learning_policy=LearningPolicy.ThompsonSampling(), 
                                      neighborhood_policy=NeighborhoodPolicy.Clusters(n_clusters=10))
}
from jurity.recommenders import BinaryRecoMetrics, RankingRecoMetrics

# Column names for the response, user, and item id columns
metric_params = {'click_column': 'score', 'user_id_column': 'ID', 'item_id_column':'MailerID'}

# Evaluate performance at different k-recommendations
top_k_list = [5,10,15]

# List of metrics to benchmark
metrics = []
for k in top_k_list:
    metrics.append(BinaryRecoMetrics.AUC(**metric_params, k=k))
    metrics.append(BinaryRecoMetrics.CTR(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.Precision(**metric_params,  k=k))
    metrics.append(RankingRecoMetrics.Recall(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.NDCG(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.MAP(**metric_params, k=k))

from mab2rec.pipeline import benchmark

# Benchmark the set of recommenders for the list of metrics 
# using training data and user features scored on test data 
reco_to_results, reco_to_metrics = benchmark(recommenders, 
                                             metrics=metrics,
                                             train_data=df_train, 
                                             cv=5,
                                             user_features=df_users, 
                                             item_features=df_items,
                                             user_id_col='ID',
                                             item_id_col='MailerID',
                                             response_col='response',
                                             batch_size=10000,
                                             verbose=True) 

Output:

CV Fold = 1 

>>> Random
Running...
Done: 0.01 minutes 

>>> Popularity
Running...
Done: 0.01 minutes 

>>> LinGreedy
Running...
Done: 0.05 minutes 

>>> LinUCB
Running...
Done: 0.12 minutes 

>>> ClustersTS
Running...
Done: 0.01 minutes 

CV Fold = 2 

>>> Random
Running...
Done: 0.00 minutes 

>>> Popularity
Running...
Done: 0.01 minutes 

>>> LinGreedy
Running...
Done: 0.05 minutes 

>>> LinUCB
Running...
Done: 0.12 minutes 

>>> ClustersTS
Running...
Done: 0.01 minutes 

CV Fold = 3 

>>> Random
Running...
Done: 0.01 minutes 

>>> Popularity
Running...
Done: 0.01 minutes 

>>> LinGreedy
Running...
Done: 0.06 minutes 

>>> LinUCB
Running...
Done: 0.14 minutes 

>>> ClustersTS
Running...
Done: 0.01 minutes 

CV Fold = 4 

>>> Random
Running...
Done: 0.01 minutes 

>>> Popularity
Running...
Done: 0.01 minutes 

>>> LinGreedy
Running...
Done: 0.05 minutes 

>>> LinUCB
Running...
Done: 0.13 minutes 

>>> ClustersTS
Running...
Done: 0.02 minutes 

CV Fold = 5 

>>> Random
Running...
Done: 0.01 minutes 

>>> Popularity
Running...
Done: 0.00 minutes 

>>> LinGreedy
Running...
Done: 0.05 minutes 

>>> LinUCB
Running...
Done: 0.11 minutes 

>>> ClustersTS
Running...
Done: 0.01 minutes

ayush488 commented 2 years ago

I was not able to upload the full data here... but with the full data it gives the error after running for a very long time on the lin epsilon (LinGreedy) policy.

dorukkilitcioglu commented 2 years ago

Is there a way you can try slicing your data in different ways (ex: dividing it into 5 parts and seeing if any one of those parts is causing the error)? It would be helpful to know whether there's a specific subset of the data that's causing you issues, or whether it's only happening with the full data. If there is a specific subset you can find, we can take a look at it and see whether we can reproduce the error on our side.
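If it helps, something along these lines (a rough sketch; df_train and run_benchmark are placeholders for your interactions frame and a small wrapper around the benchmark call above, not mab2rec functions):

import numpy as np

# Split the interactions into 5 roughly equal parts and benchmark each one
# (df_train and run_benchmark are placeholders, not mab2rec functions)
for i, part in enumerate(np.array_split(df_train, 5)):
    try:
        run_benchmark(part)
        print(f"part {i}: ok")
    except ValueError as err:
        print(f"part {i}: failed with {err}")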

ayush488 commented 2 years ago

Will try the slicing approach now...

ayush488 commented 2 years ago

There were some user IDs in my interaction data that were not present in the user features, so I removed them and am re-running the code...

ayush488 commented 2 years ago

What is the way to speed up benchmark()? Even for the Popularity bandit it is taking a lot of time... I have about 400k interactions, 370k users, and 894 items.

bkleyn commented 2 years ago

For Popularity, the number of items (arms) is likely the main culprit. Generally, run-time for bandit algorithms will scale linearly based on the number of items.
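If the arm count is the bottleneck, one pragmatic workaround (a sketch, not an official mab2rec recipe) is to restrict the arm set to the most frequently interacted items before fitting:

# Keep only the 200 most frequent items as arms; run-time scales with arm count
# (df_train and 'MailerID' follow the column names used in this thread)
top_items = df_train['MailerID'].value_counts().nlargest(200).index
df_train_small = df_train[df_train['MailerID'].isin(top_items)]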

ayush488 commented 2 years ago

What is the way I can speed this up? Regards, Ayush

ayush488 commented 2 years ago

I have uploaded my full data here: https://www.dropbox.com/s/n4ruxzf82my1at9/test_interactions.zip?dl=0 I am still getting the NaN error with LinUCB; LinGreedy ran fine. If possible, can you check at your end?

ayush488 commented 2 years ago

Also, why is the output (predicted reward) coming outside (0, 1) when all my training data has only 0/1 responses? Many are negative and many are well over 1. Can you please explain?

bkleyn commented 2 years ago

> Also, why is the output (predicted reward) coming outside (0, 1) when all my training data has only 0/1 responses? Many are negative and many are well over 1. Can you please explain?

The range of predicted expectations will depend on the selected learning policy. For example, LinUCB uses ridge regression to estimate rewards as a linear combination of the user contexts, meaning expectations can fall outside the [0, 1] range. Thompson Sampling, on the other hand, samples expectations from a beta distribution, which means they will all be between 0 and 1.
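If you need scores in (0, 1) regardless of the learning policy, you can squash the raw expectations with the same sigmoid transformation the score pipeline uses, e.g.:

import numpy as np

def sigmoid(x):
    # Map an unbounded LinUCB-style expectation into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-2.3), sigmoid(0.0), sigmoid(4.1))  # ~0.091, 0.5, ~0.984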

bkleyn commented 2 years ago

This thread contains discussion unrelated to the original issue, so I am closing it.