Open pallavi080596 opened 1 year ago
Hi,
shared |User user=Anna time_of_day=monthly gender=female age=3 |clicked_cats clicked_cats_1=L1 clicked_cats_2=L4 |recent_cats recent_cats_1=L2 recent_cats_2=L4
|Action category=L1
0:-0.3:0.19765689674531078 |Action category=L2
|Action category=L2
|Action category=L3
|Action category=L4
|Action category=L5
shared |User user=Tom time_of_day=weekly gender=male age=2 |clicked_cats clicked_cats_1=L2 clicked_cats_2=L3 |recent_cats recent_cats_1=L1 recent_cats_2=L4 |Action category=L1 ...
Otherwise seems correct.
2. Adding cA and rA interactions (clicked_cats * Action and recent_cats * Action) seems like it should be useful here ("-q UA cA rA").
Hi @ataymano, thank you for the response. Could you also advise which exploration algorithm works best for this use case (softmax, RND, or epsilon-greedy)? I need a ranking of the categories (the probabilities from the model could serve for this) based on each user's affinity for their recent categories and clicked categories. My reward function will depend on these two signals.
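To make the ranking question concrete, here is a minimal pure-Python sketch of how an epsilon-greedy and a softmax probability distribution could be derived from per-category predicted costs, and how a ranking falls out of them. It does not call Vowpal Wabbit and is not a description of its internals; the function names, the `lam` temperature parameter, and the cost values are all hypothetical, chosen only for illustration.

```python
import math

def epsilon_greedy_pmf(costs, epsilon=0.2):
    """Spread epsilon uniformly; give the rest to the lowest-cost action."""
    k = len(costs)
    best = min(range(k), key=lambda i: costs[i])
    pmf = [epsilon / k] * k
    pmf[best] += 1.0 - epsilon
    return pmf

def softmax_pmf(costs, lam=5.0):
    """Probability proportional to exp(-lam * cost); lower cost, higher probability."""
    scores = [-lam * c for c in costs]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def rank_by_probability(categories, pmf):
    """Return categories sorted from most to least probable."""
    order = sorted(range(len(categories)), key=lambda i: pmf[i], reverse=True)
    return [categories[i] for i in order]

categories = ["L1", "L2", "L3", "L4", "L5"]
predicted_costs = [-0.3, -0.1, 0.0, -0.5, -0.2]  # made-up values

print(rank_by_probability(categories, softmax_pmf(predicted_costs)))
print(epsilon_greedy_pmf(predicted_costs))
```

One thing the sketch makes visible: under epsilon-greedy every non-greedy category gets the same probability, so the distribution only separates the top choice from the rest, while a softmax-style distribution gives a graded ordering over all five categories.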
I am currently using the Vowpal Wabbit package to implement a contextual bandit use case. My goal is to show each user a personalized ranking of categories (L1/L2/L3/L4/L5), which are the actions here, on the basis of context like:
I have simulated a cost function and trained the model online on the chosen action and its cost, using the --cb_explore_adf -q UA parameters.
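For reference, a small sketch of how one multi-line example in this text format could be assembled in Python. The helper `format_adf_example` and its arguments are hypothetical, not part of Vowpal Wabbit. The label is written as index:cost:probability with index 0, following the sample data in this thread; as far as I understand, with action-dependent features the labeled line itself identifies the chosen action.

```python
def format_adf_example(shared, categories, chosen_index=None, cost=None, prob=None):
    """Build one multi-line example: a shared line plus one line per action."""
    shared_parts = []
    for namespace, features in shared.items():
        feats = " ".join(f"{name}={value}" for name, value in features.items())
        shared_parts.append(f"|{namespace} {feats}")
    lines = ["shared " + " ".join(shared_parts)]

    for index, category in enumerate(categories):
        line = f"|Action category={category}"
        if index == chosen_index:
            # Only the chosen action carries a label (index:cost:probability).
            line = f"0:{cost}:{prob} {line}"
        lines.append(line)
    return "\n".join(lines)

anna = {
    "User": {"user": "Anna", "time_of_day": "monthly", "gender": "female", "age": 3},
    "clicked_cats": {"clicked_cats_1": "L1", "clicked_cats_2": "L4"},
    "recent_cats": {"recent_cats_1": "L2", "recent_cats_2": "L4"},
}

print(format_adf_example(anna, ["L1", "L2", "L3", "L4", "L5"],
                         chosen_index=1, cost=-0.3, prob=0.19765689674531078))
```

Leaving `chosen_index` unset produces an unlabeled example of the same shape, which is what would be passed in at prediction time.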
Sample Dataset:
My question here is: