Yes, this is definitely possible.
Since you are using the adf reduction, everything essentially reduces to a single model, with the only difference between actions being the features of the action. As a consequence, 'forgetting' is a question of parameters forgetting rather than models forgetting. If you want the parameters to forget, then you typically do something like --l1 1e-6 or --l2 1e-6
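For concreteness, a minimal sketch of what such an invocation could look like, combining the daemon setup described later in this thread with the flags mentioned above (the port number is arbitrary):

```sh
# cb_explore_adf in daemon mode, with a small L1 penalty so that
# parameters slowly decay ("forget") unless reinforced by new data.
vw --cb_explore_adf --daemon --port 26542 --l1 1e-6
```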
@JohnLangford John, is it possible to refer us to the appropriate article presenting ADF? I seem to have lost track of where to study it, and I'd like a deeper understanding of the inherent forgetting and of the reduction in general.
Okay, so it looks like Action Dependent Features is a very benign reduction (see https://github.com/JohnLangford/vowpal_wabbit/blob/24233e238c3e957066014d85ce13cd3b67d996e5/vowpalwabbit/cb_adf.cc) to CSOAA with Label Dependent Features. Though I can't trace where the reduction is happening, it looks like a very direct one. And L1 regularization with a small value is just a way of avoiding overfitting, hopefully (but not necessarily) making the model forget the right features while still succeeding for the currently "alive" actions.
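If I'm reading the reduction target correctly, a cost-sensitive LDF example looks roughly as follows (feature names invented; each action line carries an `id:cost` label, and a blank line terminates the multiline example), trained with something like `vw --csoaa_ldf multiline`:

```
shared | user_segment=sports
1:0.0 | ad=A topic=cars
2:1.0 | ad=B topic=finance
```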
Right. (Sorry, I've been KOed by NIPS papers :-) ) Most of the reduction work is happening here: https://github.com/JohnLangford/vowpal_wabbit/blob/24233e238c3e957066014d85ce13cd3b67d996e5/vowpalwabbit/gen_cs_example.cc
-John
@maxpagels Pardon me if I'm not getting it from the conversation, but how did you finally address your question about the input format when taking out an action that is no longer eligible?
I'm using VW in daemon mode with cb_explore_adf in order to implement a contextual bandit system that optimises the click-through rate of adverts. The documentation states that the ADF learning mode is good for applications in which the set of actions can change over time. In my application, this is exactly the case — an advert constitutes one action/arm. The set of adverts can increase, and, more importantly, decrease in size over time.
According to the docs, actions are implicitly identified by their line number in the input. If so, at prediction time, is it at all possible to get predictions for a specified subset of actions, or do I always have to input all of the actions seen since timestep 0?
Example: say I have two adverts, A and B, with implicit line numbers 0 and 1. I can get predictions like so:
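Something along these lines, with invented feature names (in daemon mode, a blank line terminates the multiline example):

```
shared |User segment=sports
|Action ad=A
|Action ad=B
```

For cb_explore_adf, the reply should be a probability distribution over the action lines, something like `0:0.975,1:0.025`.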
Now, let’s say that another ad, C, has been published and ad A has been unpublished. Is it possible to get predictions for B and C without (unnecessarily) passing a line for ad A? If the action IDs are implicit and based on line numbers, the only way I can think of dealing with this situation is to keep sending a line for every action ever seen, including retired ones like A, and simply ignore the predictions for lines that are no longer eligible.
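To make the question concrete, this is the request I would like to be able to send, containing only the live actions (feature names invented):

```
shared |User segment=sports
|Action ad=B
|Action ad=C
```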
Another question: as the set of actions changes over time, what will happen to the memory required by the underlying learner? Is it possible to remove old/irrelevant actions/arms (B in the example above), or will there always be some trace of them in the learner?
Thanks!