NN Based Contextual Bandit for large datasets

Datasets we can use:

(1) This collection has datasets (like Bibtex) with a large label space (>100 labels).
(2) The datasets this paper tested on have much smaller label spaces (<10 labels).

Another difference between the two: in (2) a single label has to be selected, whereas in (1) each datapoint may have more than one label. So for (1) we would have to predict a very high-dimensional binary label vector.
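To make the difference concrete, here is a rough sketch of how the two settings might be turned into bandit rewards. The function names and the 0/1 reward scheme are my own assumptions for illustration, not something taken from the paper or the dataset collection.

```python
def single_label_reward(chosen_arm, true_label):
    """Setting (2): exactly one correct label per datapoint."""
    return 1.0 if chosen_arm == true_label else 0.0


def multi_label_reward(chosen_arm, true_labels):
    """Setting (1), assumed scheme: reward 1 if the chosen arm is any of the true labels."""
    return 1.0 if chosen_arm in true_labels else 0.0


def multi_hot(true_labels, n_labels):
    """Binary label vector for setting (1); its length equals the size of the label space."""
    return [1 if i in true_labels else 0 for i in range(n_labels)]


# A datapoint with labels {1, 3} out of 5 possible labels.
vector = multi_hot({1, 3}, n_labels=5)
```

With more than 100 labels the binary vector gets long, which is why predicting it directly looks expensive.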

The approach in (2) is to train an NN that outputs the parameters of a distribution over reward for each label. This would be the logical extension of the simple tabular policies we implemented for categorical CBs.
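A minimal sketch of what such a network could look like, in plain Python so it runs without dependencies. Everything here is an assumption for illustration: the class and function names are hypothetical (not genrl API), the reward distribution is taken to be Gaussian with a mean and log-variance per arm, and the weights are random rather than trained.

```python
import math
import random


class RewardDistributionNet:
    """Tiny one-hidden-layer MLP: context -> (mean, log-variance) of reward per arm."""

    def __init__(self, context_dim, hidden_dim, n_arms, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real model would learn these from observed rewards.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_arms = n_arms
        self.w_hidden = init(hidden_dim, context_dim)
        # Two outputs per arm: the first n_arms rows give means, the rest log-variances.
        self.w_out = init(2 * n_arms, hidden_dim)

    def forward(self, context):
        # ReLU hidden layer followed by a linear output layer (no biases, for brevity).
        hidden = [max(0.0, sum(w * x for w, x in zip(row, context)))
                  for row in self.w_hidden]
        out = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        return out[:self.n_arms], out[self.n_arms:]


def select_arm(net, context, rng):
    """Thompson-style selection: sample one reward per arm and pick the largest."""
    means, log_vars = net.forward(context)
    samples = [rng.gauss(m, math.exp(0.5 * lv)) for m, lv in zip(means, log_vars)]
    return max(range(len(samples)), key=samples.__getitem__)


net = RewardDistributionNet(context_dim=4, hidden_dim=8, n_arms=5)
means, log_vars = net.forward([0.2, -1.0, 0.5, 0.0])
arm = select_arm(net, [0.2, -1.0, 0.5, 0.0], random.Random(1))
```

The output layer grows linearly with the number of arms, so this is cheap for the <10-label datasets and gets heavier as the label space grows.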

For (1) this is intractable, so we could implement other, embedding-based approaches instead.

So I think we should start with (2) first. What are your thoughts, @Sharad24?

I am currently working on implementing a variation of the linear posterior method from (2).
