SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL
https://genrl.readthedocs.io
MIT License

NN Based Contextual Bandit for large datasets #173

Closed threewisemonkeys-as closed 4 years ago

threewisemonkeys-as commented 4 years ago

Datasets we can use -

  1. This collection has datasets (like Bibtex) with a large label space (>100)
  2. The datasets this paper tested on have much smaller (<10) label spaces.

Another difference between the two is that in (2) a single label has to be selected, whereas in (1) each datapoint might have more than one label. So for (1) we would have to predict a very high-dimensional binary label vector.
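To make the difference concrete, here is a minimal sketch of how the two settings could be turned into bandit feedback. The function names and the 0/1 reward scheme are assumptions for illustration, not something taken from either source or from genrl.

```python
def single_label_reward(true_label, action):
    """Setting (2): one correct label per datapoint, reward 1 only on a match."""
    return 1.0 if action == true_label else 0.0


def multi_label_reward(true_labels, action):
    """Setting (1): a datapoint has a set of labels, any of them counts as a hit."""
    return 1.0 if action in true_labels else 0.0


def to_binary_vector(true_labels, n_labels):
    """Setting (1) as a prediction target: one binary entry per possible label."""
    return [1 if i in true_labels else 0 for i in range(n_labels)]


# Hypothetical datapoints: label 3 in setting (2), labels {1, 3} in setting (1).
reward_single = single_label_reward(true_label=3, action=3)
reward_multi = multi_label_reward(true_labels={1, 3}, action=2)
target = to_binary_vector(true_labels={1, 3}, n_labels=5)
```

With more than 100 labels the binary target becomes long and mostly zeros, which is the part that makes (1) harder.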

The approach for (2) is to train an NN to output the parameters of a distribution over reward for each label, which would be the logical extension of the simple tabular policies we implemented for categorical CBs.
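A rough sketch of one way this could look in PyTorch, assuming a Gaussian reward distribution per label and Thompson-style action selection. The class name, the layer sizes, and the choice of Gaussian are all assumptions for illustration, not the method from (2) or an existing genrl API.

```python
import torch
import torch.nn as nn


class RewardDistributionNet(nn.Module):
    """Maps a context to a mean and log-variance of the reward for each label."""

    def __init__(self, context_dim, n_labels, hidden_dim=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(context_dim, hidden_dim), nn.ReLU())
        self.mean_head = nn.Linear(hidden_dim, n_labels)
        self.log_var_head = nn.Linear(hidden_dim, n_labels)

    def forward(self, context):
        hidden = self.body(context)
        return self.mean_head(hidden), self.log_var_head(hidden)


def select_action(model, context):
    """Sample one reward per label from its Gaussian and pick the largest."""
    with torch.no_grad():
        mean, log_var = model(context)
    std = torch.exp(0.5 * log_var)
    sampled = torch.normal(mean, std)
    return int(sampled.argmax(dim=-1).item())


def update(model, optimizer, context, action, reward):
    """One gradient step on the Gaussian negative log-likelihood
    (up to a constant) of the observed reward for the chosen label only."""
    mean, log_var = model(context)
    mu, lv = mean[action], log_var[action]
    loss = 0.5 * (lv + (reward - mu) ** 2 / torch.exp(lv))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical sizes: 8 context features, 5 labels.
torch.manual_seed(0)
model = RewardDistributionNet(context_dim=8, n_labels=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
context = torch.randn(8)
action = select_action(model, context)
loss = update(model, optimizer, context, action, reward=1.0)
```

Only the chosen label's head gets a learning signal on each step, since the reward for the other labels is never observed.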

For (1) this is intractable, so we could implement embedding-based approaches instead.

So I think we should start with (2). What are your thoughts, @Sharad24?

I am currently working on implementing a variation on the linear posterior method from (2).

threewisemonkeys-as commented 4 years ago

Where should I add the bandit agents for this? In deep/agents/bandit or classical/bandit/contextual?