VowpalWabbit / coba

Contextual bandit benchmarking
https://coba-docs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

rewards mapping fix + test #36

Closed (jonastim closed 1 year ago)

jonastim commented 1 year ago

The latest change to the rewards recording mistakenly mapped the reward eval function to the actions rather than their indices.
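
To illustrate the kind of mix-up described above, here is a minimal, self-contained sketch. It does not use coba's actual code: `record_rewards_buggy`, `record_rewards_fixed`, and `reward_of_index` are hypothetical names, and the sketch assumes a reward function that expects an action's position rather than the action itself.

```python
from typing import Callable, List, Sequence

def record_rewards_buggy(actions: Sequence[int],
                         reward_of_index: Callable[[int], float]) -> List[float]:
    # Bug: passes each action itself to a function that expects a position.
    return [reward_of_index(a) for a in actions]

def record_rewards_fixed(actions: Sequence[int],
                         reward_of_index: Callable[[int], float]) -> List[float]:
    # Fix: evaluate the reward function on positions 0..len(actions)-1.
    return [reward_of_index(i) for i in range(len(actions))]

# Hypothetical data: integer actions whose values differ from their positions.
actions = [2, 0, 1]
rewards_by_index = [0.0, 1.0, 0.25]
reward_of_index = rewards_by_index.__getitem__

buggy = record_rewards_buggy(actions, reward_of_index)
fixed = record_rewards_fixed(actions, reward_of_index)
print(buggy)  # [0.25, 0.0, 1.0]  (silently wrong order)
print(fixed)  # [0.0, 1.0, 0.25]
```

With integer actions the buggy version can fail silently, as above. With non-integer actions (strings, say) this toy version would raise a `TypeError` instead, which is at least easier to notice.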

mrucker commented 1 year ago

Oh! That's it exactly! Man, I hate that we have to do it this way.

This also needs to handle batching. I can take care of adding that tomorrow.

mrucker commented 1 year ago

Batching added. Again, great catch and thanks for the unit test!

This is exactly how coba got so many tests to begin with. I've just been adding new tests every time I find a bug :).

If your team needs it, I released this tonight as a new minor version (6.3.0), since my batching code quasi-broke backwards compatibility.

Also, not that it matters to you, but it looks like a research team at Stanford is using Coba now, so that's fun...

Also also, the SimpleEvaluation class is getting kind of messy at this point. A long time ago, what is now SimpleEvaluation used to be three separate classes. Having three classes was hard for new users because they would struggle to figure out which evaluation type to use. So, I folded them all into SimpleEvaluation, where we try to guess what to do based on what we're given. Maybe it is time to start thinking about separating them again... Or at the very least refactoring to clean up a little...

jonastim commented 1 year ago

Great, thanks for the quick turnaround!

It's not super urgent to get the bugfix into a release as it only affects recording rewards which I am not relying on in any evaluations.

Great news about the Stanford lab! I am also looking forward to getting this into the hands of our data scientists and seeing what they come up with.

Yeah, I can see the SimpleEvaluation class growing to become unwieldy. A simple first step could be to modularize the process function a bit into predict, learn, and boilerplate components.
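
A rough sketch of what that split might look like, purely for discussion. None of this is coba's real API: `ToyEvaluation`, `GreedyLearner`, the `_predict`/`_learn` helpers, and the dict-shaped interactions are all made-up stand-ins, assumed only to show the shape of the refactor.

```python
from typing import Any, Dict, Iterable, Iterator, List

class GreedyLearner:
    """Hypothetical toy learner: picks the action index with the best mean reward."""
    def __init__(self, n_actions: int) -> None:
        self.totals: List[float] = [0.0] * n_actions
        self.counts: List[int] = [0] * n_actions

    def predict(self, context: Any, actions: List[Any]) -> int:
        # Untried indices get +inf so each one is explored once.
        means = [t / c if c else float("inf")
                 for t, c in zip(self.totals, self.counts)]
        return max(range(len(actions)), key=means.__getitem__)

    def learn(self, context: Any, index: int, reward: float) -> None:
        self.totals[index] += reward
        self.counts[index] += 1

class ToyEvaluation:
    """Hypothetical evaluation with process() split into small steps."""
    def __init__(self, learner: GreedyLearner) -> None:
        self.learner = learner

    def _predict(self, interaction: Dict[str, Any]) -> int:
        return self.learner.predict(interaction["context"], interaction["actions"])

    def _learn(self, interaction: Dict[str, Any], index: int, reward: float) -> None:
        self.learner.learn(interaction["context"], index, reward)

    def process(self, interactions: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        # Boilerplate only: loop, look up the reward by index, emit a result row.
        for interaction in interactions:
            index = self._predict(interaction)
            reward = interaction["rewards"][index]
            self._learn(interaction, index, reward)
            yield {"action": interaction["actions"][index], "reward": reward}

interactions = [{"context": None, "actions": ["a", "b"], "rewards": [0.0, 1.0]}] * 3
results = list(ToyEvaluation(GreedyLearner(2)).process(interactions))
print([r["reward"] for r in results])  # [0.0, 1.0, 1.0]
```

The appeal of this layout is that each piece could be tested on its own, and a different evaluation type would only need to swap out `_predict` or `_learn` rather than branch inside one large function.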