VowpalWabbit / coba

Contextual bandit benchmarking
https://coba-docs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

remove ope_loss from out update #37

Closed jonastim closed 1 year ago

jonastim commented 1 year ago

I came across this issue in the synthetic data generation online/offline learner notebook: the random learner had ope_loss values in its interaction results even though they should be NaN.

Stepping through the code showed that the interaction dict had ope_loss set as a field, and before this change that stray value would end up overriding ope_loss in out.

[Screenshot, 2023-03-13: debugger view showing ope_loss set as a field on the interaction dict]

This code change fixes the issue, but I think ope_loss shouldn't be set in the interaction dict in the first place, and I am not quite sure where it comes from. The whole logic of popping items out of the interaction and then using the leftovers at the end to update the output also seems a bit brittle; a simplified sketch of the pattern follows.
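To make the failure mode concrete, here is a simplified sketch of the pop-then-update pattern (not coba's actual code; `evaluate`, `learner_is_random`, and the metric formulas are made up for illustration):

```python
def evaluate(interaction: dict, learner_is_random: bool) -> dict:
    """Sketch of an evaluator that pops known keys and merges leftovers."""
    interaction = dict(interaction)       # work on a copy
    for key in ('context', 'actions'):    # consume the standard keys
        interaction.pop(key)
    rewards = interaction.pop('rewards')

    out = {'reward': max(rewards)}        # stand-in for the real reward calc
    if not learner_is_random:
        out['ope_loss'] = -out['reward']  # stand-in for the real OPE estimate

    out.update(interaction)               # leftover keys pass through as-is...
    return out                            # ...including a stray 'ope_loss'

print(evaluate({'context': 1, 'actions': [0, 1], 'rewards': [0, 1], 'ope_loss': 0.5},
               learner_is_random=True))
# -> {'reward': 1, 'ope_loss': 0.5}  (a random learner should report no ope_loss)
```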

Raising this as a draft first to discuss the best path forward, @mrucker

mrucker commented 1 year ago

Agreed with all above. SimpleEvaluation is 100% in need of some deep refactor love.

The strange behavior you're seeing is the intersection of two historical design decisions:

  1. Coba allows additional fields to be set on interactions, and they get passed all the way through. This behavior exists because it is useful for experiments. For example, imagine interactions come from specific users and we'd like to know how well we're doing per user. To do this we can put a "UserId" field onto our environment's interactions, and it will be carried through and saved in the Result.interactions pandas dataframe at the end. From there it is possible to calculate stats on a per-user basis (see the sketch after this list).

  2. Popping from the interaction dict was done in an effort to make SimpleEvaluation fast. I've used coba to run experiments with 200 environments of 10,000 interactions each, run with 50 different shuffle seeds. At that scale I'm currently spending almost a full hour doing nothing but running SimpleEvaluation code. I just did some benchmarking, and it looks like this code change doesn't impact performance. There is a whole bunch of "benchmarking" unit tests in test_performance.py, including one for SimpleEvaluation on line 414.
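To illustrate point 1, here is a minimal pandas sketch; the column names and values are made-up stand-ins for what the Result.interactions dataframe might contain after an experiment whose interactions carried a custom "UserId" field:

```python
import pandas as pd

# Stand-in for the Result.interactions dataframe at the end of an experiment:
# 'UserId' is the custom field that passed through from the environment's
# interactions; 'reward' is a standard evaluation column.
interactions = pd.DataFrame({
    'UserId': ['a', 'a', 'b', 'b'],
    'reward': [1.0, 0.0, 1.0, 1.0],
})

# Because the extra field passes through, per-user stats are a one-line groupby:
print(interactions.groupby('UserId')['reward'].mean())
# UserId
# a    0.5
# b    1.0
```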

If I had to guess, the initial interactions dataframe you are creating already has an 'ope_loss' column. Then, when environments are loaded from that dataframe, the 'ope_loss' column is added as a field to each interaction.
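A hypothetical illustration of that round trip (plain pandas, not coba's actual loading code): every column of the saved dataframe, 'ope_loss' included, ends up as a field on each rebuilt interaction dict, from which it later leaks through the pass-through logic above:

```python
import pandas as pd

# A saved interactions dataframe that already carries an 'ope_loss' column:
saved = pd.DataFrame({'context': [1, 2], 'reward': [0.0, 1.0], 'ope_loss': [0.1, 0.2]})

# Rebuilding interaction dicts row by row re-attaches every column as a field:
rebuilt = saved.to_dict(orient='records')
print(rebuilt[0])  # {'context': 1, 'reward': 0.0, 'ope_loss': 0.1}
```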

mrucker commented 1 year ago

I'm happy to take this as-is and leave a bigger refactor of SimpleEvaluation for later. Alternatively, if you want to take a stab at refactoring SimpleEvaluation, you have my full support. I only ask that any refactor keep the two items I mentioned above true (i.e., additional fields still pass through as-is, and runtime performance stays at least at the current level).

jonastim commented 1 year ago

Thanks for the context! Let's stick with this bugfix for now then, as I am back on call starting tomorrow 😅