Error when attempting OPE on logged dataset

travisbrady commented 2 years ago

Hi, first thanks for coba! Very glad you've created all this and excited to see where it goes.

I am trying to compare VW against simpler policies on a CB problem, using a dataset of previously logged, uniformly random feedback. All of this is intended to be 100% offline, to see whether a CB approach could be of use here.

Here is my error:

2022-09-01 12:46:47 -- Processing chunk...
2022-09-01 12:46:47 --   * Recording Learner 0 parameters... (0.0 seconds) (completed)
2022-09-01 12:46:47 --   * Recording Learner None parameters... (0.0 seconds) (exception)
2022-09-01 12:46:47 --   * Unexpected exception:

  File "/Users/tbrady/opt/anaconda3/envs/hk/lib/python3.10/site-packages/coba/experiments/process.py", line 185, in filter
    row = workitem.task.process(deepcopy(workitem.learner))

  TypeError: SimpleEnvironmentTask.process() missing 1 required positional argument: 'interactions'

2022-09-01 12:46:47 --   * Recording Learner 0 parameters... (0.0 seconds) (exception)
2022-09-01 12:46:47 --   * Unexpected exception:

  File "/Users/tbrady/opt/anaconda3/envs/hk/lib/python3.10/site-packages/coba/experiments/process.py", line 185, in filter
    row = workitem.task.process(deepcopy(workitem.learner))

  TypeError: OnlineOffPolicyEvalTask.process() missing 1 required positional argument: 'interactions'

Code:

df = load_df()
environments = [df_to_environment(df)]
learners = [VowpalOffPolicyLearner(['cat1', 'cat2'])]
e = Experiment(environments, learners, evaluation_task=OnlineOffPolicyEvalTask())
e.run()



Coba version: 4.11.0
Python version: 3.10.6

Thanks for any help you can provide

mrucker commented 1 year ago

@travisbrady Hi Travis. I'm really sorry about the delay. The long weekend in the United States kept me away from this. I'm looking at this today and should have a response/resolution soon.

mrucker commented 1 year ago

@travisbrady I think I figured it out. I think df_to_environment(df) is returning a value of None. Or, at least, when I manually pass in [None] to Experiment I receive the same error message that you are receiving. I just released an updated version of Coba (4.11.1) that checks for this case and throws a more detailed error message further upstream. Please let me know if this addresses your issue.
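
For readers hitting the same error, the kind of upstream check described here can be sketched in plain Python. This is only an illustration of failing fast: `validate_environments` is a hypothetical helper, not a coba function, and it does not reflect how coba 4.11.1 implements the check.

```python
def validate_environments(environments):
    """Fail fast if any environment is None (hypothetical helper, not coba's API).

    A None entry typically means the function that builds the environment
    returned nothing, which otherwise only surfaces deep inside the experiment.
    """
    bad = [i for i, env in enumerate(environments) if env is None]
    if bad:
        raise ValueError(
            f"Environment(s) at index {bad} are None; "
            "check the function that constructs them."
        )
    return list(environments)
```

Calling such a check before building the experiment points at the real culprit (the environment factory) rather than at a missing 'interactions' argument.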

travisbrady commented 1 year ago

Ahh you're right! Here is the code of that function. For some reason LoggedEnvironment.read is returning None, as you say. I must be misunderstanding the usage of that function.

def df_to_environment(df):
    lst = []
    for i, row in enumerate(df.itertuples()):
        context = {'cat': row.category, 'cat2': row.category_2}
        li = LoggedInteraction(context,
                row.action_id,
                row.dollar_reward,
                1/11,
                ACTIONS)
        lst.append(li)
    lenv = LoggedEnvironment.read(lst)
    return lenv

mrucker commented 1 year ago

Oh I understand now... In your example here is what you'd want...

class MyLoggedEnvironmentFromDF(LoggedEnvironment):

    def read(self):
        for row in load_df().itertuples():
            context = {'cat': row.category, 'cat2': row.category_2}
            yield LoggedInteraction(context, row.action_id, row.dollar_reward, 1/11, ACTIONS)

environments = [MyLoggedEnvironmentFromDF()]
learners = [VowpalOffPolicyLearner(['cat1', 'cat2'])]
results = Experiment(environments, learners, evaluation_task=OnlineOffPolicyEvalTask()).run()

The read method is what the Experiment calls to "read" the environment's interactions. This means Experiment doesn't load an environment into memory (i.e., call read) until it is ready to evaluate on it. It also means that when multiprocessing we don't have to marshal data to background processes, since each process loads the data it needs itself.
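
The lazy-loading idea can be sketched without coba at all. The classes below are hypothetical stand-ins (`LazyEnvironment` and `run_experiment` are not coba names); they only show that nothing is loaded at construction time and that the loader runs when the runner finally iterates over `read()`.

```python
class LazyEnvironment:
    """Hypothetical stand-in for an environment; not coba's class."""

    def __init__(self, loader):
        self._loader = loader   # callable that returns an iterable of interactions
        self.load_count = 0     # how many times the data was actually loaded

    def read(self):
        # Because this is a generator, the body runs only once iteration starts.
        self.load_count += 1
        yield from self._loader()


def run_experiment(environments):
    """Hypothetical runner: reads each environment only when it is evaluated."""
    results = []
    for env in environments:
        interactions = list(env.read())  # data is loaded here, not before
        results.append(len(interactions))
    return results
```

Under this design a worker process only needs the (small) environment object and can call `read()` itself, instead of receiving the full dataset from the parent process.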

Also, coba could make this simpler (e.g., creating a LoggedEnvironment that takes a list of interactions in its constructor much like you thought to do with read).
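
A minimal sketch of what such a convenience class might look like, written as plain Python so it runs on its own. `ListBackedEnvironment` is a hypothetical name; in coba it would presumably subclass LoggedEnvironment as in the example above, and nothing here is an existing coba class.

```python
class ListBackedEnvironment:
    """Hypothetical environment that wraps an in-memory list of interactions."""

    def __init__(self, interactions):
        # Copy once so later changes to the caller's list don't affect reads.
        self._interactions = list(interactions)

    def read(self):
        # Return a fresh iterator each time so the environment can be re-read.
        return iter(self._interactions)
```

The trade-off is that the data now lives in the object itself, so it would have to be sent to any background process along with the environment.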

mrucker commented 1 year ago

Also, as a warning I don't think your use of VW will work like you want (we also don't document it super well on our end).

I think what you want is learners = [VowpalOffPolicyLearner(['x'])]. Internally the VW wrappers in coba place all context features into a namespace called 'x'. So the VW examples are constructed with feature namespaces like { 'x': {'cat': row.category, 'cat2': row.category_2}, 'a': action } if that helps? Feature namespaces in VW are very powerful but not a common feature in machine learning libraries so they can take a bit to get used to.
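
To make the namespace layout concrete, here is a small sketch that renders a namespace dictionary in VW's text input format (`|namespace feature ...`). The helper `to_vw_namespaces` is hypothetical and is not how coba builds its examples internally; it also assumes the action is wrapped in its own feature dict and that feature names and values contain no spaces, `|`, or `:`.

```python
def to_vw_namespaces(namespaces):
    """Render {namespace: {feature: value}} as a VW text-format feature string.

    Hypothetical helper for illustration only. Numeric values become
    `name:value`; everything else becomes a categorical `name=value` token.
    """
    parts = []
    for ns, feats in namespaces.items():
        tokens = []
        for name, value in feats.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                tokens.append(f"{name}:{value}")
            else:
                tokens.append(f"{name}={value}")
        parts.append(f"|{ns} " + " ".join(tokens))
    return " ".join(parts)


example = to_vw_namespaces({
    "x": {"cat": "shoes", "cat2": "sale"},  # shared (context) features
    "a": {"action": "3"},                   # action features
})
```

With these made-up values, `example` is the string `|x cat=shoes cat2=sale |a action=3`.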

travisbrady commented 1 year ago

@mrucker ahh yes very good to know. I've used vw directly quite a bit via the Python bindings and the command line but I didn't know about the 'x' thing.

Here is what I'm trying to do:

- I have uniformly random data with 11 possible actions logged by a production policy.
- I want to use OPE to see if a VW CB approach can beat that random baseline and a simpler TS Beta approach with a MAB-per-context setup, where each context is defined by two fields (the obfuscated cat1 and cat2 above).
- The reward here is basically whether we observe a click or not.
- I want to interact multiple features, including the shared features with those specified per action.

Will this work if I call VowpalOffPolicyLearner(['x']) as above?
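
For context on what OPE over uniformly random logs involves, here is a minimal inverse propensity scoring (IPS) sketch. It is an assumption-laden illustration, not coba's implementation: OnlineOffPolicyEvalTask may use a different estimator, and `ips_value` and the tuple layout are hypothetical.

```python
def ips_value(logged, policy):
    """Estimate a deterministic policy's average reward from logged data via IPS.

    logged: iterable of (context, action, reward, probability) tuples, where
            probability is the chance the logging policy picked that action
            (1/11 for uniform logging over 11 actions).
    policy: function mapping a context to the action it would choose.
    """
    total, n = 0.0, 0
    for context, action, reward, probability in logged:
        n += 1
        # Only logged rows where the policy agrees with the logged action count,
        # and they are up-weighted by the inverse of the logging probability.
        if policy(context) == action:
            total += reward / probability
    return total / n if n else 0.0
```

The same function can score the VW policy, the per-context Thompson sampling baseline, and the uniform baseline on identical logs, which is what makes the comparison fair.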

mrucker commented 1 year ago

Yeah that should work. To interact the shared features with the action features you'd want something like this:

VowpalOffPolicyLearner(['x','a','xa'])

That'll get you the shared features ('x'), the features for each action ('a') and the interaction of the shared and action features ('xa'). You can extrapolate from there (e.g., 'xx', 'aa', or 'xxa').
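
A rough sketch of what an interaction string such as 'xa' expands to: one crossed feature for every combination of a feature from namespace 'x' and a feature from namespace 'a'. `expand_interaction` is a hypothetical helper; VW itself hashes features and treats repeated namespaces (e.g., 'xx') with its own rules, so this only conveys the idea.

```python
from itertools import product


def expand_interaction(spec, namespaces):
    """Cross the features of each namespace named in spec (e.g., 'xa').

    namespaces maps one-letter names to {feature_name: numeric_value}.
    Returns {crossed_name: product_of_values}. Illustration only.
    """
    feature_sets = [namespaces[ch].items() for ch in spec]
    crossed = {}
    for combo in product(*feature_sets):
        name = "*".join(feature for feature, _ in combo)
        value = 1.0
        for _, v in combo:
            value *= v
        crossed[name] = value
    return crossed


namespaces = {
    "x": {"cat=shoes": 1.0, "cat2=sale": 1.0},  # shared features (made-up values)
    "a": {"action=3": 1.0},                      # action features
}
xa = expand_interaction("xa", namespaces)
```

Here `xa` contains two crossed features, `cat=shoes*action=3` and `cat2=sale*action=3`, which is what lets the model learn that a category behaves differently under different actions.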
