VowpalWabbit / coba

Contextual bandit benchmarking
https://coba-docs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Error when attempting OPE on logged dataset #19

Closed travisbrady closed 2 years ago

travisbrady commented 2 years ago

Hi, first thanks for coba! Very glad you've created all this and excited to see where it goes.

I am trying to evaluate vw against simpler policies on a CB problem, using a dataset of previously logged feedback that was collected under a uniformly random policy. All of this is intended to be 100% offline, to see whether a CB approach could be of use here.

Here is my error:

2022-09-01 12:46:47 -- Processing chunk...
2022-09-01 12:46:47 --   * Recording Learner 0 parameters... (0.0 seconds) (completed)
2022-09-01 12:46:47 --   * Recording Learner None parameters... (0.0 seconds) (exception)
2022-09-01 12:46:47 --   * Unexpected exception:

  File "/Users/tbrady/opt/anaconda3/envs/hk/lib/python3.10/site-packages/coba/experiments/process.py", line 185, in filter
    row = workitem.task.process(deepcopy(workitem.learner))

  TypeError: SimpleEnvironmentTask.process() missing 1 required positional argument: 'interactions'

2022-09-01 12:46:47 --   * Recording Learner 0 parameters... (0.0 seconds) (exception)
2022-09-01 12:46:47 --   * Unexpected exception:

  File "/Users/tbrady/opt/anaconda3/envs/hk/lib/python3.10/site-packages/coba/experiments/process.py", line 185, in filter
    row = workitem.task.process(deepcopy(workitem.learner))

  TypeError: OnlineOffPolicyEvalTask.process() missing 1 required positional argument: 'interactions'

Code:
df = load_df()
environments = [df_to_environment(df)]
learners = [VowpalOffPolicyLearner(['cat1', 'cat2'])]
e = Experiment(environments, learners, evaluation_task=OnlineOffPolicyEvalTask())
e.run()


Coba version: 4.11.0
Python version: 3.10.6

Thanks for any help you can provide.

mrucker commented 2 years ago

@travisbrady Hi Travis. I'm really sorry about the delay. The long weekend in the United States kept me away from this. I'm looking at this today and should have a response/resolution soon.

mrucker commented 2 years ago

@travisbrady I think I figured it out. I think df_to_environment(df) is returning a value of None. Or, at least, when I manually pass in [None] to Experiment I receive the same error message that you are receiving. I just released an updated version of Coba (4.11.1) that checks for this case and throws a more detailed error message further upstream. Please let me know if this addresses your issue.
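
For anyone on an older version, a similar guard can be run by hand before building the Experiment. This is only a minimal sketch: `check_environments` is a hypothetical helper written for illustration, not coba's actual check, and the error wording is made up.

```python
def check_environments(environments):
    """Raise early if any entry is None (hypothetical helper, not part of coba)."""
    for index, env in enumerate(environments):
        if env is None:
            # Point at the offending position so the caller knows which builder failed.
            raise ValueError(
                f"environments[{index}] is None; did the function that builds it return a value?"
            )
    return environments
```

Calling `check_environments([df_to_environment(df)])` before `Experiment(...)` would surface the problem at the point where the list is built rather than inside a worker.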

travisbrady commented 2 years ago

Ahh you're right! Here is the code of that function. For some reason LoggedEnvironment.read is returning None, as you say. I must be misunderstanding the usage of that function.

def df_to_environment(df):
    lst = []
    for i, row in enumerate(df.itertuples()):
        context = {'cat': row.category, 'cat2': row.category_2}
        li = LoggedInteraction(context,
                row.action_id,
                row.dollar_reward,
                1/11,
                ACTIONS)
        lst.append(li)
    lenv = LoggedEnvironment.read(lst)
    return lenv

mrucker commented 2 years ago

Oh I understand now... In your example here is what you'd want...

class MyLoggedEnvironmentFromDF(LoggedEnvironment):

    def read(self):
        for row in load_df().itertuples():
            context = {'cat': row.category, 'cat2': row.category_2}
            yield LoggedInteraction(context, row.action_id, row.dollar_reward, 1/11, ACTIONS)

environments = [MyLoggedEnvironmentFromDF()]
learners = [VowpalOffPolicyLearner(['cat1', 'cat2'])]
results = Experiment(environments, learners, evaluation_task=OnlineOffPolicyEvalTask()).run()

The read method is what the Experiment calls to "read" the environment's interactions. This means Experiment doesn't load an environment into memory (i.e., call read) until it is ready to evaluate on it. It also means that when multiprocessing we don't have to marshal data to background processes, since each process loads the data it needs itself.
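
The deferred loading described above can be illustrated without coba at all. The sketch below uses a stand-in class (`LazyEnvironment` and its `loader` argument are hypothetical names, not coba's API) to show that constructing the object costs nothing and that the data is only pulled once something iterates over `read()`.

```python
class LazyEnvironment:
    """Stand-in for an environment; not coba's actual base class."""

    def __init__(self, loader):
        self._loader = loader  # a zero-argument callable; nothing is loaded yet
        self.loads = 0         # counts how many times the data was actually loaded

    def read(self):
        # read is a generator, so this body only starts running once the
        # caller begins iterating over the returned object.
        self.loads += 1
        for row in self._loader():
            yield row


env = LazyEnvironment(lambda: [1, 2, 3])
pending = env.read()      # still nothing loaded
rows = list(pending)      # iteration triggers the load
```

Because only the small `LazyEnvironment` object (plus its loader) has to be handed to a worker, each process can do its own loading when it reaches that environment.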

Also, coba could make this simpler (e.g., creating a LoggedEnvironment that takes a list of interactions in its constructor much like you thought to do with read).
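
A rough sketch of what such a list-backed environment could look like, using stand-in classes only. `BaseEnvironment` and `ListEnvironment` are hypothetical names, and the assumption that the base `read` is an abstract stub with an empty body is just that, an assumption. Under that assumption, it also gives one plausible reading of why `LoggedEnvironment.read(lst)` returned None: the list is passed as `self` and the empty stub runs.

```python
from abc import ABC, abstractmethod


class BaseEnvironment(ABC):
    """Stand-in for an abstract environment base class (not coba's real one)."""

    @abstractmethod
    def read(self):
        ...  # empty stub: calling it directly returns None


class ListEnvironment(BaseEnvironment):
    """Hypothetical environment that wraps interactions already held in memory."""

    def __init__(self, interactions):
        self._interactions = list(interactions)

    def read(self):
        # Hand back a fresh iterator each time so the data can be read repeatedly.
        return iter(self._interactions)


# Calling read on the class passes the list as `self` and runs the empty stub.
stub_result = BaseEnvironment.read([1, 2, 3])

# Wrapping the list in an instance gives the caller something to iterate over.
wrapped_result = list(ListEnvironment([1, 2, 3]).read())
```

Holding the list in the instance gives up the lazy loading that a generator-based `read` provides, which is probably fine for data that is already in memory.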

mrucker commented 2 years ago

Also, as a warning I don't think your use of VW will work like you want (we also don't document it super well on our end).

I think what you want is learners = [VowpalOffPolicyLearner(['x'])]. Internally the VW wrappers in coba place all context features into a namespace called 'x'. So the VW examples are constructed with feature namespaces like { 'x': {'cat': row.category, 'cat2': row.category_2}, 'a': action } if that helps? Feature namespaces in VW are very powerful but not a common feature in machine learning libraries so they can take a bit to get used to.
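
To make the namespace idea concrete, here is a small sketch that renders a namespace dictionary in VW's text input style (`|namespace feature ...`). It is illustrative only: `to_vw_text` is a hypothetical helper, the feature values and the `id` key are made up, and this is not necessarily how coba builds its examples internally.

```python
def to_vw_text(namespaces):
    """Render {namespace: {feature: value}} as a VW-style text example (illustrative only)."""
    parts = []
    for name, features in namespaces.items():
        tokens = []
        for key, value in features.items():
            if isinstance(value, (int, float)):
                tokens.append(f"{key}:{value}")  # numeric feature with an explicit value
            else:
                tokens.append(f"{key}={value}")  # categorical feature as a single token
        parts.append(f"|{name} " + " ".join(tokens))
    return " ".join(parts)


example = to_vw_text({"x": {"cat": "shoes", "cat2": "mens"}, "a": {"id": "a3"}})
# example == "|x cat=shoes cat2=mens |a id=a3"
```

The sketch does no escaping, so values containing spaces, colons, or pipes would need handling before being passed to VW.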

travisbrady commented 2 years ago

@mrucker ahh yes very good to know. I've used vw directly quite a bit via the Python bindings and the command line but I didn't know about the 'x' thing.

Here is what I'm trying to do:

Will this work if I call VowpalOffPolicyLearner(['x']) as above?

mrucker commented 2 years ago

Yeah that should work. To interact the shared features with the action features you'd want something like this:

VowpalOffPolicyLearner(['x','a','xa'])

That'll get you the shared features ('x'), the features for each action ('a') and the interaction of the shared and action features ('xa'). You can extrapolate from there (e.g., 'xx', 'aa', or 'xxa').
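
As a rough picture of what each term expands to, the sketch below lists the crossed feature names for a term such as 'xa'. It is a simplification: `expand_interaction` is a hypothetical helper, the feature names are made up, and it takes a plain Cartesian product, so VW's real handling (hashing, and repeated namespaces such as 'xx') may differ.

```python
from itertools import product


def expand_interaction(term, namespaces):
    """List the crossed feature names implied by a term such as 'xa' (simplified)."""
    # One list of feature names per letter in the term, in order.
    feature_lists = [namespaces[letter] for letter in term]
    # Every combination taking one feature from each namespace.
    return ["*".join(combo) for combo in product(*feature_lists)]


namespaces = {"x": ["cat", "cat2"], "a": ["id"]}
linear_terms = expand_interaction("x", namespaces)    # ['cat', 'cat2']
crossed_terms = expand_interaction("xa", namespaces)  # ['cat*id', 'cat2*id']
```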