VowpalWabbit / coba

Contextual bandit benchmarking
https://coba-docs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Named feature support for VW buggy #22

Closed · jonastim closed this 1 year ago

jonastim commented 1 year ago

Hi! When context features are passed as a dictionary with the keys being their names, they don't seem to be processed properly. Stepping through the code, I believe the issue is in the _prep_namespaces logic:

for k,v in compress(zip(K,V),V):
    if v.__class__ is str:
        d[f"{k}={v}"] = 1
    else:
        d[k] = v
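
For concreteness, here is a tiny standalone sketch of the behavior I would expect from that loop (the dict below is just an illustration, not coba's internal types):

from itertools import compress

feats = {"feature_1": 0.25, "color": "red"}   # one numeric and one categorical feature
K, V  = list(feats.keys()), list(feats.values())

d = {}
for k, v in compress(zip(K, V), V):
    if v.__class__ is str:
        d[f"{k}={v}"] = 1    # categorical: encode as name=value with weight 1
    else:
        d[k] = v             # numeric: keep the actual value

assert d == {"feature_1": 0.25, "color=red": 1}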

As an example I created this simple custom environment in which the reward is influenced by one of three features:

from typing import Optional, Sequence

import numpy as np

from coba import (CobaRandom, Environments, Experiment, LinUCBLearner,
                  RandomLearner, VowpalBagLearner, VowpalSoftmaxLearner)
from coba.environments import LambdaSimulation
from coba.primitives import Context, Action

class CustomEnvironment(LambdaSimulation):
    def __init__(self, n_interactions: Optional[int] = 1000):
        super().__init__(n_interactions, self.context, self.actions, self.rewards)
        self.r = CobaRandom(1)

    def actions(self, index: int, context: Context) -> Sequence[Action]:
        """
        actions: A function that should return all valid actions for a given index, context and random state.
        """
        return [0,1]

    def context(self, index: int) -> Context:
        # return tuple([self.r.randoms(1)[0], self.r.randoms(1)[0], self.r.randoms(1)[0]])
        return {
            "feature_1": self.r.randoms(1)[0],
            "feature_2": self.r.randoms(1)[0],
            "feature_3": self.r.randoms(1)[0]
        }

    def rewards(self, index: int, context: Context, action: Action) -> float:
        # reward_probabilities_for_actions = [
        #     0.5 + 1.0 * (context[0] - 0.5),
        #     0.5
        # ]
        reward_probabilities_for_actions = [
            0.5 + 1.0 * (context["feature_1"] - 0.5),
            0.5
        ]
        return np.random.binomial(1, reward_probabilities_for_actions[action])

environments = Environments([CustomEnvironment(5000)]).shuffle([1,2,3,4,5,6,7,8])
learners     = [
    VowpalBagLearner(features=[1, 'x', 'a', 'ax']),
    VowpalSoftmaxLearner(features=[1, 'x', 'a', 'ax']),
    LinUCBLearner(features=[1, 'x', 'a', 'ax']),
    RandomLearner(),
]
result = Experiment(environments, learners).run()

When passing the features as a tuple (the commented-out code), the results look reasonable (albeit not great),

[plot: tuple_feature]

but when passing the features as a dict, the performance is random.

[plot: named_feature]

The issue seems to be that the value of each numerical feature is set to 1

[screenshot: 2023-01-11 1:43 PM]

because of this logic (intended for categorical features?)

[screenshot: 2023-01-11 1:42 PM]

Can you confirm that this is a bug and not a misunderstanding on my side?

Thanks, Jonas

mrucker commented 1 year ago

You're absolutely correct. Nice sleuthing and debugging. What I can't seem to figure out is why this is happening for you.

I ran your code on my machine and I didn't have any problems:

[image]

Unfortunately, the smoking gun is just out of the picture in your second screenshot.

The feature processing should have fallen into the if statement on lines 145/146 instead of the else block you are inside of.

Do you know what version of coba you are on? You can check either by looking in your Python environment or by running:

from coba import __version__

print(__version__)

jonastim commented 1 year ago

Thanks for the quick response, Mark! I worked directly in the checked-out source code from early November, which shows version 5.1.0. With the latest master the issue is resolved 🙌

The previous instance check seemed to have failed because feats was a HashableDict:

[screenshot: 2023-01-12 8:24 AM]
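
A minimal standalone illustration of why an exact-class check misses dict subclasses (this HashableDict is just a stand-in, not coba's actual class):

class HashableDict(dict):
    def __hash__(self):
        return hash(frozenset(self.items()))

feats = HashableDict(feature_1=0.25)

print(feats.__class__ is dict)    # False: an exact-class check fails for the subclass
print(isinstance(feats, dict))    # True:  an isinstance check still passes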

The LinUCB implementation seems to be struggling now, though. If the features are passed as a tuple, LinUCB doesn't even show up in the result plot,

[plot: tuple_reward_function]

and with the named features it performs on par with random.

[plot: named_reward_function]

With simplified, static reward probabilities

reward_probabilities_for_actions = [
    0.2,
    0.1
]

it actually picked the lower-performing arm (even though the features are irrelevant in this case)

[plot: named_static_reward]

I am mostly interested in VW and only use LinUCB as a baseline, but wanted to raise this nevertheless.

As a more general question, I was wondering what your experience with VW has been on these types of synthetic data tests. I am a bit concerned, for example, that the softmax explorer wasn't much better than random at picking a variant that's twice as good. The bag explorer performed on par with random when the reward heavily relied on a feature (0.5 + 1.0 * (context["feature_1"] - 0.5)), which you'd imagine would be easy to pick up (much easier than real-world examples in which a variant is maybe 1-5% better).

Do you have any guidance on algorithm, hyperparameter, and interaction-term selection that you've seen perform consistently well in these scenarios?

Thanks a lot!

mrucker commented 1 year ago

Great catch. You were correct: there was a bug in LinUCB.

I recently upgraded coba to support continuous actions, and during that refactor I broke LinUCB.

I added a unit test so hopefully it never happens again, and I released the patch. If you upgrade to 6.2.2, LinUCB works again.

jonastim commented 1 year ago

Just gave the latest code a try and the performance looks much better for tuple-based features,

[plot: tuple]

but the named feature support still seems broken.

[plot: named_feature]

Code for reference:

from typing import Optional, Sequence

import numpy as np

from coba import CobaRandom, Environments, RandomLearner, Experiment, VowpalSoftmaxLearner, VowpalBagLearner, LinUCBLearner
from coba.environments import LambdaSimulation
from coba.primitives import Context, Action

class CustomEnvironment(LambdaSimulation):
    def __init__(self, n_interactions: Optional[int] = 1000):
        super().__init__(n_interactions, self.context, self.actions, self.rewards)
        self.r = CobaRandom(1)

    def actions(self, index: int, context: Context) -> Sequence[Action]:
        """
        actions: A function that should return all valid actions for a given index, context and random state.
        """
        return [0, 1]

    def context(self, index: int) -> Context:
        # return tuple([self.r.randoms(1)[0], self.r.randoms(1)[0], self.r.randoms(1)[0]])
        return {
            "feature_1": self.r.randoms(1)[0],
            "feature_2": self.r.randoms(1)[0],
            "feature_3": self.r.randoms(1)[0]
        }

    def rewards(self, index: int, context: Context, action: Action) -> float:
        reward_probabilities_for_actions = [
            0.2,
            0.1
        ]
        # reward_probabilities_for_actions = [
        #     0.5 + 1.0 * (context["feature_1"] - 0.5),
        #     0.5
        # ]
        return np.random.binomial(1, reward_probabilities_for_actions[action])

environments = Environments([CustomEnvironment(5000)]).shuffle([1, 2, 3, 4, 5, 6, 7, 8])
learners = [
    VowpalBagLearner(features=[1, 'x', 'a', 'ax']),
    VowpalSoftmaxLearner(features=[1, 'x', 'a', 'ax']),
    LinUCBLearner(features=[1, 'x', 'a', 'ax']),
    RandomLearner(),
]
result = Experiment(environments, learners).run()

result.plot_learners()

mrucker commented 1 year ago

Unfortunately, LinUCB doesn't currently support named features. It's easy to miss, but in the output log it should say:

[image: output log line]

There isn't any technical reason why it couldn't. The LinUCB implementation in coba uses the Sherman-Morrison formula instead of direct matrix inversion, so large sparse matrices shouldn't be a problem. I simply haven't had a need for it or the time to implement it myself.
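
For reference, a minimal sketch of the rank-one update this relies on (illustrative only, not coba's actual code): when A grows by an outer product x x^T, the inverse can be maintained directly instead of re-inverting.

import numpy as np

def sherman_morrison_update(A_inv, x):
    # returns (A + x x^T)^-1 given A^-1, without re-inverting the matrix
    Ax = A_inv @ x
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)

# toy check: the incremental inverse matches direct inversion
d = 3
A, A_inv = np.eye(d), np.eye(d)          # start from a ridge prior A = I
for x in np.random.default_rng(0).normal(size=(5, d)):
    A     += np.outer(x, x)
    A_inv  = sherman_morrison_update(A_inv, x)

assert np.allclose(A_inv, np.linalg.inv(A))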

mrucker commented 1 year ago

(Also, regarding your question about VW and hyperparameters: yeah, it is hard to know how to tune them. Unfortunately, VW doesn't have the best documentation. I can only speak to my personal experience, but VW really shines when I'm either doing off-policy learning or working with very messy real-world data, for example data with covariate or concept shifts. It is fairly easy to emulate that kind of data in coba (something like the sketch below), but if all you really want is to test linear IID datasets, I feel like it will probably be difficult to beat UCB methods.)
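
A sketch of one way to emulate concept shift, reusing the LambdaSimulation pattern from your snippet above (purely illustrative, not a built-in coba environment): the better arm flips halfway through the interactions.

from typing import Optional, Sequence

import numpy as np

from coba import CobaRandom
from coba.environments import LambdaSimulation
from coba.primitives import Context, Action

class ShiftingEnvironment(LambdaSimulation):
    def __init__(self, n_interactions: Optional[int] = 1000):
        super().__init__(n_interactions, self.context, self.actions, self.rewards)
        self.r = CobaRandom(1)
        self.n = n_interactions

    def actions(self, index: int, context: Context) -> Sequence[Action]:
        return [0, 1]

    def context(self, index: int) -> Context:
        return tuple(self.r.randoms(3))

    def rewards(self, index: int, context: Context, action: Action) -> float:
        # concept shift: action 0 is better in the first half, action 1 in the second
        probs = [0.6, 0.4] if index < self.n // 2 else [0.4, 0.6]
        return np.random.binomial(1, probs[action])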

mrucker commented 1 year ago

(Also also, LinUCB is a good baseline. Another good baseline is simply the UcbBanditLearner. So long as you are working with actions that don't have features, that should give you the performance of the best constant predictor. That is, it learns the best action and only plays that, regardless of context.)
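
For example, it can be dropped into the learners list from above alongside the others (assuming it imports from coba's top level like the other learners do):

from coba import UcbBanditLearner   # assumed top-level import

learners = [
    VowpalSoftmaxLearner(features=[1, 'x', 'a', 'ax']),
    LinUCBLearner(features=[1, 'x', 'a', 'ax']),
    UcbBanditLearner(),   # best-constant-action baseline: ignores context entirely
    RandomLearner(),
]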

mrucker commented 1 year ago

(Finally, I don't know what you need, but there are several synthetic environments already in coba if you'd like to use those.)

Here's an example of what that might look like, with several options commented out:

# these imports are assumed to come from coba's top level, like the snippets above
from coba import (Environments, Experiment, CorralLearner, LinUCBLearner,
                  UcbBanditLearner, VowpalEpsilonLearner)

if __name__ == "__main__":

    n_processes = 6

    #environments = Environments.from_linear_synthetic(5000, n_context_features=3, n_action_features=0, reward_features=['a','ax','axx'])
    #environments = Environments.from_kernel_synthetic(5000, n_context_features=3, n_exemplars=5, kernel='exponential')
    #environments = Environments.from_mlp_synthetic(5000, n_context_features=10, n_action_features=0)
    environments = Environments.cache_dir(".coba_cache").from_openml(data_id=150, take=5000).scale("min","minmax")

    environments = environments.shuffle(n=5)

    learners = [
        VowpalEpsilonLearner(features=[1,'a','ax','axx'],epsilon=0.05,b=20),
        LinUCBLearner(features=[1,'a','ax']),
        UcbBanditLearner(),
        CorralLearner([
           LinUCBLearner(features=[1, 'x', 'a', 'ax']), 
           VowpalEpsilonLearner(features=[1, 'a', 'ax', 'axx'],epsilon=0.05, b=20)
        ], eta=0.1, mode="importance")
    ]

    Experiment(environments, learners).config(processes=n_processes).run().filter_fin().plot_learners(span=1000,err='se')

jonastim commented 1 year ago

Ah, I missed that log line; I guess I was a bit surprised because I thought it was working before.

Thanks for your help and great advice! We eventually want to move to messy, non-stationary data, but were hoping to see decent performance on simple experiments first. For non-contextual bandits, VW's performance was pretty poor (borderline erratic) compared to Thompson sampling or UCB. I've played around with a couple of your synthetic environments but wanted finer-grained control over the reward function (and the degree to which it's influenced by particular features).

I'll keep experimenting and report back.

mrucker commented 1 year ago

Yeah, it's easy to miss. I even had a hard time finding it and I was looking for it. I could maybe make that more readable too.

If you end up needing/wanting LinUCB to work with dicts of features, it wouldn't be super hard to get that functionality in place. We'd just need to implement some kind of hashing trick. I'd probably do it as an environment filter so that every learner could take advantage of it, not just LinUCB.
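
Just to sketch the idea (hypothetical names, not an existing coba filter), a function like this could map a dict of named features into a fixed-length tuple that tuple-only learners such as LinUCB can consume:

def hash_context(context: dict, n_buckets: int = 32) -> tuple:
    # hashing trick: bucket each named feature into a fixed-length dense vector
    dense = [0.0] * n_buckets
    for name, value in context.items():
        if isinstance(value, str):
            # categorical: hash the name=value pair and add weight 1
            dense[hash(f"{name}={value}") % n_buckets] += 1.0
        else:
            # numeric: hash the name and accumulate the value
            dense[hash(name) % n_buckets] += float(value)
    return tuple(dense)

# e.g. hash_context({"feature_1": 0.25, "color": "red"}) gives a 32-element tuple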

Awesome! Yes, please reach out if you have more questions. And yeah, sorry to hear VW has been kind of touchy. I actually don't work for Microsoft, so I'm hesitant to say anything about VW and how to tune it, though I can say it has been the best learner for me on several of my own projects. I'm actually just a grad student who created coba for my own PhD research. So it could be there is another bug in the VW stuff, though I've gone over it with a super fine-toothed comb, and research teams inside of Microsoft use it for their own VW work.

In terms of messier real-world datasets, I'm sure you already know this, but coba can also download datasets from OpenML.