maxpagels closed this issue 3 years ago
@maxpagels from your notebook the example that is being used to predict/learn has no features, i.e. the context is empty. Is this intentional?
There is a cats tutorial notebook that has not yet been reviewed, but it is here: https://github.com/VowpalWabbit/jupyter-notebooks/pull/6, if you think it might be of any help.
@olgavrou no features is intentional, the same non-convergence happens if you add a constant feature
@maxpagels it actually looks like `0.2` is already too big a bandwidth for this use case. I ran the following (building off of your notebook):
```python
import itertools

import pandas as pd
import matplotlib.pyplot as plt
from vowpalwabbit import pyvw  # assumes get_cost() is defined as in the notebook

param_grid = {
    'cats': [8, 16, 32, 256, 1024],
    'epsilon': [0.1],
    'bw': [0.05, 0.1, 0.2, 0.5]
}

all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]

best_loss = 999999999999.0  # unused in this snippet

for params in all_params:
    metrics = {
        'costs': [],
        'predictions': []
    }
    options = (
        f"--cats {params['cats']} --min_value 0.0 --bandwidth {params['bw']} "
        f"--max_value 1.0 --coin --epsilon {params['epsilon']}"
    )
    print('options: ', options)
    vw = pyvw.vw(options, enable_logging=True)
    for i in range(0, 100000):
        ex = "|"
        action, prob = vw.predict(ex)
        cost = get_cost(action)
        # parse as a continuous-action (CATS) example, then learn from it
        learn_ex = vw.parse(f"ca {action}:{cost}:{prob} {ex}", pyvw.vw.lContinuous)
        vw.learn(learn_ex)
        vw.finish_example(learn_ex)
        metrics['predictions'].append(action)
        metrics['costs'].append(cost)
    df = pd.DataFrame(metrics)
    print('mean pred:', df['predictions'].mean())
    df['predictions'].hist().plot()
    plt.show()
    vw.finish()
```
From the results, a bandwidth of `0.1` with `--cats` around 16 or 32 does well. A bandwidth of `0.05` could also work, although `0.1` does better for a wider range of `--cats` values. It also looks like `--cats 8` and anything above `--cats 256` aren't suitable for this test case. Let me know what you think.
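For intuition on why `0.2` is large here (this is my rough mental model, not VW's internal computation: I'm assuming CATS smooths over roughly `±bandwidth` around the chosen action), the fraction of the action range a single smoothing window covers is:

```python
# Rough sketch, NOT VW's internal computation: if smoothing covers
# [action - bandwidth, action + bandwidth], this is the fraction of
# the [min_value, max_value] range that one smoothing window spans.
def smoothed_fraction(bandwidth, min_value=0.0, max_value=1.0):
    return 2.0 * bandwidth / (max_value - min_value)

print(smoothed_fraction(0.2))   # 0.4 -> 40% of the whole [0, 1] range
print(smoothed_fraction(0.05))  # 0.1
```

Under that assumption, `--bandwidth 0.2` averages cost feedback over 40% of the action space, which would wash out a narrow optimum around 0.5.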
Note on the Python side of the code: I switched `predict` to take the example string directly; otherwise, after every `vw.parse()` we would need to call `vw.finish_example()`.
I'm attaching a horribly changed version of your notebook, with the same setup as before but with range [0, 100] and 50 as the point of zero cost, so the best reward would be 0. The gridsearch, again, gets nowhere near optimal results in any configuration; some even start off OK and then get much worse. I guess by adding even more combinations I'd get closer, but I would have thought that even with some initial parameter set I'd get better learning than this. If you can confirm this is working as intended, we can close this ticket. At least for online learning without a simulator to start with, I unfortunately can't use the algorithm for my use case: at the start, grid searching is not an option, and without exactly the right values, learning seems to go quite wrong even for this simple case. I also wouldn't know what ranges to gridsearch over even if I could run the search against a simulator.
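A cost function consistent with this description (zero cost at 50, growing toward the range edges) might look like the following; this is a hypothetical shape, the notebook's actual `get_cost` may differ:

```python
# Hypothetical cost for the [0, 100] setup described above: zero at 50,
# scaled so the worst actions (0 or 100) cost 1. The notebook's real
# get_cost may be defined differently.
def get_cost(action):
    return abs(action - 50.0) / 50.0

print(get_cost(50))  # 0.0 -> the optimal action
print(get_cost(0))   # 1.0 -> worst case at the range edge
```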
@maxpagels it looks like there might be a bug in the tree. I have created a draft PR #2900 with which I tested an adaptation of your latest notebook (attached). Let me know if you try the branch out. Still need to test it more. cats_predict_50.ipynb.zip
Used the wheel built for your PR branch. It generally seems much better, but if I remove `--coin`, which should then default to GD, I get weird spikes. This happens regardless of whether I set the exploration rate a bit higher. Screenshots attached.
@maxpagels cats was mainly developed and tested using `coin`, but `gd` should also work by adjusting the various learning-rate parameters (i.e. `learning_rate`, `power_t`, ...). That being said, it might make sense to make `coin` the default with `cats`, with the ability to switch to `gd` or something else if needed.
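To illustrate the difference in practice, switching learners only changes the option string passed to VW; the specific `-l` / `--power_t` values below are placeholders to tune, not recommendations:

```python
# Sketch: option strings for coin vs. plain gd with explicit learning-rate
# flags. The numeric values for -l and --power_t are illustrative only.
base = "--cats 32 --min_value 0.0 --max_value 1.0 --bandwidth 0.1 --epsilon 0.1"
coin_opts = f"{base} --coin"            # parameter-free update rule
gd_opts = f"{base} -l 0.5 --power_t 0.5"  # gd with hand-tuned rates

print(coin_opts)
print(gd_opts)
```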
Making it the default makes sense to me at least.
Re-opening as we are checking some more things with Akshay.
Describe the bug
Not sure if this is a bug, so feel free to close if not. I am running a simple test of CATs, with min_value 0 and max_value 1, where the cost is defined as follows:
The context is fixed for each round, so I would assume this to be a pretty simple learning task. I've made a Jupyter notebook that gridsearches a bunch of options (actions, bandwidth, epsilon) and can't find a combination that satisfactorily predicts close to 0.5 when exploiting. Some parameter combinations result in wildly different average cost. Changing GD to FTRL doesn't help. Even if I ensure good exploration by changing epsilon, I see no reduction in accrued average cost per round over time.
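The exact cost definition referenced above is not reproduced in this thread. Purely as an illustration of the shape described (minimum cost of 0 at action 0.5 on [0, 1]), something like the following would fit:

```python
# Illustration only -- the issue's actual cost definition is not shown
# here. Any cost with its minimum (0) at action 0.5 on [0, 1] matches
# the described setup, e.g. a simple absolute-distance cost:
def example_cost(action):
    return abs(action - 0.5)

print(example_cost(0.5))  # 0.0 -> the optimal action
```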
So my question is, is this behaviour a feature of CATs, some bug, or something I am just missing in my code?
To Reproduce
Install a fresh pyvw wheel from master (pypi doesn't yet have CATS support) and run the attached notebook: CATS-test.ipynb.zip
Expected behavior
I would expect predictions for such a simple use case to converge to the optimal prediction of 0.5 (0 cost)
Observed Behavior
Observe the cost plots in the notebook not trending down over time, regardless of gridsearch combination
Environment
pyvw, OS X, latest built wheel from the master branch