log on action and prob for off-policy evaluation

jonastim commented 1 year ago

The main change is for the off-policy evaluator to log the action and probability of the learning model rather than those of the logged data (which are identical for all models and not very useful when comparing different learners).

Introduced a helper function for sampling the actions and used it in some other code places to avoid redundancy.
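
To illustrate the idea (this is only a rough sketch, not the code in this PR), the snippet below logs the learner's own sampled action and probability next to the logged ones. The names `sample_action`, `log_learner_choices`, and the tuple layout of `logged` are hypothetical and chosen for the example; they are not coba's API.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

def sample_action(actions: Sequence[Any], probs: Sequence[float],
                  rng: random.Random) -> Tuple[Any, float]:
    """Hypothetical helper: draw one action from a discrete distribution.

    Sampling is done over indexes so that unhashable actions (e.g. lists)
    work too. Returns the chosen action and its probability.
    """
    index = rng.choices(range(len(actions)), weights=probs, k=1)[0]
    return actions[index], probs[index]

def log_learner_choices(
    logged: Sequence[Tuple[Any, Sequence[Any], Any, float]],
    learner_pmf: Callable[[Any, Sequence[Any]], Sequence[float]],
    seed: int = 1,
) -> List[Dict[str, Any]]:
    """Record what the learner would have done on each logged interaction.

    Each item of `logged` is assumed to be
    (context, actions, logged_action, logged_probability).
    """
    rng = random.Random(seed)
    rows = []
    for context, actions, logged_action, logged_prob in logged:
        probs = learner_pmf(context, actions)
        action, prob = sample_action(actions, probs, rng)
        rows.append({
            "action": action,                # differs between learners
            "probability": prob,             # differs between learners
            "logged_action": logged_action,  # same for every learner
            "logged_probability": logged_prob,
        })
    return rows

# Usage: a learner that always prefers the last action.
def always_last(context, actions):
    return [0.0] * (len(actions) - 1) + [1.0]

logged_data = [(None, ["a", "b"], "a", 0.5), (None, [["x"], ["y"]], ["x"], 0.5)]
rows = log_learner_choices(logged_data, always_last)
```

With this layout, two learners evaluated on the same logged data produce different `action` and `probability` columns, which is what makes the output useful for comparing them.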

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (7499d8b) 99.90% compared to head (3c0686b) 99.90%.

:exclamation: Current head 3c0686b differs from pull request most recent head dd3b34b. Consider uploading reports for the commit dd3b34b to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
- Coverage   99.90%   99.90%   -0.01%
==========================================
  Files          55       56       +1
  Lines        7455     7366      -89
==========================================
- Hits         7448     7359      -89
  Misses          7        7
```

| [Flag](https://app.codecov.io/gh/VowpalWabbit/coba/pull/43/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=VowpalWabbit) | Coverage Δ | |
|---|---|---|
| [](https://app.codecov.io/gh/VowpalWabbit/coba/pull/43/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=VowpalWabbit) | `99.90% <100.00%> (-0.01%)` | :arrow_down: |
| [ubuntu-latest](https://app.codecov.io/gh/VowpalWabbit/coba/pull/43/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=VowpalWabbit) | `99.90% <100.00%> (-0.01%)` | :arrow_down: |
| [unittest](https://app.codecov.io/gh/VowpalWabbit/coba/pull/43/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=VowpalWabbit) | `99.90% <100.00%> (-0.01%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=VowpalWabbit#carryforward-flags-in-the-pull-request-comment) to find out more.

jonastim commented 1 year ago

I can't seem to re-run the tests. It looks like these unrelated tests should be less sensitive:

======================================================================
FAIL: test_DM (coba.tests.test_environments_filters.OpeRewards_Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/coba/coba/coba/tests/test_environments_filters.py", line 2390, in test_DM
    self.assertAlmostEqual(new_interactions[0]['rewards'].eval('c'),.79699, places=4)
AssertionError: 0.7970473766326904 != 0.79699 within 4 places (5.7376632690453455e-05 difference)

======================================================================
FAIL: test_DM_action_not_hashable (coba.tests.test_environments_filters.OpeRewards_Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/coba/coba/coba/tests/test_environments_filters.py", line 2404, in test_DM_action_not_hashable
    self.assertAlmostEqual(new_interactions[0]['rewards'].eval(['c']),.79699, places=4)
AssertionError: 0.7970473766326904 != 0.79699 within 4 places (5.7376632690453455e-05 difference)

----------------------------------------------------------------------
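
For reference, here is a small sketch of why the assertion trips and two ways it could be loosened. `assertAlmostEqual(a, b, places=4)` passes only if `round(a - b, 4) == 0`, and the reported difference of about 5.7e-05 rounds to 0.0001. The class name `ToleranceDemo` is hypothetical, and whether `places=3` or `delta=1e-4` is the right tolerance for these tests is an assumption, not something decided in this PR.

```python
import unittest

OBSERVED = 0.7970473766326904  # value reported in the failure above
EXPECTED = 0.79699             # value hard-coded in the test

class ToleranceDemo(unittest.TestCase):
    def test_places_4_is_too_strict(self):
        # round(OBSERVED - EXPECTED, 4) is 0.0001, not 0, so this raises.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(OBSERVED, EXPECTED, places=4)

    def test_places_3_passes(self):
        # round(OBSERVED - EXPECTED, 3) is 0.0, so the check succeeds.
        self.assertAlmostEqual(OBSERVED, EXPECTED, places=3)

    def test_delta_passes(self):
        # An explicit absolute tolerance avoids the rounding edge entirely.
        self.assertAlmostEqual(OBSERVED, EXPECTED, delta=1e-4)
```

Saved to a file, this can be run with `python -m unittest`. The `delta` form states the allowed error directly, which may be easier to reason about than `places` when a value sits near a rounding boundary.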

mrucker commented 1 year ago

Totally agree. Are these the only tests you're having problems with? I've tried to bump down a lot of tests over time.