VowpalWabbit / coba

Contextual bandit benchmarking
https://coba-docs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Access model artifacts after experiment run #26

Closed jonastim closed 1 year ago

jonastim commented 1 year ago

Is there a straightforward way to access the trained model artifacts at the end of an experiment run?

I would like to perform further analyses, like looking at feature importance, and eventually serialize the models. The result data structure seems to contain only model metadata, not the actual artifacts. The learners that are passed into the experiment also don't seem to be updated in place.

I hacked around it by retrieving the model during the training step, but I hope there's a better way.

mrucker commented 1 year ago

Yeah totally. So there are a few things that should help with this...

First, you're right that the result only has general metadata about the experiment, though you can augment the data it does collect. Even so, that falls short of actually having the trained learner in hand; there is currently no functionality to easily store it inside an experiment's results.

Now, regarding getting the trained learner, there are two reasons why modifying the learner "in place" is a little nuanced:

  1. When there are multiple environments we have to duplicate learners so that learning doesn't spread between environments. When we duplicate learners there isn't a great way to know which copy should be the "in-place" one, so instead we just opt not to update any of them in place.
  2. When there are multiple processes Coba uses Python's multiprocessing library, which means learners run on background processes where it is hard to share memory. I don't know how much you know about Python, but this has to do with the Global Interpreter Lock.

At present, there are a few features/tricks you can use to get around this:

  1. If you run an experiment with a single environment on a single core it will train all learners in place. This should be the case no matter how many learners you have in the experiment (see the sketch just after this list).
  2. You can write your own custom EvaluationTask that lets you grab the learners. It sounds like you might have already done this. It isn't super ideal, but the nice thing about this approach is that it gives you access to the learners on the background processes, so you can run multicore and still get the data you want. If this isn't how you hacked around it, let me know and I can probably show you a better way.
  3. Coba actually has pretty advanced custom logging functionality if you're writing your own learners. This won't necessarily help if what you want to know about are the out-of-the-box learners, but it might. I use it all the time to track various statistics on our custom learners to make sure they're working.
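To make the first trick concrete, here's a minimal sketch. It assumes coba's default single-process run and the same cb.Experiment(...).run() pattern shown further down in this thread; the ... placeholders stand in for whatever learner and environment you're actually using:

import coba as cb

# placeholders: substitute your actual learner and environment here
learner     = ...
environment = ...

# with exactly one environment and the default single process, coba doesn't
# need to copy the learner, so the object you passed in is the one that trains
cb.Experiment([environment], [learner]).run()

# the learner object is now trained in place and can be inspected directly
# (feature importance, serialization, etc.)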

If you let me know more about what you're trying to do and what you want to know I could send you a small code example.

Also, in the next two weeks I plan on upgrading Coba so that you can run experiments on multiple environments and get the learners back. It's funny that you ask about this now because it was next on my to-do list. You'd still be limited to a single core, but it'd be better than what we've got now.

jonastim commented 1 year ago

Thanks a lot for all the great information, Mark!

For my current explorations, I've run only a single process with a couple of shuffled environments. With the shuffling removed, I see that the learners are updated in place, which is great.

The way I've accessed the learners so far is by adding them to the interactions table (alongside some other information), which is a bit wasteful but not a major concern for my experiments.


I hadn't thought about the multi-process aspect of it. Having a way to serialize the models and experiment results at the end of the run would be helpful for that.

Looking forward to trying out the new changes that are coming down the line 👍

mrucker commented 1 year ago

Cool, yeah you've got the right idea... Yes, you can stick anything into what is returned and it ends up in the interactions table.

Usually what I do is something like this:

import coba as cb
from coba.experiments import SimpleEvaluation

my_trained_learners = []

class MyEvaluation:

    def process(self, learner, interactions):
        # run the standard evaluation and pass its rows through to the result
        yield from SimpleEvaluation().process(learner, interactions)
        # here you can put any custom code you want for the learner...
        # for example, you could serialize it to file
        my_trained_learners.append(learner)

learners     = ...
environments = ...

cb.Experiment(environments, learners, evaluation_task=MyEvaluation()).run()

# my_trained_learners should have all your trained learners now

That way you don't have to get into the guts of coba, modify its code, and install it locally.
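Since the comment in the sketch above mentions serializing to file, here's one way that could look with plain pickle. Whether a given learner pickles cleanly is an assumption worth checking (learners that wrap open files or external processes may need their own save routine), and the file name is just an example:

import pickle

# persist the learners collected by MyEvaluation once the experiment finishes
with open("trained_learners.pkl", "wb") as f:
    pickle.dump(my_trained_learners, f)

# ...and load them back later for further analysis (e.g. feature importance)
with open("trained_learners.pkl", "rb") as f:
    my_trained_learners = pickle.load(f)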

A second option, still in line with what you're doing, would be something like this if all you want is the learner:

from coba.experiments import SimpleEvaluation

class MyEvaluation:

    def process(self, learner, interactions):
        # we wrap this in list() to force the evaluation to run,
        # but we don't actually care about its return value
        list(SimpleEvaluation().process(learner, interactions))

        # now our learner is trained because we evaluated above;
        # the interactions table will have a single interaction with a "learner" column
        yield {"learner": learner}

Does that make sense? The magic here is that you can define your own evaluator and really do anything you want...
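If you go with this second option, one way to pull the learner back out afterwards is from the result's interactions table. This is a rough sketch that assumes the Result object exposes that table with a to_pandas() helper; the exact accessors can differ between coba versions, so double-check against your install:

import coba as cb

learners     = ...
environments = ...

result = cb.Experiment(environments, learners, evaluation_task=MyEvaluation()).run()

# each evaluated (environment, learner) pair yielded one row above,
# so the "learner" column holds the trained learner objects
interactions_df  = result.interactions.to_pandas()   # accessor assumed, see note above
trained_learners = list(interactions_df["learner"])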

jonastim commented 1 year ago

Good points! I forked the repo and will try to contribute changes that are generally useful upstream (e.g. the option to log context with the interactions) and hope to address the remaining customizations through subclassing.

mrucker commented 1 year ago

That'd be awesome. It looks like you may have noticed, while making your pull request, that I changed SimpleEvaluation a tiny bit. I've been working on a research paper on off-policy learning, so I've been poking around in there lately. I should be done now, and it'd still be great to have context as a recording option.