While this isn't really the design center of Inspect (which is fundamentally a harness for testing/evaluation), there's no reason you can't do something like what you're proposing. You would really just need to construct a `TaskState`, run the loop over the plan, and then inspect the `TaskState` at the end for output.
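For illustration, a minimal sketch of that loop might look like the following. The `TaskState` constructor arguments, the shape of the generate function, and the model name are assumptions based on the linked source and may differ across inspect_ai versions; check the code you're running against.

```python
# A minimal sketch, NOT the documented API: run a plan (a list of solvers)
# directly, outside of an eval. Constructor arguments and the generate
# signature are assumptions inferred from the linked source.
import asyncio

from inspect_ai.model import ChatMessageUser, get_model
from inspect_ai.solver import TaskState, generate, system_message


def make_generate(model):
    # Simplified stand-in for the harness's generate function: call the
    # model on the accumulated messages and record the output. (The real
    # loop also handles tool calls, config, and logging.)
    async def generate_fn(state: TaskState, **kwargs) -> TaskState:
        state.output = await model.generate(state.messages, **kwargs)
        state.messages.append(state.output.choices[0].message)
        return state

    return generate_fn


async def run_plan(plan, state, generate_fn):
    # Thread the state through each solver in turn, as the eval loop does,
    # stopping early if a solver marks the state completed.
    for solver in plan:
        state = await solver(state, generate_fn)
        if state.completed:
            break
    return state


async def main():
    model = get_model("openai/gpt-4")  # any provider/model you have configured
    prompt = "Summarize the status of this deployment."
    state = TaskState(
        model=model.name,  # assumption: the constructor takes the model name
        sample_id=0,
        epoch=0,
        input=prompt,
        messages=[ChatMessageUser(content=prompt)],
    )
    plan = [system_message("You are a helpful assistant."), generate()]
    state = await run_plan(plan, state, make_generate(model))
    print(state.output.completion)


asyncio.run(main())
```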
Thanks, that's helpful!
What can folks expect in terms of stability for things like the solver interface, or for how generators are instantiated?
The solver interface is stable.
Let's say I've built a suite of evals for my task, and I've gotten it working well enough that I'd like to deploy a plan to production.
Is there a path today to go from a working plan/config in the evaluation suite to running that plan/config in production?
It seems like one way to do this would be to replicate this loop in my own code: https://github.com/UKGovernmentBEIS/inspect_ai/blob/b6b75a369eb702fca518c03d893f5067301151ec/src/inspect_ai/_eval/task/run.py#L258
... and then extract what I need from the TaskState structure to do whatever I want with the output.
Any reason that would be inadvisable?
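For reference, the extraction step described above might look roughly like this, with field names assumed from the `TaskState` definition at the linked commit:

```python
# Illustrative only; field names are assumptions based on the linked source.
result_text = state.output.completion  # final completion text
transcript = state.messages            # full message history
extras = state.metadata                # anything solvers stashed along the way
```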