Closed: mongodben closed this 2 months ago
I would suggest using the logging SDK (see here: https://www.braintrust.dev/docs/guides/evals/write#logging-sdk) if you have pre-generated outputs.
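For example (a minimal sketch, assuming pre-generated outputs; the project name, cases, and choice of `Levenshtein` as the scorer are illustrative):

```typescript
import { init } from "braintrust";
import { Levenshtein } from "autoevals";

async function main() {
  // Illustrative project name; the outputs below were generated elsewhere.
  const experiment = init("my-project");

  const cases = [
    { input: "What is 2+2?", expected: "4", output: "4" },
    { input: "What is the capital of France?", expected: "Paris", output: "Paris" },
  ];

  for (const c of cases) {
    // Score the pre-generated output directly; no task function involved.
    const result = await Levenshtein({ output: c.output, expected: c.expected });
    experiment.log({
      input: c.input,
      output: c.output,
      expected: c.expected,
      scores: { [result.name]: result.score ?? 0 },
    });
  }

  console.log(await experiment.summarize());
}

main();
```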
Also closing this out since it doesn't specifically have to do with autoevals. If you have further questions or feedback about this, feel free to ping us on Discord or in the SDK repo (https://github.com/braintrustdata/braintrust-sdk).
> I would suggest using the logging SDK (see here: https://www.braintrust.dev/docs/guides/evals/write#logging-sdk) if you have pre-generated outputs.
This seems like exactly what I need, thank you!
Currently, it's less than straightforward to run evals if the answer is pre-generated, or based on case-specific data beyond the input.
This is because the `Eval`'s `task()` function only accepts the `input` string as an argument.

I think it's important to be able to evaluate against pre-generated outputs so that we can decouple the evaluation stage (in Braintrust) from the dataset generation stage, which doesn't necessarily require Braintrust.
Here's my current implementation, which relies on creating a closure over the `task()` function to iterate through pre-generated responses.
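In outline, the closure approach looks something like the sketch below (the project name, dataset, and scorer are illustrative, not the original code):

```typescript
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

// Pre-generated responses, keyed by input (illustrative data).
const preGenerated: Record<string, string> = {
  "What is 2+2?": "4",
  "What is the capital of France?": "Paris",
};

Eval("my-project", {
  data: () =>
    Object.keys(preGenerated).map((input) => ({
      input,
      expected: preGenerated[input],
    })),
  // `task()` only receives `input`, so it closes over `preGenerated`
  // and looks up the canned response instead of generating one.
  task: (input: string) => preGenerated[input],
  scores: [Levenshtein],
});
```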
While this seems to work fine, it would be clearer and less reliant on closures (which some folks might be less familiar with) if you could pass additional data to the `task()` function.
I think a straightforward way to do this would be to allow passing all the contents of the `Data` object being evaluated to the `task()` function. This would give the task function a signature like the following.
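A hypothetical sketch of that shape (`EvalCase` is an illustrative name, not an SDK type):

```typescript
// Hypothetical: the task receives the full data record instead of
// only the `input` string.
type EvalCase = {
  input: string;
  expected?: string;
  metadata?: Record<string, unknown>;
};

type TaskFn = (data: EvalCase) => string | Promise<string>;
```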
Then I could include any pre-generated answers or other logic that I want to use in the `Data.metadata` object. For example, this could look like the sketch below.
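A sketch of the proposed API, reusing the `EvalCase` shape from above (this does not compile against the current SDK; `preGeneratedAnswer` is an illustrative metadata key):

```typescript
// Sketch of the *proposed* API; this is not the current Eval signature.
Eval("my-project", {
  data: () => [
    {
      input: "What is the capital of France?",
      expected: "Paris",
      // The pre-generated answer rides along with the case.
      metadata: { preGeneratedAnswer: "Paris" },
    },
  ],
  // With the proposed signature, the task gets the whole record and
  // can just return the pre-generated answer from metadata.
  task: (data: EvalCase) => String(data.metadata?.preGeneratedAnswer),
  scores: [Levenshtein],
});
```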