UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://UKGovernmentBEIS.github.io/inspect_ai/
MIT License
385 stars 41 forks source link

toy example for multi-turn dialogues using inspect #25

Closed Lovkush-A closed 3 weeks ago

Lovkush-A commented 1 month ago

I created a toy example using Inspect for multi-turn dialogue. https://github.com/Lovkush-A/inspect_multiturn_dialogue

This might be a lot to ask, but can somebody in AISI skim through and let me know if there is cleaner way of doing things? There is only one notebook to read through, in which i create a dataset, solver, scorer, task and run the evals. I plan to share this around AI safety community so I want to make sure I am using framework properly.

Thank you!

aisi-inspect commented 1 month ago

The messages will always a list of messages initialized from the input. I would definitely code against the messages not the input, which will allow for other solvers to be able to tweak the messages in some fashion (not likely, but its nice to not shut the door on this entirely). So something more like this:

async def solve(state: TaskState, generate: Generate) -> TaskState:

    input = state.messages.copy()
    state.messages = []

    for turn in input:
        state.messages.append(turn)
        state = await generate(state)

    return state

In terms of always_false_scorer(), you should just be able to have no scorer at all in your Task (then eval will just skip the scoring step).

Lovkush-A commented 2 weeks ago

@aisi-inspect Better late than never, but thanks for the feedback! Very helpful.

aisi-inspect commented 2 weeks ago

FYI we did actually reply to this immediately when originally posted (maybe you just saw the reply now b/c we closed it?). Just didn't want you to think we were that unresponsive :-)

Lovkush-A commented 2 weeks ago

Oh no! I totally mis communicated there. I was referring to my thanks being late, not your help. I saw you helped quickly but I have been delayed to come back to this.

Again, apologies. Re reading my message, I really come across as passive aggressive...

jjallaire commented 2 weeks ago

Thx for the follow up, no worries at all! :-)