aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0
185 stars 42 forks source link

[Feature] Multi-variable prompt templates #246

Open athewsey opened 5 months ago

athewsey commented 5 months ago

I see from this comment this feature may be coming already, but the attached issue was closed with the interim workaround.

For our use-case we have a multi-field dataset for example:

source_doc question ref_answers
(full text) What date was the agreement signed, in YYYY-MM-DD format? 2024-04-09
... ... ...

...Where the final LLM prompt would combine both the source document and the question, in a (constant) template. I could easily see more general use-cases with other fields too.

Today, we're hacking around fmeval by doing prompt fulfilment as a separate step before the library. It would be much better if fmeval is able to directly process raw multi-field datasets like this, by taking a prompt template that can reference arbitrary fields from the source record.

My reason for revisiting this was Claude v3's messages API, which means we're going to have to do more sophisticated fulfilment on our side to achieve the same effect.

athewsey commented 5 months ago

I was hoping to use the default composer with Claude v3 on Bedrock by using a setup along the lines of:

runner = BedrockModelRunner(content_template='{"messages": $prompt, "anthropic_version": "bedrock-2023-05-31", "max_tokens": 500}')

...with a dataset where $prompt has been pre-fulfilled as JSON message sequences like:

{
    "prompt": "[{\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \".............\"}]}]"
    "answers": "2024-04-09"
}

...But unfortunately, the JsonContentComposer results in the messages structure remaining stringified

athewsey commented 4 months ago

As noted above, it's confusing that fmeval makes it seem like content_template might support structured data, but in practice JsonContentComposer forces everything to be enclosed in a string.

My current workaround for this (now open-sourced) is to override BedrockModelRunner._composer with a StructuredContentComposer that supports providing arbitrary JSON.

Our app currently completely bypasses fmeval's prompt template fulfilment, because it isn't flexible enough to support multiple context data fields. Hopefully this changes in future!