johbaxter opened 3 months ago
To start off, I'm not sure how this UI could work without the BE storing the variants that are used in the LLM Comparison's generated responses. There is also the consideration of variants a user creates but does not apply to the block they are attached to (selected/unselected).
Default Variant Configuration Mock Schema:
{
variantId: "var123"
variantName: "A",
models: [
{
alias: "textgen_model1",
name: "WizardLM",
topP: 0.5,
temperature: 0.2,
tokenLength: 660,
},
{
alias: "textgen_model2",
name: "Vicuna",
topP: 0.9,
temperature: 0.7,
tokenLength: 802,
},
]
}
Reactors Needed:
Notes:
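One note: a minimal TypeScript sketch of this variant shape, assuming the field names from the mock schema above (nothing here is final), could look like:

// Sketch only: types mirroring the default variant mock schema above.
interface ModelConfig {
  alias: string;       // e.g. "textgen_model1"
  name: string;        // e.g. "WizardLM"
  topP: number;
  temperature: number;
  tokenLength: number;
}

interface Variant {
  variantId: string;   // e.g. "var123"
  variantName: string; // e.g. "A"
  models: ModelConfig[]; // kept as a list to allow multiple models per variant later
}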
LLM Comparison Configuration Mock Schema (what is returned when the user fetches the block's configuration):
{
displayModels: true,
sampleSize: 100,
trafficAllocation: [
{
variantName: "A",
allocation: 70,
},
{
variantName: "2",
allocation: 30,
},
],
variants: [
{
variantId: "var123"
variantName: "A",
selected: true,
weight: 1,
models: [
{
alias: "textgen_model1",
name: "WizardLM",
topP: 0.5,
temperature: 0.2,
tokenLength: 660,
},
{
alias: "textgen_model2",
name: "Vicuna",
topP: 0.9,
temperature: 0.7,
tokenLength: 802,
},
]
}
]
}
Reactors Needed:
Notes:
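One note on the execution side: a rough TypeScript sketch of how the trafficAllocation percentages could drive which variant a request uses (the function name and shape are assumptions, not an existing reactor):

// Sketch only: weighted pick over trafficAllocation; assumes allocations sum to 100.
interface TrafficAllocation {
  variantName: string;
  allocation: number; // percentage, e.g. 70
}

function pickVariantName(trafficAllocation: TrafficAllocation[]): string {
  const total = trafficAllocation.reduce((sum, t) => sum + t.allocation, 0);
  let roll = Math.random() * total;
  for (const t of trafficAllocation) {
    roll -= t.allocation;
    if (roll <= 0) {
      return t.variantName;
    }
  }
  // Fallback for floating-point edge cases.
  return trafficAllocation[trafficAllocation.length - 1].variantName;
}

// Example: returns "A" ~70% of the time and "2" ~30% of the time.
// pickVariantName([{ variantName: "A", allocation: 70 }, { variantName: "2", allocation: 30 }]);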
LLM Comparison Response Mock Schema:
{
llmResponseId: "123ABC",
displayModels: true,
variants: [
{
variantId: "var123",
variantName: "A",
modelNames: ["WizardLM", "Vicuna"],
response: "The recommended treatment for the diagnosis..."
rating: 4,
},
{
variantName: "var456",
variantName: "2",
modelNames: ["HPT3-Turbo", "Vicuna"],
response: "The recommended treatment for the diagnosis..."
rating: null,
},
],
}
Reactors Needed: I'm still a little shaky on how our Notebook Store can generate the responses, but if it can, then there should be no need for reactors to fetch and update the generated responses.
Notes:
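One note: if the Notebook Store already holds the generated responses, tying a rating back to a response could be a plain state update rather than a reactor call. A hedged TypeScript sketch (the types mirror the response mock schema; the helper itself is hypothetical):

// Sketch only: update the rating for one variant's response in local block state.
interface VariantResponse {
  variantId: string;
  variantName: string;
  modelNames: string[];
  response: string;
  rating: number | null;
}

function rateResponse(
  responses: VariantResponse[],
  variantId: string,
  rating: number
): VariantResponse[] {
  // Return a new array so store updates stay immutable.
  return responses.map((r) => (r.variantId === variantId ? { ...r, rating } : r));
}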
@kgorbell
I like the schema you have for the mock configuration. We can store this on the block.data using the settings component we decide to have for the block.
Our DS for state and blocks allows us to handle the data retrieval and setting, so we likely don't want to introduce the extra complexity of pulling in variants. The execution piece, which we will cross when the time comes, is where we may need some help (execution as in "use this variant 70% of the time").
But as we discussed on the call, for now handle each variant as a list containing a single model (keep it as a list to prepare for multiple models in a variant).
For now I want to be able to tie and compare a single LLM response with each of the different variants (one to one).
As for how to tie the variants to the response: at the block level we sort of need to know what query-sheet or cell we want it to tie to. We may need to create duplicates (query-sheets, cells) to be able to tie to our other variants.
I'm envisioning we shouldn't need much BE, at least for now. I will be working on this in tandem with you, but please try to make progress with it based on the thought process discussed on the call and what I stated above.
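To make the above concrete, here is a hedged sketch of what block.data on the llm-compare block might hold if we keep everything in blocks.json: each variant is a single-model list, and each variant points at the query-sheet/cell duplicate it is tied to (the tiedTo / queryId / cellId names are made up for illustration, not decided):

// Sketch only: possible block.data shape; field names are assumptions.
const llmCompareBlockData = {
  displayModels: true,
  sampleSize: 100,
  variants: [
    {
      variantId: "var123",
      variantName: "A",
      selected: true,
      models: [
        { alias: "textgen_model1", name: "WizardLM", topP: 0.5, temperature: 0.2, tokenLength: 660 },
      ],
      // Hypothetical pointer to the duplicated query-sheet/cell this variant ties to (one to one).
      tiedTo: { queryId: "default", cellId: "llm-cell-var123" },
    },
  ],
};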
Description
We need a way to quickly swap out the LLMs that are used in our apps, and compare the responses between them.
Existing
Currently there is a block that is used to display the variant, as well as a settings component to edit the app variants.
What I need you to think through is whether those settings will be tied to that llm-compare block in our app JSON or be something that is stored on the backend of our app. Things like deciding what variant we use what percent of the time, I think, would live in our state JSON.
I am leaning towards no BE for storing variants and keeping the structure within our blocks.json.
The execution of the LLM is done in our notebook, and the BE is not concerned with the reactor that gets called; it just runs the code we specify in our cell. So I am leaning towards a new LLM cell for your comparison block that ties to the params you have specified in your LLM comparison settings, but this would then require you to specify what cell you want your comparison block to tie to.
This new LLM cell, I think, would have a new reactor very similar to LLM(), but I think it would take the app into account, as well as the percentages that we specify in our settings. Just my scattered thought process, but I will allow you to continue thinking through this more.
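To illustrate that last idea, here is a hedged sketch of what the new comparison-aware LLM cell's logic might do, assuming it receives the comparison settings; runLLM is a stand-in for whatever the real LLM() reactor call ends up being, not an actual API:

// Sketch only: conceptual flow for a comparison-aware LLM cell.
interface CellVariant {
  variantId: string;
  variantName: string;
  weight: number; // allocation percentage from the comparison settings
  models: { name: string; topP: number; temperature: number; tokenLength: number }[];
}

// Stand-in for the real LLM() reactor call; its actual signature is not defined here.
declare function runLLM(model: CellVariant["models"][number], prompt: string): Promise<string>;

async function runComparisonCell(variants: CellVariant[], prompt: string) {
  // Weighted pick by allocation (same idea as pickVariantName above).
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let roll = Math.random() * total;
  const chosen = variants.find((v) => (roll -= v.weight) <= 0) ?? variants[variants.length - 1];

  // For now each variant is a single-model list, so run its first model.
  const response = await runLLM(chosen.models[0], prompt);
  return { variantId: chosen.variantId, variantName: chosen.variantName, response };
}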
Reference
This issue references https://github.com/SEMOSS/community/issues/18
Tasks
Please outline the task list here, and add BE to-do items as well. Link any attachments here as well for the thought process behind them.