[TASK] Prototype LLM Comparison UI

johbaxter commented 3 months ago

Description

We need a way to be able to quickly swap out LLMs for apps that are used in our apps, and compare the responses between them

Existing

Currently there is a block that is used to display the variant, and also a setting component to edit the app variants.

What i need you to think through is whether those settings will be tied to that llm-compare block in our app json or something that will be stored on the backend of our app. Things like calling what variant we use at what percent of the time i think would live in our state JSON.

I am leaning towards no BE for storing variants and keeping the structure within our blocks.json.

The execution of LLM is done in our notebook and the BE is not concerned with the reactor that gets called. They just run the code we specify in our cell, so i am leaning towards a new LLM cell for your Comparison block to tie to the params you have specified in your LLM comparison settings, but this would then require you to specify what cell you want your comparison block to tie to.

This new LLM cell i think would have a new reactor very similar to LLM();, but i think it would take the app into account, as well as the percentages that we specify in our settings.

Just my scattered thought process, but i will allow you to continue think though this more.

Reference

This issue references https://github.com/SEMOSS/community/issues/18

Tasks

Please outline tasks list here, and add BE to do items as well. Link any attachments here as well for thought process behind it.

[ ]
[ ]

kgorbell commented 2 months ago

To start off, I'm not sure how this UI could be possible without having the BE store variants that are being used in the LLM Comparison's generated responses. There also is the consideration of Variants a user creates, but does not apply to the block it is attached to (selected/unselected).

Default Variant configuration: Mock Schema:

{
    variantId: "var123"
    variantName: "A",
    models: [
        {
            alias: "textgen_model1",
            name: "WizardLM",
            topP: 0.5,
            temperature: 0.2,
            tokenLength: 660,
        },
        {
            alias: "textgen_model2",
            name: "Vicuna",
            topP: 0.9,
            temperature: 0.7,
            tokenLength: 802,
        },
    ]
}

reactors Needed:

UPDATE: for creating/updating the Default Variant's config
Get: fetches the users' default variant's config which will be in view when the user Configures the LLM Comparison's Block in the editor.

Notes:

We do not currently have a UI for how to configure the Default Variant.
The 'alias' value is shown in each models' card on the UI at the very top See attached screenshots. Due to a lack of UI, I don't know if this value is procedurally generated, or set by the user during configuration.
From design's and my understanding, the user will be able to have 1-3 "models" per varaint, which is a number set by how many models they add to their default varaint.
Possible consideration: if a default variant's config can be updated, and the user changes the # of models to be greater or lower than it's current settings, how might this affect the other variant's stored in the configuration? E.G., the user have "Variant 2" with 2 models saved, but now it requires 3 models since the User has changed the default's config to require 3 models.

LLM Comparison Configuration Mock Schema (what is returned when the user fetches the block's configuration):

{
    displayModels: true,
    sampleSize: 100,
    trafficAllocation: [
        {
            variantName: 'A',
            allocation: 70,
        },
        {
            variantName: '2',
            allocation: 30,
        },
    ],
    variants: [
        {
            variantId: "var123"
            variantName: "A",
            selected: true,
            weight: 1,
            models: [
                {
                    alias: "textgen_model1",
                    name: "WizardLM",
                    topP: 0.5,
                    temperature: 0.2,
                    tokenLength: 660,
                },
                {
                    alias: "textgen_model2",
                    name: "Vicuna",
                    topP: 0.9,
                    temperature: 0.7,
                    tokenLength: 802,
                },
            ]
        }
    ]
}

Reactors Needed:

GET: fetches all Configuration data that pertains to generating the block's response. Add'l value required to fetch might be the block's ID?
UPDATE: modifies the current configuration of the block.

Notes:

When a block is first added to the app in the editor, how will we address 'fetching' the Response generated in the Block's rendered UI? Is there a default configuration and applied to the block that can generate a response, or will the user need to make a change to its configuration first?
How will the user 'save' their changes to the configuration? Will any change to the config automatically trigger the UI to 'save' the change and fetch the updated response rendered in the block?
The 'Traffic Allocation' poses some unique concerns for the UX that hinge on how the Block's configuration should be saved/updated (see question above).
UI does not currently offer a way to modify/set a Variant's name, but we will need names associated to each variant in order to set its 'order' in the 'Settings' menu.
Should the BE store Variants in the configuration that aren't 'selected' to be displayed in the Block?
Need to think through interactions between the weight and whether an item is selected.

LLM Comparison response Mock Schema:

{
    llmResponseId: "123ABC",
    displayModels: true,
    variants: [
        {
            variantId: "var123",
            variantName: "A",
            modelNames: ["WizardLM", "Vicuna"],
            response: "The recommended treatment for the diagnosis..."
            rating: 4,
        },
        {
            variantName: "var456",
            variantName: "2",
            modelNames: ["HPT3-Turbo", "Vicuna"],
            response: "The recommended treatment for the diagnosis..."
            rating: null,
        },
    ],
}

Reactors needed: I'm still a little shaky on how our Notebook Store can generate the responses. But if that's the case, then there should be no need for Reactors to fetch and update the generated responses.

Notes:

when a new response is generated and/or a variant is updated, the value for rating should be null/undefined
Weight/Order for determining the list order of the variants is part of the block's config, which could be added as a value in each variant's object, or set by the order in which they are sent in the payload.
"llmResponseId" & "variantId" aren't necessarily needed on the FE side, though they may be if more changes are made to the proposed UI in the future.

johbaxter commented 2 months ago

@kgorbell

I like the schema you have for the mock configuration. We can store this on the block.data using the settings component we decide to have for the block.

Our DS for state and blocks allows us to handle the data retrieval and setting so we likely don't want to introduce an extra complexity of pulling in variants. The execution piece which we will cross when the time comes is where we may need some help. (execution as in use this variant 70% of the time)

But as we discussed on the call, for now handle each variant as a list of a single model. (keep it as a list to prepare for multiple models in a variant).

For now i want to be able tie and compare a single LLM response with the different variants. (one to one)

As for how to tie the variants to the response. At the block level we sorta need to know what query-sheet or cell we want it to tie too. We may need to create duplicates (query-sheet, cells) to be able to tie to our other variants.

We shouldn't need much BE i'm envisioning at least for now. But I will be working on this in tandem with you, but please try to make progress with it based on thought process discussed on call and what i stated above.

SEMOSS / semoss-ui