OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License

Pipeline to evaluate models autoregressively #421

LaPetiteSouris opened 1 year ago

LaPetiteSouris commented 1 year ago

Feature request

To build a generic script/pipeline which takes as input:

Then the pipeline should:

This pipeline should give a baseline reference for how good a given LLM is. A minimal sketch of what such a loop could look like is included below.
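
For illustration only, here is a minimal sketch of such an autoregressive evaluation loop, assuming the pipeline receives a recorded sequence of reference actions plus a model that predicts the next action from the history so far. `Action`, `Model.predict_next_action`, and the exact-match metric are hypothetical placeholders, not existing OpenAdapt APIs:

```python
# Minimal sketch of an autoregressive evaluation loop.
# All names here are hypothetical placeholders, not OpenAdapt's API.
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Action:
    """Simplified stand-in for a recorded input event (click, keypress, ...)."""
    name: str
    x: Optional[float] = None
    y: Optional[float] = None
    key: Optional[str] = None


class Model(Protocol):
    """Any object that predicts the next action given the history so far."""
    def predict_next_action(self, history: List[Action]) -> Action: ...


def evaluate_autoregressively(model: Model, reference: List[Action]) -> float:
    """Score a model by predicting each next action from the recorded prefix.

    Returns the fraction of exact matches (a naive baseline metric).
    The ground-truth action is appended to the history after each step
    (teacher forcing), so one wrong prediction does not corrupt the rest
    of the rollout; append the *predicted* action instead to measure
    fully open-loop behavior.
    """
    history: List[Action] = []
    correct = 0
    for expected in reference:
        predicted = model.predict_next_action(history)
        if predicted == expected:
            correct += 1
        history.append(expected)
    return correct / len(reference) if reference else 0.0
```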

Motivation

To help solve https://github.com/OpenAdaptAI/OpenAdapt/issues/393 and also to facilitate the work in https://github.com/OpenAdaptAI/OpenAdapt/issues/419.

Only with a good pipeline can we easily evaluate existing models, as well as evaluate foundation models after fine-tuning/reinforcement learning improvements.
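
For example, the same loop sketched above could score a base checkpoint against a fine-tuned one on the same recording; `load_recording` and `load_model` below are placeholders, not real OpenAdapt functions:

```python
# Hypothetical usage of the sketch above; the loader and model names
# are placeholders, not real OpenAdapt functions.
reference = load_recording("recording.db")
base = evaluate_autoregressively(load_model("base"), reference)
tuned = evaluate_autoregressively(load_model("fine-tuned"), reference)
print(f"base={base:.2%}  fine-tuned={tuned:.2%}")
```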

FFFiend commented 1 year ago

There has already been some development in response to this issue; see #379, where a basic API and file structure have been defined.

LaPetiteSouris commented 1 year ago

Thanks @FFFiend

I'll try to incorporate the guidelines from #379 as much as possible. It looks like many tasks can ultimately be shared between modules, notably those related to model evaluation/tuning.

The slight difference is that the scope of this ticket is strictly limited to providing a way to quickly evaluate the performance of a given model, while #379 tries to solve a bigger issue: defining a standardized way to interact with models. Solving #379 will take time, while this smaller ticket will immediately unblock the ability to evaluate models out of the box (#419) as well as to perform reinforcement learning (#393).

When #379 is solved, we can easily back-port those recommendations, interfaces, etc. into this script to standardize things.