OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License

Pipeline to evaluate models autoregressively #421

LaPetiteSouris opened 1 year ago

LaPetiteSouris commented 1 year ago

Feature request

To build a generic script/pipeline which takes as input:

Then the pipeline should:

This pipeline should give a baseline reference for how good a given LLM is. A minimal sketch of what such a loop could look like is included below.
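
For illustration only, here is a minimal sketch of such an autoregressive evaluation loop, assuming the pipeline receives a recorded sequence of reference actions plus a model that predicts the next action from the history so far. `Action`, `Model.predict_next_action`, and the exact-match metric are hypothetical placeholders, not existing OpenAdapt APIs:

```python
# Minimal sketch of an autoregressive evaluation loop.
# All names here are hypothetical placeholders, not OpenAdapt's API.
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Action:
    """Simplified stand-in for a recorded input event (click, keypress, ...)."""
    name: str
    x: Optional[float] = None
    y: Optional[float] = None
    key: Optional[str] = None


class Model(Protocol):
    """Any object that predicts the next action given the history so far."""
    def predict_next_action(self, history: List[Action]) -> Action: ...


def evaluate_autoregressively(model: Model, reference: List[Action]) -> float:
    """Score a model by predicting each next action from the recorded prefix.

    Returns the fraction of exact matches (a naive baseline metric).
    The ground-truth action is appended to the history after each step
    (teacher forcing), so one wrong prediction does not corrupt the rest
    of the rollout; append the *predicted* action instead to measure
    fully open-loop behavior.
    """
    history: List[Action] = []
    correct = 0
    for expected in reference:
        predicted = model.predict_next_action(history)
        if predicted == expected:
            correct += 1
        history.append(expected)
    return correct / len(reference) if reference else 0.0
```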

Motivation

To help solve https://github.com/OpenAdaptAI/OpenAdapt/issues/393 and also to facilitate the work in https://github.com/OpenAdaptAI/OpenAdapt/issues/419.

Only with a good pipeline can we easily evaluate existing models, as well as evaluate foundation models after fine-tuning/reinforcement learning improvements.
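
For example, the same loop sketched above could score a base checkpoint against a fine-tuned one on the same recording; `load_recording` and `load_model` below are placeholders, not real OpenAdapt functions:

```python
# Hypothetical usage of the sketch above; the loader and model names
# are placeholders, not real OpenAdapt functions.
reference = load_recording("recording.db")
base = evaluate_autoregressively(load_model("base"), reference)
tuned = evaluate_autoregressively(load_model("fine-tuned"), reference)
print(f"base={base:.2%}  fine-tuned={tuned:.2%}")
```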

FFFiend commented 1 year ago

There has already been some development in response to this issue; see #379, where a basic API and file structure have been defined.

LaPetiteSouris commented 1 year ago

Thanks @FFFiend

I'll try to incorporate the guidelines from #379 as much as possible. It looks like many tasks can ultimately be shared between modules, notably those related to model evaluation/tuning.

The slight difference is that the scope of this ticket is strictly limited to providing a way to quickly evaluate the performance of a given model, while #379 tries to solve a bigger issue: defining a standardized way to interact with models. Solving #379 will take time, while this smaller ticket will immediately unblock the ability to evaluate models out of the box (#419) as well as to perform reinforcement learning (#393).

When #379 is solved, we can easily back-port those recommendations, interfaces, etc. into this script to standardize things.