LaPetiteSouris opened 1 year ago
There has already been some development in response to this issue; see #379, which defines a basic API along with a file structure.
Thanks @FFFiend
I'll try to incorporate the guidelines from #379 as much as possible. It looks like many tasks can ultimately be shared between modules, notably those related to model evaluation/tuning.
There is a slight difference: the scope of this ticket is strictly limited to providing a way to quickly evaluate the performance of a given model, while #379 tackles a bigger issue, which is to define a standardized way to interact with models. Solving #379 will take time, whereas this smaller ticket will immediately unblock the ability to evaluate models out of the box (#419) as well as to perform reinforcement learning (#393).
When #379 is solved, we can easily back-port those recommendations, interfaces, etc. into this script to standardise things.
Feature request
To build a generic script/pipeline which takes as input:
Then the pipeline should:
This pipeline should give a baseline reference for how well a given LLM performs.
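As a rough sketch of what such a harness could look like (all names below, e.g. `BaseModel`, `evaluate`, `exact_match`, are hypothetical and do not correspond to the actual OpenAdapt or #379 API):

```python
"""Hypothetical sketch of a minimal model-evaluation pipeline.

Class/function names here are illustrative only; they would be replaced
by whatever interfaces #379 eventually standardises.
"""

from dataclasses import dataclass
from typing import Callable, Protocol


class BaseModel(Protocol):
    """Anything that can map a prompt to a completion."""

    def infer(self, prompt: str) -> str:
        ...


@dataclass
class EvalResult:
    model_name: str
    num_samples: int
    score: float  # e.g. mean metric value in [0, 1]


def exact_match(prediction: str, expected: str) -> float:
    """Simplest possible metric: 1.0 on exact match, else 0.0."""
    return float(prediction.strip() == expected.strip())


def evaluate(
    model: BaseModel,
    dataset: list[tuple[str, str]],
    metric: Callable[[str, str], float] = exact_match,
    model_name: str = "unnamed",
) -> EvalResult:
    """Run the model over (prompt, expected) pairs and aggregate the metric."""
    scores = [metric(model.infer(prompt), expected) for prompt, expected in dataset]
    return EvalResult(
        model_name=model_name,
        num_samples=len(scores),
        score=sum(scores) / len(scores) if scores else 0.0,
    )
```

The baseline score produced by a harness like this could then be compared before and after fine-tuning or reinforcement learning, which is exactly what #393 and #419 need.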
Motivation
To help solve https://github.com/OpenAdaptAI/OpenAdapt/issues/393 and also facilitate the work of https://github.com/OpenAdaptAI/OpenAdapt/issues/419
Only with a good pipeline can we easily evaluate existing models, as well as evaluate foundation models after fine-tuning or reinforcement-learning improvements.