defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.
https://leapfrog.ai
Apache License 2.0

(spike) Evals "Model Card" #721

jalling97 closed 1 month ago

jalling97 commented 4 months ago

Description

How the evaluation results get delivered is crucial. This spike covers what a "model card" would look like for evaluating a model against our framework. The "model card" should clearly answer the question: "which model should I use for my use case?"

The model card should incorporate input from design and should convey the most important takeaways clearly and efficiently.

Relevant Links

Galileo Hallucination Index

jalling97 commented 1 month ago

Summary notes from a meeting discussing the Model Card:

jalling97 commented 1 month ago

Decision

The model card will ultimately exist in a few forms:

A model card report will consist of the table of evaluation metrics, a written summary of what the metrics mean and how they relate to specific performance considerations, and model recommendations. The report can be generalized for a wide audience, but will need to be customized for a given potential deployment scenario.
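As a rough illustration of the report shape described above, here is a minimal sketch of how the three parts (metrics table, written notes on each metric, and recommendations) might be held together and rendered. Everything in it is an assumption made for illustration: the `ModelCardReport` class, its field names, the `deployment_scenario` field, and the placeholder metric names and scores are hypothetical and are not part of LeapfrogAI or its evaluation framework.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCardReport:
    """Hypothetical container for the parts of a model card report."""

    model_name: str
    metrics: dict[str, float]        # metric name -> score (the metrics table)
    metric_notes: dict[str, str]     # metric name -> what the metric means
    recommendations: list[str] = field(default_factory=list)
    deployment_scenario: str = "general"  # the report is customized per scenario

    def to_markdown(self) -> str:
        """Render the metrics table followed by the recommendations."""
        lines = [
            f"Model card: {self.model_name} ({self.deployment_scenario})",
            "",
            "| Metric | Score | What it means |",
            "| --- | --- | --- |",
        ]
        for name, score in self.metrics.items():
            note = self.metric_notes.get(name, "")
            lines.append(f"| {name} | {score:.2f} | {note} |")
        lines.append("")
        lines.append("Recommendations:")
        for rec in self.recommendations:
            lines.append(f"- {rec}")
        return "\n".join(lines)


# Placeholder values only; these are not real evaluation results.
example = ModelCardReport(
    model_name="example-model",
    metrics={"metric_a": 0.5},
    metric_notes={"metric_a": "placeholder description"},
    recommendations=["placeholder recommendation"],
    deployment_scenario="edge",
)
print(example.to_markdown())
```

Keeping the deployment scenario as a field on the report reflects the point that the same metrics can back a general report or one tailored to a specific deployment; only the notes and recommendations would need to change.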