ajbouh / substrate


librarian: benchmark for librarian language models #26

Open ajbouh opened 8 months ago

yoshikiohshima commented 8 months ago

Yoshiki is working on a benchmark program. The example task is to extract the year (or time span) and description of each event or artifact mentioned in a talk transcript. To see how well a language model performs on this task, we need a judge program that confirms the expected items are in the result and that no unexpected items are. The latter comes down to comparing two entries, each of the form {year, description}.
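A minimal sketch of how that check could be structured, assuming each entry is an object with `year` and `description` fields. All names here (`scoreExtraction`, `naiveSameEntry`) are hypothetical and not taken from the benchmark's code, and the string comparison is only a stand-in for the judge, which would presumably ask a language model whether two entries describe the same item.

```javascript
// Hypothetical sketch; none of these names come from the actual benchmark code.

// Stand-in for the judge: treats two entries as the same item when the years
// match and the descriptions are equal ignoring case and surrounding spaces.
// A real judge would likely delegate this decision to a language model.
function naiveSameEntry(a, b) {
  return (
    String(a.year) === String(b.year) &&
    a.description.trim().toLowerCase() === b.description.trim().toLowerCase()
  );
}

// Compare the expected entries with the entries a model actually produced.
// `missing` holds expected entries with no match in the result;
// `unexpected` holds result entries with no match in the expected list.
function scoreExtraction(expected, actual, sameEntry = naiveSameEntry) {
  const missing = expected.filter((e) => !actual.some((a) => sameEntry(e, a)));
  const unexpected = actual.filter((a) => !expected.some((e) => sameEntry(e, a)));
  return {
    missing,
    unexpected,
    pass: missing.length === 0 && unexpected.length === 0,
  };
}
```

Passing the comparison in as a parameter keeps the bookkeeping separate from the hard part, so the same scoring code can be reused while different judge implementations are tried.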

Currently I am working on the judge part. I lifted some code from the llama.cpp front-end JS code and turned it into a Node.js program. I have test data consisting of pairs of such entries, each with an expected result of yes or no. The llava of llamafile 1.5 is not good at this, and as of writing I have found some timing issues in my JS code. The next steps are finding a model that can do a good job and fixing the timing issues in my code.
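A rough sketch of how a judge could be measured against such test data, assuming each test case holds two entries plus an expected answer of "yes" or "no". The names (`evaluateJudge`, `yearOnlyJudge`) and the test case shape are hypothetical. The thread does not say what the timing issues are; awaiting each judge call before starting the next is just one simple way to keep requests from overlapping.

```javascript
// Hypothetical harness; `judge` is any async function that takes two entries
// and resolves to "yes" or "no". A real judge would call a language model.
async function evaluateJudge(testCases, judge) {
  const failures = [];
  for (const testCase of testCases) {
    // Await each call before starting the next so requests never overlap.
    const answer = await judge(testCase.a, testCase.b);
    if (answer !== testCase.expected) {
      failures.push({ ...testCase, answer });
    }
  }
  return {
    total: testCases.length,
    correct: testCases.length - failures.length,
    failures,
  };
}

// Deliberately weak stub judge used only to exercise the harness:
// it answers "yes" whenever the two years match.
async function yearOnlyJudge(a, b) {
  return String(a.year) === String(b.year) ? "yes" : "no";
}
```

The `failures` list keeps the judge's actual answer alongside each mismatched case, which makes it easier to see where a given model goes wrong.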

yoshikiohshima commented 8 months ago

A pull request to add the benchmark program to /experiments has been created: GH-69