allenporter / home-assistant-datasets

This package is a collection of datasets for evaluating AI Models in the context of Home Assistant.
https://allenporter.github.io/home-assistant-datasets

Add mistral-nemo to the test suite #40

Closed: tannisroot closed this issue 4 days ago

tannisroot commented 3 weeks ago

This new, relatively small local model, mistral-nemo, is shockingly capable at Home Assistant tasks. If HA uses these tests in the future to report the current state of LLM conversation agents, it would be interesting to have this model in the lineup too!

allenporter commented 4 days ago

Here are the steps I'm running to produce the model eval.

Refresh deps

$ source venv/bin/activate
$ pip3 install -r requirements_dev.txt

Updated models.yaml with the new model.

  - model_id: mistral-nemo
    domain: ollama
    description: A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.
    urls:
      - https://mistral.ai/news/mistral-nemo/
      - https://ollama.com/library/mistral-nemo
    config_entry_data:
      url: !secret ollama_url
      model: mistral-nemo:12b
    config_entry_options:
      llm_hass_api: assist
      num_ctx: 8192  # Note: Model has 128k context length

Run eval

Make report

$ home-assistant-datasets assist eval --model_output_dir=${MODEL_OUTPUT_DIR} --output_type=csv > ${MODEL_OUTPUT_DIR}/report.csv
$ home-assistant-datasets leaderboard prebuild
$ home-assistant-datasets leaderboard build
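
The report command above reads ${MODEL_OUTPUT_DIR}, which is never set in this thread. A minimal sketch of setting it first, assuming a hypothetical directory name (reports/mistral-nemo is an illustration, not the path actually used):

```shell
# Hypothetical output location; the thread does not say which directory was used.
# Keep any value already exported in the environment, otherwise fall back to the default.
MODEL_OUTPUT_DIR="${MODEL_OUTPUT_DIR:-reports/mistral-nemo}"

# Create the directory so the redirect to report.csv does not fail.
mkdir -p "${MODEL_OUTPUT_DIR}"
export MODEL_OUTPUT_DIR

echo "Writing eval output to ${MODEL_OUTPUT_DIR}"
```

With the variable exported, the eval and leaderboard commands can be run unchanged in the same shell session.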

allenporter commented 4 days ago

Leaderboard updated: mistral-nemo scores 81% on assist-mini.