tannisroot commented 3 weeks ago

The new, relatively small mistral-nemo local model is shockingly capable at Home Assistant tasks. If HA uses these tests in the future to report the current state of LLM conversation agents, it would be interesting to have that model in the lineup too!

allenporter commented 4 days ago

Here are the steps I'm running to produce the model eval.

Refresh deps

$ source venv/bin/activate
$ pip3 install -r requirements_dev.txt

Updated `models.yaml` with the new model.

  - model_id: mistral-nemo
    domain: ollama
    description: A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.
    urls:
      - https://mistral.ai/news/mistral-nemo/
      - https://ollama.com/library/mistral-nemo
    config_entry_data:
      url: !secret ollama_url
      model: mistral-nemo:12b
    config_entry_options:
      llm_hass_api: assist
      num_ctx: 8192  # Note: Model has 128k context length

Run eval

Manually pull the model on the Ollama instance with `ollama pull mistral-nemo`.
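To confirm the pull worked before starting a long collection run, a small check like the one below may help. It is only a sketch: `ollama_has_model` is a hypothetical helper, not part of Ollama or this repo, and it assumes `ollama list` prints a header row followed by one model per line with the tag-qualified name in the first column.

```shell
# Hypothetical helper: succeeds if the given tag-qualified model name appears
# in `ollama list` output supplied on stdin.
# Assumption: a header row, then one model per line with NAME in column one.
ollama_has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

Possible usage: `ollama list | ollama_has_model mistral-nemo:12b && echo "model present"`. The name checked here is the tag from `models.yaml`; a pull without an explicit tag may be stored under a different tag, so it is worth checking the exact name the config refers to.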

Run eval

$ export PYTHONPATH="${PYTHONPATH}:${PWD}"
$ DATASET="datasets/assist-mini/"
$ pip3 freeze | grep homeassistant  # check current home assistant version
$ MODEL_OUTPUT_DIR="reports/assist-mini/2024.9.2"
$ MODEL=mistral-nemo
$ home-assistant-datasets assist collect --model_output_dir=${MODEL_OUTPUT_DIR} --dataset=${DATASET} --models=${MODEL}
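The version in `MODEL_OUTPUT_DIR` is typed by hand after checking `pip3 freeze`. A rough sketch of deriving it instead is below. Both function names are hypothetical helpers, and the sketch assumes `pip3 freeze` reports a plain `homeassistant==<version>` line (an editable or URL install would print nothing).

```shell
# Hypothetical helper: reads `pip3 freeze` output on stdin and prints the
# pinned homeassistant version, e.g. "2024.9.2".
# Assumption: a plain "homeassistant==<version>" line is present; lines such
# as "homeassistant @ file://..." are ignored and produce no output.
ha_version_from_freeze() {
  sed -n 's/^homeassistant==//p'
}

# Hypothetical helper: builds the report directory path from a dataset name
# and a Home Assistant version, following the reports/<dataset>/<version>
# layout used in the steps above.
model_output_dir() {
  printf 'reports/%s/%s\n' "$1" "$2"
}
```

Possible usage: `MODEL_OUTPUT_DIR="$(model_output_dir assist-mini "$(pip3 freeze | ha_version_from_freeze)")"`. If the version comes back empty, falling back to setting the directory by hand is probably the safer choice.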

Make report

$ home-assistant-datasets assist eval --model_output_dir=${MODEL_OUTPUT_DIR} --output_type=csv > ${MODEL_OUTPUT_DIR}/report.csv
$ home-assistant-datasets leaderboard prebuild
$ home-assistant-datasets leaderboard build

allenporter commented 4 days ago

Leaderboard updated: mistral-nemo scores 81% on assist-mini.

allenporter / home-assistant-datasets

Add mistral-nemo to the test suite #40
