acon96 / home-llm

A Home Assistant integration & Model to control your smart home using a Local LLM

Assistant saying "endoftext" after every response #167

Closed cryptk closed 2 weeks ago

cryptk commented 3 weeks ago

Describe the bug
On v0.3.2 with the Home-3B-v3.f16.gguf model on a freshly configured integration, the voice assistant says "endoftext" after every response.

Expected behavior
The voice assistant should not send extraneous control tokens to the TTS engine.

Logs
Here is the log line showing what is generated by the llama engine (this is running inside of LocalAI):

4:18PM DBG Response: {"created":1718036160,"object":"text_completion","id":"9444a4d6-76c6-4024-8549-04ffb2c47334","model":"home-3B-v3","choices":[{"index":0,"finish_reason":"stop","text":"deactivating office light.\n```homeassistant\n{\"service\": \"light.turn_off\", \"target_device\": \"light.office_ceiling_fan_light\"}\n```\u003c|endoftext|\u003e"}],"usage":{"prompt_tokens":1047,"completion_tokens":42,"total_tokens":1089}}
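
The `\u003c` and `\u003e` sequences in that log line are JSON escapes for `<` and `>`, so the completion text ends in a literal `<|endoftext|>`. A minimal Python sketch to confirm this, assuming a shortened stand-in for the logged response (the `homeassistant` service block and the other fields are left out, and the variable names are illustrative):

```python
import json

# Shortened stand-in for the logged response: only the spoken text and the
# trailing token are kept; the homeassistant block and usage fields are omitted.
raw = r'{"choices":[{"index":0,"finish_reason":"stop","text":"deactivating office light.\n\u003c|endoftext|\u003e"}]}'

# json.loads decodes the \u003c / \u003e escapes back to '<' and '>'.
text = json.loads(raw)["choices"][0]["text"]

print(repr(text))
print(text.endswith("<|endoftext|>"))
```

If the last line prints `True`, the token is part of the text returned by the backend rather than something added later in the pipeline.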

And here is the raw data from the voice assistant debug showing a short interaction turning off a light where you can see that "<|endoftext|>" is included in the input to the TTS engine:

stage: done
run:
  pipeline: 01hrj8d3p11vrhmy2nwrb9b4t3
  language: en
events:
  - type: run-start
    data:
      pipeline: 01hrj8d3p11vrhmy2nwrb9b4t3
      language: en
    timestamp: "2024-06-10T16:18:04.183383+00:00"
  - type: wake_word-start
    data:
      entity_id: wake_word.openwakeword
      metadata:
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
      timeout: 5
    timestamp: "2024-06-10T16:18:04.183436+00:00"
  - type: wake_word-end
    data:
      wake_word_output:
        wake_word_id: ok_nabu_v0.1
        wake_word_phrase: ok nabu
        timestamp: 675
    timestamp: "2024-06-10T16:18:05.106447+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: en
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-06-10T16:18:05.106704+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1170
    timestamp: "2024-06-10T16:18:06.066984+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1825
    timestamp: "2024-06-10T16:18:07.378857+00:00"
  - type: stt-end
    data:
      stt_output:
        text: " Turn off the office light."
    timestamp: "2024-06-10T16:18:08.164297+00:00"
  - type: intent-start
    data:
      engine: 3666ee94d856a2b09ff1d1aa4782b819
      language: "*"
      intent_input: " Turn off the office light."
      conversation_id: 01J01F1HWMW1ZWY133FCFS4N2M
      device_id: c19569088489c5d8560aca65de5c47ff
    timestamp: "2024-06-10T16:18:08.164428+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: |-
                deactivating office light.
                <|endoftext|>
              extra_data: null
          card: {}
          language: "*"
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01J01F1HWMW1ZWY133FCFS4N2M
    timestamp: "2024-06-10T16:18:08.931089+00:00"
  - type: tts-start
    data:
      engine: tts.piper
      language: en_US
      voice: en_US-libritts-high
      tts_input: |-
        deactivating office light.
        <|endoftext|>
    timestamp: "2024-06-10T16:18:08.931190+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.piper?message=deactivating+office+light.%0A%3C%7Cendoftext%7C%3E&language=en_US&voice=en_US-libritts-high&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/e14c89e859cf62d2fcf1ddb814dbf5140b159aa6_en-us_34bb7cbde3_tts.piper.wav
        mime_type: audio/x-wav
    timestamp: "2024-06-10T16:18:08.931413+00:00"
  - type: run-end
    data: null
    timestamp: "2024-06-10T16:18:08.931635+00:00"
wake_word:
  entity_id: wake_word.openwakeword
  metadata:
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  timeout: 5
  done: true
  wake_word_output:
    wake_word_id: ok_nabu_v0.1
    wake_word_phrase: ok nabu
    timestamp: 675
stt:
  engine: stt.faster_whisper
  metadata:
    language: en
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: " Turn off the office light."
intent:
  engine: 3666ee94d856a2b09ff1d1aa4782b819
  language: "*"
  intent_input: " Turn off the office light."
  conversation_id: 01J01F1HWMW1ZWY133FCFS4N2M
  device_id: c19569088489c5d8560aca65de5c47ff
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: |-
            deactivating office light.
            <|endoftext|>
          extra_data: null
      card: {}
      language: "*"
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01J01F1HWMW1ZWY133FCFS4N2M
tts:
  engine: tts.piper
  language: en_US
  voice: en_US-libritts-high
  tts_input: |-
    deactivating office light.
    <|endoftext|>
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.piper?message=deactivating+office+light.%0A%3C%7Cendoftext%7C%3E&language=en_US&voice=en_US-libritts-high&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/e14c89e859cf62d2fcf1ddb814dbf5140b159aa6_en-us_34bb7cbde3_tts.piper.wav
    mime_type: audio/x-wav

mikeperalta1 commented 2 weeks ago

Would be cool if it ended with "End of Line" like Tron

acon96 commented 2 weeks ago

This should be fixed in v0.3.3.
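
For anyone stuck on an older version, the general idea of a workaround is to cut the response at the end-of-text marker before it reaches TTS. The following is only a rough sketch of that idea, not the actual change made in v0.3.3; the function name `strip_stop_tokens` and the `STOP_TOKENS` tuple are hypothetical and not part of the integration:

```python
# Assumed list of end-of-sequence markers. Only the one seen in this issue is
# included; extend it to match whatever the model's tokenizer actually emits.
STOP_TOKENS = ("<|endoftext|>",)


def strip_stop_tokens(text: str, stop_tokens=STOP_TOKENS) -> str:
    """Cut the text at the earliest stop token and trim trailing whitespace."""
    cut = len(text)
    for token in stop_tokens:
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


# The speech string from the intent-end event in the debug output above.
speech = "deactivating office light.\n<|endoftext|>"
print(repr(strip_stop_tokens(speech)))
```

Cutting at the first marker, rather than only removing it from the end, also drops anything the model might generate after the marker. Text without a marker passes through unchanged apart from trailing whitespace.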