ex-azure / ex_azure_speech

Problem when sending audios larger than 1 minute #35

Closed: revoltaxz closed this issue 4 weeks ago

revoltaxz commented 1 month ago

Hi @YgorCastor!

First of all I’d like to thank you for this lib!

I’m currently using the pronunciation assessment and getting errors in the results because my audio files are about 2 minutes long on average. I looked at some posts on the Microsoft forum and other people have the same problem. Is there any way to send audio longer than 1 minute using Recognizer.recognize_once()?

My code example

stream = File.stream!(audio, [], 32_768)
speech_opts = speech_assessment_opts(reference_text)

stream
|> Recognizer.recognize_once(speech_opts)

# Private function that builds the speech opts

defp speech_assessment_opts(reference_text) do
  [
    speech_context_opts: [
      speech_assessment: [
        reference_text: reference_text,
        grading_system: :hundred_mark,
        granularity: :word,
        dimension: :comprehensive,
        enable_miscue: true
      ]
    ],
    timeout: 30_000,
    socket_opts: [
      language: "pt-BR"
    ]
  ]
end

YgorCastor commented 1 month ago

The recognition process is quite time-consuming and can take a while for larger audio files, so for that case I recommend using the recognize_continuous function. However, I just found out that it still cuts off at around one minute: the websocket works in "turns" of about one minute, and I'm not starting a new turn after one ends. So I'll do two things:

  1. I'll fix this turn issue, which will allow longer recognition.
  2. Add a timeout option for the recognize_once call, so you can decide how long to wait for the full response.
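
As a rough caller-side illustration of what point 2 means (this is not the library's implementation, and the TimeoutSketch module and its run/2 function are hypothetical names made up for this sketch), a blocking call can be wrapped in a Task so the caller decides how long to wait:

```elixir
defmodule TimeoutSketch do
  # Hypothetical helper, not part of ex_azure_speech.
  # Runs `fun` in a separate process and gives up after `timeout_ms` milliseconds.
  def run(fun, timeout_ms) when is_function(fun, 0) do
    task = Task.async(fun)

    # Task.yield/2 returns nil if no reply arrived within the timeout.
    # Task.shutdown/1 then stops the task and returns a reply that
    # arrived in the meantime, or nil if there was none.
    case Task.yield(task, timeout_ms) || Task.shutdown(task) do
      {:ok, result} -> {:ok, result}
      # Only seen when the calling process traps exits.
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

Assuming recognize_once blocks until the result is ready, it could be wrapped as TimeoutSketch.run(fn -> Recognizer.recognize_once(stream, speech_opts) end, 120_000). This only bounds how long the caller waits; it does not change the one-minute turn behaviour described above.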

revoltaxz commented 1 month ago

Great, @YgorCastor, that will help a lot!

YgorCastor commented 4 weeks ago

Not a bug; the solution was to use the conversational recognition mode.