drachtio / drachtio-freeswitch-modules

A collection of open-sourced freeswitch modules that I use in various drachtio applications
MIT License

mod_google_transcribe: Feature Request: Support for `google::cloud::speech::v2` #149

Open · entenschnabel opened this issue 8 months ago

entenschnabel commented 8 months ago

Hi,

I was wondering whether you had any plans to support `google::cloud::speech::v2`, in addition to or instead of `google::cloud::speech::v1p1beta1`, in mod_google_transcribe. If you're interested, I might soon have a PR to share, but I wanted to check whether you were already planning to do this.

davehorton commented 8 months ago

I would love some information about v2: what it adds, what it breaks, etc. When I last looked (some time ago), it seemed to lack some things that were in v1beta1, but I may have been mistaken. If you can point me to some docs, that would be appreciated.

entenschnabel commented 8 months ago

What actually nudged us into thinking about v2 was the migration of the v1 models to the conformer-based models, which is due to take place in January. Although it won't break the v1 API, it's perhaps an indication that we will need to look at v2 sooner or later anyway. There are more details here: Migrate from classic to conformer models.

So far, the changes most noticeable to me are the "single utterance" concept and the way multiple languages are handled. "Single utterance" no longer seems to be offered as a configuration parameter; instead, it is implicitly set when using the `latest_short` model. It's actually described here. `RecognitionConfig` no longer has an `alternative_language_codes` field. Instead, it has a single `language_codes` field, so I would be interested to see how effectively it can deduce which language is being spoken.
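To make the field changes described above more concrete, here is a minimal Python sketch of how a v1p1beta1-style configuration might be translated into v2 terms. `V1StreamingSettings`, `V2RecognitionSettings` and `to_v2` are hypothetical plain-data stand-ins, not the real client-library or protobuf types. Only the field names mirror the ones mentioned in this comment. Mapping `single_utterance` to the `latest_short` model follows the description above, and using `latest_long` as the fallback model is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class V1StreamingSettings:
    """Hypothetical flattened view of the v1p1beta1 fields discussed above."""
    language_code: str
    alternative_language_codes: List[str] = field(default_factory=list)
    single_utterance: bool = False
    model: Optional[str] = None


@dataclass
class V2RecognitionSettings:
    """Hypothetical flattened view of the v2 RecognitionConfig fields discussed above."""
    language_codes: List[str]
    model: str


def to_v2(v1: V1StreamingSettings, default_model: str = "latest_long") -> V2RecognitionSettings:
    # v2 has one repeated language_codes field: keep the primary language first,
    # then the former alternatives, dropping duplicates while preserving order.
    codes: List[str] = []
    for code in [v1.language_code, *v1.alternative_language_codes]:
        if code not in codes:
            codes.append(code)

    # Assumption: single-utterance behaviour is requested by choosing latest_short,
    # which overrides any explicit model; otherwise an explicit model wins,
    # and default_model (assumed "latest_long") is the fallback.
    if v1.single_utterance:
        model = "latest_short"
    elif v1.model:
        model = v1.model
    else:
        model = default_model

    return V2RecognitionSettings(language_codes=codes, model=model)


if __name__ == "__main__":
    cfg = to_v2(V1StreamingSettings("en-US", ["de-DE", "en-US"], single_utterance=True))
    print(cfg.language_codes, cfg.model)
```

Letting `single_utterance` override an explicit model is one possible design choice. A real implementation in mod_google_transcribe would do this in C++ against the actual `google::cloud::speech::v2` types. Whether v2 then reliably detects which of the listed languages is being spoken is the open question raised above.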

Anyway, you can read more about it in the reference documentation: google.cloud.speech.v2