This PR addresses #149 and adds support for the v2 version of the Speech-To-Text library while continuing to support v1. The default behaviour is to use the v1 library, where everything works exactly as it did in the previous version. To use v2, the FreeSWITCH variable GOOGLE_SPEECH_CLOUD_SERVICES_VERSION must be set to the value "v2"; setting it to "v1" or leaving it unset gives the default behaviour.
When v2 is selected, it is essential to also provide a so-called recognizer parent path in the GOOGLE_SPEECH_RECOGNIZER_PARENT FreeSWITCH variable; failing to do so will cause construction of the GStreamer class to fail. Recognizers allow commonly used streaming recognition parameters to be stored in the cloud. These stored values can be overridden by parameters passed at runtime, but a recognizer must always be supplied to v2 streaming recognition invocations. If you have already created a recognizer in your Google Cloud account, its id can be passed using the GOOGLE_SPEECH_RECOGNIZER_ID variable. If this is not set, mod_google_transcribe simply uses the wildcard recognizer id (the "_" character) and a recognizer is created on the fly rather than stored for future use. Note that even if a persistent recognizer is not required, the recognizer parent id must still be provided in GOOGLE_SPEECH_RECOGNIZER_PARENT, otherwise even the wildcard recognizer cannot be created. The parent id is a path string consisting of the Google Cloud project id that was used to create the credentials file and a geographical location. For more details about recognizers, see https://cloud.google.com/speech-to-text/v2/docs/recognizers
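For example, for a (hypothetical) project called my-gcp-project using the global location, the parent path would look like this:

```
projects/my-gcp-project/locations/global
```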
As long as GOOGLE_SPEECH_CLOUD_SERVICES_VERSION is set to "v2" and GOOGLE_SPEECH_RECOGNIZER_PARENT is set to a valid recognizer parent id, the v2 library is used. Calls to uuid_google_transcribe should then work as they did previously, and any configuration parameters provided at runtime override anything already defined in a predefined recognizer.
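As a rough sketch of how this looks in practice (the uuid, project, location and language below are placeholders, and the uuid_google_transcribe invocation uses the module's existing start syntax):

```
# placeholders: $UUID is the channel uuid; project, location and language are examples
fs_cli -x "uuid_setvar $UUID GOOGLE_SPEECH_CLOUD_SERVICES_VERSION v2"
fs_cli -x "uuid_setvar $UUID GOOGLE_SPEECH_RECOGNIZER_PARENT projects/my-gcp-project/locations/global"
# optional: reuse a recognizer already created in your Google Cloud account
fs_cli -x "uuid_setvar $UUID GOOGLE_SPEECH_RECOGNIZER_ID my-existing-recognizer"
fs_cli -x "uuid_google_transcribe $UUID start en-US interim"
```

If GOOGLE_SPEECH_RECOGNIZER_ID is left unset, the wildcard recognizer ("_") is used as described above.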
Differences between v1 and v2
No single utterance parameter in v2. It is no longer necessary to specify single-utterance mode as a parameter; instead it is implied by the model selected. If single-utterance behaviour is required, it is supported by, for example, the short model. For more details on models see https://cloud.google.com/speech-to-text/v2/docs/streaming-recognize.
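Assuming the module's existing GOOGLE_SPEECH_MODEL channel variable is what selects the model on the v2 path (an assumption on my part), single-utterance-like behaviour could be requested along these lines:

```
# assumption: GOOGLE_SPEECH_MODEL picks the v2 recognition model; "short" ends after a single utterance
fs_cli -x "uuid_setvar $UUID GOOGLE_SPEECH_MODEL short"
```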
Multiple language support. You can provide up to three languages in the recognition request, and the speech engine will automatically determine which of them was most likely spoken.
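A sketch of how this could be driven from channel variables, assuming the module's GOOGLE_SPEECH_ALTERNATIVE_LANGUAGE_CODES variable also feeds the v2 language list (treat the variable name and its v2 behaviour as assumptions):

```
# assumption: comma-separated alternative languages, in addition to the primary language passed to "start"
fs_cli -x "uuid_setvar $UUID GOOGLE_SPEECH_ALTERNATIVE_LANGUAGE_CODES de-DE,fr-FR"
```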
Speaker diarization. Speaker diarization can be requested via mod_google_transcribe for v2, but I didn't manage to stumble across a combination of model, language and location which actually supports it. See https://stackoverflow.com/questions/76779418/speaker-diarization-is-disabled-even-for-supported-languages-in-google-speech-to
There are sure to be many more differences, but these are the main things I have found so far.
Some Notes on the Code and Building
To avoid code duplication we placed the v1-specific code in google_glue_v1.cpp and the v2-specific code in google_glue_v2.cpp. Generic code used by both libraries now resides in generic_google_glue.h. We use our own Docker image to build the drachtio modules, but our Makefile is based on this one:
https://github.com/drachtio/docker-drachtio-freeswitch-base/blob/main/files/Makefile.am.extra
In order to compile and link the v2 code we had to add the v2 generated sources to the nodist_libfreeswitch_libgoogleapis_la_SOURCES assignment, along the following lines:
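The exact entries depend on how the googleapis sources are generated in your build tree, but they would plausibly be the generated Speech v2 protobuf and gRPC stubs, something like this (the paths are an assumption; mirror whatever prefix the existing v1 entries in the Makefile use):

```
# assumed paths for the generated Speech v2 stubs
nodist_libfreeswitch_libgoogleapis_la_SOURCES += \
	google/cloud/speech/v2/cloud_speech.pb.cc \
	google/cloud/speech/v2/cloud_speech.grpc.pb.cc
```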
If you don't do this, you'll most likely run into linking problems.
That's all I can think of for now. It would be really great if you also find this useful and we manage to get it merged. I am of course available for questions.