chuot / rdio-scanner

Rdio Scanner is open source software that ingests and distributes audio files generated by various software-defined radio recorders. Its interface tries to reproduce the user experience of a real police scanner, while adding its own touch.
GNU General Public License v3.0

Feature req: Google speech to text API #333

Status: Closed (canuckcam closed 1 year ago)

canuckcam commented 1 year ago

Just thought it would be incredibly neat and helpful to be able to look at the text of recordings in Search Call mode. There's a lot of empty space right now in the UI on playback. Link to Google: https://cloud.google.com/speech-to-text. They have a phone_call model specifically for lo-fi 8 kHz recordings.

The API does cost some money (billed through Google Cloud, like a VM) once you go over 60 minutes of transcription per month. I absolutely do not expect perfect transcriptions, especially with road names, etc., but it should give a decent indication of the content of a transmission when scrolling back looking for specific calls.
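To make the request concrete, here is a minimal sketch of what a call to the phone_call model could look like. It only builds the JSON body for the Speech-to-Text v1 REST `speech:recognize` method and does not send anything. The helper name `build_recognize_request`, the choice of 16-bit linear PCM input, and the `en-US` language code are assumptions for illustration; nothing here is part of Rdio Scanner.

```python
import base64
import json

# REST endpoint of the Speech-to-Text v1 synchronous recognize method.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(pcm_bytes: bytes,
                            sample_rate_hz: int = 8000,
                            language_code: str = "en-US") -> dict:
    """Build a speech:recognize JSON body for one recorded call.

    Hypothetical helper: assumes the call audio is raw 16-bit linear PCM.
    """
    return {
        "config": {
            "encoding": "LINEAR16",            # raw 16-bit PCM (assumption)
            "sampleRateHertz": sample_rate_hz,  # low-fi 8 kHz radio audio
            "languageCode": language_code,
            "model": "phone_call",             # model mentioned above
        },
        # The REST API expects inline audio as base64 text.
        "audio": {"content": base64.b64encode(pcm_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # One second of silence at 8 kHz stands in for a real recording.
    fake_audio = b"\x00\x00" * 8000
    body = build_recognize_request(fake_audio)
    print(RECOGNIZE_URL)
    print(json.dumps(body["config"], indent=2))
```

Sending this body still requires a Google Cloud project with the API enabled and valid credentials, and the returned transcript would have to be stored next to each call for the search view to display it.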

My idea of each recording in the Search Call UI would be:

           Date | Time | Talkgroup Alias | Radio ID/Alias
Play/Stop
           Speech to text result

chuot commented 1 year ago

Well, it may be useful for some, but I consider this out of scope since Rdio Scanner is mainly for listening. Some other project may have that feature, though.

cpg178-kcd commented 1 year ago

I have found Google Speech to Text to be fairly inaccurate when decoding radio transmissions, especially on analog. If you wanted to use it for one of the fancy Locution automated dispatch voices, it would work great. But with human voices that are different each time (sometimes too close to the mic, sometimes too far, etc.), it does not work as well as one would hope.

The Assembly AI API works much better than Google's, but it is still not there yet.