CouncilDataProject / cdp-backend

Data storage utilities and processing pipelines used by CDP instances.
https://councildataproject.org/cdp-backend
Mozilla Public License 2.0

Google Speech-to-Text SR Model raises a confusing attr error instead of a defined error when Google runs into an issue #200

Closed evamaxfield closed 1 year ago

evamaxfield commented 2 years ago

Google Speech-to-Text SR model simply assumes that the call to Google Speech-to-Text will run and return correctly: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/sr_models/google_cloud_sr_model.py#L151

We do not check the result status at all. From what I can tell, Google will transcribe as many of the audio chunks as it can until one errors.

An example of one chunk erroring is here: https://github.com/OpenMontana/missoula-council-data-project/runs/7758558858?check_suite_focus=true#step:8:190

That run raises an AttributeError.

The current thinking is that Google will try to transcribe as many chunks as possible, but if you reach a budget limit it will stop mid-transcription. So chunks before that point have proper transcriptions, and chunks after that point (I assume) have error messages.

We should look for error messages before checking the transcript.
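
A rough sketch of what "check for errors first" could look like. This does not use the real Google client types: `ChunkResult`, `TranscriptionChunkError`, and `collect_transcripts` are hypothetical names for illustration, and the assumption that a failed chunk carries an error message instead of a transcript is the unverified guess from above.

```python
from dataclasses import dataclass
from typing import List, Optional


class TranscriptionChunkError(Exception):
    """Defined error raised when a chunk carries an error instead of a transcript."""


@dataclass
class ChunkResult:
    # Hypothetical stand-in for one per-chunk result, NOT the Google client type.
    transcript: Optional[str] = None
    error_message: Optional[str] = None


def collect_transcripts(chunks: List[ChunkResult]) -> List[str]:
    transcripts: List[str] = []
    for index, chunk in enumerate(chunks):
        # Look for an error first, so a failed chunk never reaches
        # the transcript handling below.
        if chunk.error_message is not None:
            raise TranscriptionChunkError(
                f"Chunk {index} failed after {len(transcripts)} "
                f"successful chunk(s): {chunk.error_message}"
            )
        # A chunk with neither field set is also treated as a failure.
        if chunk.transcript is None:
            raise TranscriptionChunkError(
                f"Chunk {index} has neither a transcript nor an error message."
            )
        transcripts.append(chunk.transcript)
    return transcripts
```

With something like this, the budget limit case would surface as a `TranscriptionChunkError` naming the first failed chunk, rather than an AttributeError from deeper in the transcript handling.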